First, the library and the example data is loaded.
library(twDEMC)
data(twLinreg1)
set.seed(0815) # for reproducable results
The simple example model is just a linear regression. The model functions takes two arguments, a vector theta
with entries “a” and “b”, and a vector xval
with the values of the predictor. The output is a vector of predictions.
dummyTwDEMCModel
## function (theta, xval)
## {
## theta["a"] + theta["b"] * xval
## }
## <environment: namespace:twDEMC>
In order to calculate the model-data misfit one can use cost function logDenGaussian
and provide the information on observations and their uncertainty and, optionally, the prior information on parameters in an argument list. For this example these are all provided with list twLinreg1
.
The cost function can then be called with additional parameter vector as the first argument.
The return value lists all the components of the misfit, or more precisely the log-Density = -½ misfit. Here obs
is the misfit to
the single data stream of observations, parms
is the misfit to the prior information on the parameters.
argsFLogDen <- with( twLinreg1, list(
fModel=dummyTwDEMCModel, ### the model function, which predicts the output based on theta
obs=obs, ### vector of data to compare with
invCovar=invCovar, ### the inverse of the Covariance of obs (its uncertainty)
thetaPrior = thetaTrue, ### the prior estimate of the parameters
invCovarTheta = invCovarTheta, ### the inverse of the Covariance of the prior parameter estimates
xval=xval ### further arguments to the model, here the vector of predictors
))
do.call( logDenGaussian, c( list(theta=c(a=10.8,b=5.2)), argsFLogDen ))
## obs parms
## -16.29288 -1.60000
Specifying the cost function and its additional arguments, is the main ingredient for fitting the model. Here, only a single block of parameters is optimized. Therefore, there is only one entry in the list of logDensity functions:
dInfos=list(den1=list(fLogDen=logDenGaussian, argsFLogDen=argsFLogDen))
For the generation of an initial population, a mean and covariance for the parameters need to be specified. Other parameters control the temperature decrease. The number of observations in each data stream needs to be specified.
mcPops <- twDEMCSA(
dInfos=list(den1=list(fLogDen=logDenGaussian, argsFLogDen=argsFLogDen)),
theta=twLinreg1$theta0, covarTheta=diag(twLinreg1$sdTheta^2), # for generating an initial population
ctrlT=list(TFix=c(parms=1)), # do not use increased temperature for priors
nObs=c(obs=length(argsFLogDen$obs)) # number of records in observation data stream(s)
)
## Calculating logDensity for 64 initial states for m0= 4 rows per population.
## [1] "initial T=2.5,1 for 48 generations Thu Feb 12 18:05:45 2015"
## ............
## [1] "finished initial 48 out of 512 gens. Thu Feb 12 18:05:45 2015"
## [1] "gelmanDiagPops=1.2 max(gelmanDiagPop)=1.5 max(specVarRatio)=2.5 T0=0.051 logDen=-16.9"
## [1] "doing 128 generations to TEnd0=0.051, TEnd=2.5,1"
## ................................
## [1] "finished 128 out of 464 gens. Thu Feb 12 18:05:46 2015"
## relTChange= -0.042
## [1] "twDEMCSA: Maximum Temperture change only -4.2% and no drift in logDensity (lden=-17.2). Finishing early."
## [1] "gelmanDiagPops=1.1 max(gelmanDiagPop)=1.1 max(specVarRatio)=2.6 T0=0.051 logDen=-17.2"
During the sampling, some information on the progress is displayed. The sampling is ended before reaching the maximum number of steps, because Temperature change is negligible and convergence diagnostics are ok.
The resulting object is of class twDEMCPops
. To plot the sample, it can be converted to a standard mcmc.list
object
rescoda <- as.mcmc.list(mcPops)
plot(rescoda, smooth=FALSE)
If there is a decrease in temperature or a visible burnin period, the converged chains need to be sampled at the current temperature by function twDEMCBlock.twDEMCPops
.
One needs to specify the number of generations. If one wants, e.g., 256 samples, one has to regard the thinning interval (only after a number of generations a value is stored to avoid autocorrelation of the sample). Note also that nPops times nChains are sampled in parallel.
mcPopsConv <- twDEMCBlock( mcPops, nGen=ceiling(256 * mcPops$thin / getNChains(mcPops)), extendRun=FALSE )
The actual parameter samples as an array across chains can be obtained by function stackChains
. Each row is a sampled parameter vector. The first column reports their log-Density.
ss <- stackChains(mcPopsConv)
summary(ss)
## den1 a b
## Min. :-21.70 Min. : 8.582 Min. :4.366
## 1st Qu.:-17.35 1st Qu.: 9.601 1st Qu.:4.775
## Median :-16.57 Median : 9.888 Median :4.939
## Mean :-16.90 Mean : 9.924 Mean :4.948
## 3rd Qu.:-16.11 3rd Qu.:10.238 3rd Qu.:5.145
## Max. :-15.72 Max. :11.644 Max. :5.572
The estimate of target aggregated temperature T0
has high uncertainty. If one can assume that the the model structural error is negligible, then Temperature can be decreased to 1.
# decrease Temp to exponentially to 1
mcPopsT1 <- twDEMCBlock( mcPops, TEnd = 1, nGen=ceiling(256 * mcPops$thin / getNChains(mcPops)) )
# sample at this temperature
mcPopsT1Conv <- twDEMCBlock( mcPopsT1, nGen=ceiling(256 * mcPops$thin / getNChains(mcPops)), extendRun=FALSE )
summary( stackChains(mcPopsT1Conv) ) # difference to mcPopsConv negligible