Before we jump into coding, let’s start by loading the package and
the data that we will be using; the lifeexpect
database.
This data set was simulated using official statistics and research on
life expectancy in the US (see ?lifeexpect
for more
details). Each row corresponds to the observed age of an individual at
the time of disease.
## smoke female age
## 1 0 0 78.10840
## 2 0 1 83.42009
## 3 0 1 85.46404
## 4 1 0 69.83565
## 5 0 1 80.40131
## 6 1 0 72.18115
The data is characterized by the following model:
\[\begin{equation*} y_i \sim \mbox{N}\left(\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i, \sigma^2\right) \end{equation*}\]
So the logposterior can be written as:
\[\begin{equation*} \sum_i \log\phi\left(\frac{y_i - (\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i)}{\sigma}\right) \end{equation*}\]
Where \(y_i\) is the age of the i-th individual. Using R, we could write the logposterior as follows:
In some cases, the user would like to go beyond what
MCMC()
does. In those cases, we can directly access the
environment in which the main loop of the MCMC
-call is
being executed, using the function ith_step()
.
With ith_step()
, we can access the environment
containing the existing elements while the MCMC loop occurs. Among
these, we have: i
(the step number), logpost
(a vector storing the trace of the unnormalized logposterior),
draws
, (a matrix storing the kernel’s proposed states),
etc. The complete list of available objects is available either in the
manual or when printing the function:
## Available objects via the ith_step() function:
## - i : (int) Step (iteration) number.
## - nsteps : (int) Number of steps.
## - chain_id : (int) Id of the chain (goes from 1 to -nchains-)
## - theta0 : (double vector) Current state of the chain.
## - theta1 : (double vector) Proposed state of the chain.
## - ans : (double matrix) Set of accepted states (it will be NA for rows >= i).
## - draws : (double matrix) Set of proposed states (it will be NA for rows >= i).
## - logpost : (double vector) Value of -fun- (it will be NA for elements >= i).
## - R : (double vector) Random values from U(0,1). This is used with the Hastings ratio.
## - thin : (int) Thinning (applied after the last step).
## - burnin : (int) Burn-in (applied after the last step).
## - conv_checker : (function) Convergence checker function.
## - kernel : (fmcmc_kernel) Kernel object.
## - fun : (function) Passed function to MCMC.
## - f : (function) Wrapper of -fun-.
## - initial : (double vector) Starting point of the chain.
##
## The following objects always have fixed values (see ?ith_step): nchains, cl, multicore
## Other available objects: MCMC_OUTPUT, cnames, funargs, passedargs, progress
For example, sometimes accidents happen, and your computing environment could crash (R, your PC, your server, etc.). It could be a good idea to keep track of the current state of the chain. A way to do this is printing out the state of the process every n-th step.
Using the lifeexpect
data, let’s rewrite the
logpost()
function using ith_step()
. We will
print the latest accepted state every 1,000 steps:
logpost2 <- function(p, D) {
# Getting the number of step
i <- ith_step("i")
# After the first iteration, every 1000 steps:
if (i > 1L && !(i %% 1000)) {
# The last accepted state. Accepted states are
# stored in -ans-.
s <- ith_step()$ans[i - 1L,]
cat("Step: ", i, " state: c(\n ", paste(s, collapse = ", "), "\n)\n", sep = "")
}
# Just returning the previous value
logpost(p, D)
}
Note that the posterior distribution, i.e., accepted states, is
stored in the matrix ans
within the MCMC loop. Let’s use
the Robust Adaptive Metropolis Kernel to fit this model. Since we need
to estimate the standard error, we can set a lower-bound for the
variables. For the starting point, let’s use the vector
[70, 0, 0, sd(age)]
(more than a good guess!):
# Generating kernel
kern <- kernel_ram(warmup = 1000, lb = c(-100,-100,-100,.001))
# Running MCMC
ans0 <- MCMC(
initial = c(70, 0, 0, sd(lifeexpect$age)),
fun = logpost2,
nsteps = 10000,
kernel = kern,
seed = 555,
D = lifeexpect,
progress = FALSE
)
## Step: 1000 state: c(
## 69.9961016297189, -0.00345239203214055, 0.0070167653826948, 5.90378993611787
## )
## Step: 2000 state: c(
## 70.0504366122137, -0.0292853676293392, 0.076260722355002, 5.97687773201868
## )
## Step: 2000 state: c(
## 70.0504366122137, -0.0292853676293392, 0.076260722355002, 5.97687773201868
## )
## Step: 3000 state: c(
## 70.8055075956862, 0.279930458592619, 0.309906484934854, 6.51371636035571
## )
## Step: 3000 state: c(
## 70.8055075956862, 0.279930458592619, 0.309906484934854, 6.51371636035571
## )
## Step: 4000 state: c(
## 74.4132756921824, -1.22342712322437, 2.14666613219242, 5.70679470552763
## )
## Step: 4000 state: c(
## 74.4132756921824, -1.22342712322437, 2.14666613219242, 5.70679470552763
## )
## Step: 5000 state: c(
## 80.0205621762773, -9.83179753872683, 5.3780173228871, 1.99929550970238
## )
## Step: 5000 state: c(
## 80.0205621762773, -9.83179753872683, 5.3780173228871, 1.99929550970238
## )
## Step: 6000 state: c(
## 79.7724276801434, -9.56814688878631, 5.64865694365965, 2.04970696423922
## )
## Step: 6000 state: c(
## 79.7724276801434, -9.56814688878631, 5.64865694365965, 2.04970696423922
## )
## Step: 7000 state: c(
## 79.8413102462821, -9.80272237981217, 5.6332429761754, 1.97489510036092
## )
## Step: 7000 state: c(
## 79.8413102462821, -9.80272237981217, 5.6332429761754, 1.97489510036092
## )
## Step: 8000 state: c(
## 80.1744325184163, -9.79630960138564, 5.38165644273994, 1.99091898768071
## )
## Step: 8000 state: c(
## 80.1744325184163, -9.79630960138564, 5.38165644273994, 1.99091898768071
## )
## Step: 9000 state: c(
## 79.9347937306275, -9.63123904834537, 5.49489884044609, 2.07352104789203
## )
## Step: 9000 state: c(
## 79.9347937306275, -9.63123904834537, 5.49489884044609, 2.07352104789203
## )
## Step: 10000 state: c(
## 80.1499993519617, -9.74186365776425, 5.25019766087275, 1.99986537562651
## )
## Step: 10000 state: c(
## 80.1499993519617, -9.74186365776425, 5.25019766087275, 1.99986537562651
## )
The ith_step()
makes MCMC
very easy to
tailor. Now what happens when we deal with multiple chains?
Using the function ith_step()
could be of real help when
dealing with multiple chains in a single run. In such a case, we can use
the variable chain_id
that can be found with
ith_step()
. From the previous example:
logpost3 <- function(p, D) {
# Getting the number of step
i <- ith_step("i")
# After the first iteration, every 1000 steps:
if (i > 1L && !(i %% 1000)) {
# The last accepted state. Accepted states are
# stored in -ans-.
s <- ith_step()$ans[i - 1L,]
chain <- ith_step("chain_id")
cat("Step: ", i, " chain: ", chain, " state: c(\n ",
paste(s, collapse = ",\n "), "\n)\n", sep = ""
)
}
# Just returning the previous value
logpost(p, D)
}
# Rerunning using parallel chains
ans1 <- MCMC(
initial = ans0,
fun = logpost3, # The new version of logpost includes chain
nsteps = 1000,
kernel = kern, # Reusing the kernel
thin = 1,
nchains = 2L, # Two chains, two different prints
multicore = FALSE,
seed = 555,
progress = FALSE,
D = lifeexpect
)
## Step: 1000 chain: 1 state: c(
## 80.3915443781388,
## -10.0470301689128,
## 5.21253077826112,
## 2.03135644056862
## )
## Step: 1000 chain: 1 state: c(
## 80.3915443781388,
## -10.0470301689128,
## 5.21253077826112,
## 2.03135644056862
## )
## Step: 1000 chain: 2 state: c(
## 80.1338926312614,
## -9.83033416596521,
## 5.34896148581804,
## 1.9961282989995
## )
## Step: 1000 chain: 2 state: c(
## 80.1338926312614,
## -9.83033416596521,
## 5.34896148581804,
## 1.9961282989995
## )
Using ith_state()
increases the computational burden of
the process. Yet, since most of the load lies on the objective function
itself, the additional time can be neglected.
Another thing the user may need to do is storing data as the MCMC
algorithm runs. In such cases, you can use the
set_userdata()
function, which, as the name suggests, will
store the required data.
For a simple example, suppose we wanted to store the proposed state, we could do it in the following way:
logpost4 <- function(p, D) {
# Timestamp
set_userdata(
p1 = p[1],
p2 = p[2],
p3 = p[3]
)
# Just returning the previous value
logpost(p, D)
}
# Rerunning using parallel chains
ans1 <- MCMC(
initial = ans0,
fun = logpost4, # The new version of logpost includes chain
nsteps = 1000,
kernel = kern, # Reusing the kernel
thin = 10, # We are adding thinning
nchains = 2L, # Two chains, two different prints
multicore = FALSE,
seed = 555,
progress = FALSE,
D = lifeexpect
)
In this case, since nchains == 2
, MCMC
will
store a list of length two with the user data. To retrieve the generated
data frame, we can call the function get_userdata()
. We can
also inspect the MCMC_OUTPUT
as follows:
## Last call to MCMC holds the following elements:
## Chain N: 1
## draws : num [1:100, 1:4] 80.5 80 80.3 80.4 80 ...
## logpost : Named num [1:100] -2151 -2150 -2142 -2142 -2122 ...
## Chain N: 2
## draws : num [1:100, 1:4] 80.1 80.2 80.3 79.9 79.9 ...
## logpost : Named num [1:100] -2127 -2125 -2124 -2122 -2125 ...
##
## Including the following userdata (use -get_userdata()- to access it):
## Chain N 1
## p1 p2 p3
## 10 80.49477 -9.387659 4.783305
## 20 79.97738 -9.806571 4.881653
## 30 80.33199 -10.182043 5.669943
## 40 80.39235 -9.788603 5.192905
## 50 79.98730 -9.608717 5.231799
## 60 80.60093 -9.843509 5.278163
## ... 94 more...
## Chain N 2
## p1 p2 p3
## 10 80.13857 -9.772963 5.038519
## 20 80.18688 -10.174983 5.463652
## 30 80.32215 -9.929603 5.138446
## 40 79.93485 -9.581106 5.514230
## 50 79.90929 -9.502281 5.229895
## 60 80.04301 -9.991739 5.558207
## ... 94 more...
## List of 2
## $ :'data.frame': 100 obs. of 3 variables:
## ..$ p1: num [1:100] 80.5 80 80.3 80.4 80 ...
## ..$ p2: num [1:100] -9.39 -9.81 -10.18 -9.79 -9.61 ...
## ..$ p3: num [1:100] 4.78 4.88 5.67 5.19 5.23 ...
## $ :'data.frame': 100 obs. of 3 variables:
## ..$ p1: num [1:100] 80.1 80.2 80.3 79.9 79.9 ...
## ..$ p2: num [1:100] -9.77 -10.17 -9.93 -9.58 -9.5 ...
## ..$ p3: num [1:100] 5.04 5.46 5.14 5.51 5.23 ...