Baorista is an R package that provides robust alternatives to aoristic analyses based on parametric and non-parametric Bayesian inference. The package consists of a series of utility and wrapper functions as well as custom statistical distributions for the NIMBLE probabilistic programming language. This documents provides a short guide for the key steps required to setup datasets and execute the core functionalities of baorista.
Stable version of Baorista is available directly on CRAN
while development version can be installed via devtools
with the following command:
library(devtools)
install_github('ercrema/baorista')
library(baorista)
## Loading required package: nimble
## nimble version 1.2.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit https://R-nimble.org.
##
## Note for advanced users who have written their own MCMC samplers:
## As of version 0.13.0, NIMBLE's protocol for handling posterior
## predictive nodes has changed in a way that could affect user-defined
## samplers in some situations. Please see Section 15.5.1 of the User Manual.
##
## Attaching package: 'nimble'
## The following object is masked from 'package:stats':
##
## simulate
Please note that in order to execute all commands you need a working C++ compiler. For more further details please read the nimble package documentation
baorista can read two types of data: a)
data.frame
class objects containing two fields recording
the start and end date of the time-spans of existence recorded in BP,
and b) matrices containing the probability of existence for each event
(row) at each time-block (column). Sample datasets of the two formats
are provided:
data(sampledf) #two column data.frame format
data(samplemat) #events x time-block matrix format
The function createProbMat()
creates an object of class
probMat
which standardise the data format and includes
additional information such as chronological range and resolution:
<- createProbMat(x=sampledf,timeRange=c(6500,4001),resolution=50) #50 year timeblock ranging from 6500 to 4001 (i.e. 6500-6451,6450-6401,...)
x.df <- createProbMat(pmat=samplemat,timeRange=c(5000,3001),resolution=100) x.mat
The plot()
function displays probMat
class
object using aoristic sums:
par(mfrow=c(1,2))
plot(x.df,main='x.df')
plot(x.mat,main='x.mat')
baorista contains functions for running two types of
Bayesian analyses: parametric and non-parametric. Parametric analyses
requires the selection of a suitable growth model (each one is currently
a separate function) and provides estimates of key model parameters.
Non-parametric models employs an intrinsic conditional auto-regressive
(ICAR) Gaussian model which enables the calculation of the probability
mass function (i.e. it estimates the probability of any event
occurring at each timeblock, effectively recovering the shape of the
time-frequency distribution).
In all cases baorista provides a wrapper function
which internally initialises and runs an MCMC via the NIMBLE package.
These function allow for user-defined MCMC settings (e.g. number of
iterations, number chains, etc.), samplers, and priors and automatically
assesses convergence statistics.
The most straightforward model is a truncated discrete exponential
distribution. Users can fit the data to such distribution and estimate
growth rates using the function expfit()
.
<- expfit(x.mat) #fitting using default MCMC/Prior settings exponential.fit
## Compiling nimble model...
## ===== Monitors =====
## thin = 1: r
## ===== Samplers =====
## RW sampler (1)
## - r
## thin = 1: r
## running chain 1...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## running chain 2...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## running chain 3...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## running chain 4...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
# expfit(x.mat,rPrior='dnorm(mean=0,sd=1)') #Using a Gaussian prior with mean 0 and sd 1 for the growth rate r
# expfit(x.mat,niter=50000) #Fitting the model with 50k iterations
# expfit(x.mat,nchains=4,parallel=T) #Running 4 chains in parallel
Summary statistics on the posterior parameters and the MCMC
diagnostics can be retrieved using the summary()
function:
summary(exponential.fit)
## Parameter Posterior Median 95% HPDI Rhat
## r r 0.002074402 0.00176572362700897~0.00238598571702006 1.000405
## ESS
## r 20142.04
Alternatively values can be extracted directly:
#Compute 90% HPDI with coda package
library(coda)
mcmc(exponential.fit$posterior.r) |> HPDinterval(prob=0.90)
## lower upper
## r 0.00181297 0.002335741
## attr(,"Probability")
## [1] 0.9
#Plot histogram of posterior
hist(exponential.fit$posterior.r[,1],xlab='r',main='Posterior of Growth Rate')
A dedicated plot function can also be used to visualise the fitted exponential model:
plot(exponential.fit)
baorista offers also a non-parametric model based Random
Walk ICAR. The function icarfit()
estimates the probability
mass for each time-block accounting for the information from adjacent
blocks (and hence temporal autocorrelation). Given the larger number of
parameters, the MCMC requires longer chains and the execution over
multiple cores is recommended. Convergence issues are also more likely
to arise, in which case adjust running the model with longer chains,
changing the sampler, or adjusting the priors is recommended.
# The following reaches convergence but takes a couple of hours:
# non.param.fit <- icarfit(x.df,niter=2000000,nburnin=1000000,thin=100,nchains=4,parallel=TRUE)
# Shorter number of iterations (does not reach convergence)
<- icarfit(x.df,niter=100000,nburnin=50000,thin=50,nchains=4,parallel=TRUE) non.param.fit
## Running in parallel - progress bar will no be visualised
## Warning in icarfit(x.df, niter = 1e+05, nburnin = 50000, thin = 50, nchains =
## 4, : Highest Rhat value above 1.01 (1.012). Consider rerunning the model with a
## higher number of iterations
A dedicated function can display the HPDI of the probability mass of each time-block:
plot(non.param.fit)