This vignette will walk you through the procedure of estimating
effective migration surfaces from single-nucleotide polymorphism data
(SNPs). It is designed to fit within the spatial population genetics
ecosystem and to be compatible in its input format with other
R packages such as conStruct.
The best introduction to the EEMS method itself is the original
article by Petkova et al. doi:10.1038/ng.3464, a preprint of which is also
available on bioRxiv
(https://www.biorxiv.org/content/10.1101/011809v2.full).
Both the preprint version of the original article and the documentation
of the original command-line utility are also available on their
GitHub repository and contain many high-resolution examples of what
the EEMS outplot plots look like.
At the beginning of this process, we assume you have an allele
frequency matrix freqs and a coordinate
matrix coords in your workspace. This package includes
example objects ex.freqs and ex.coords.
The simplest way to run an EEMS analysis is by using the
eems() function of this package. It can be invoked with
only minimal arguments:
This runs an EEMS analysis for a set of diploid individuals on a
single MCMC chain for 2000000 iterations, and with a default area
outline derived from the coordinates themselves by adding a buffer
around them. It also assumes a default number of demes 6 times the
number of individuals. The results will be stored in an output directory
full/path/to/file/prefix-chain1. The return value of the
function is the output directory, so it is easy to retrieve.
Those defaults are designed to ensure an adequate resolution of the analysis, but they may not be suitable for your exact application context. In that case, you can set those parameters manually by specifying additional arguments. The most useful ones are illustrated below, where the MCMC chain has been shortened by an order of magnitude for a more exploratory analysis on haploid samples. It also shows that the seed can be set manually to ensure reproducibility.
eems(
seed = 1024,
freqs = freqs,
coords = coords,
mcmcpath = "/full/path/to/directory/prefix",
diploid = FALSE,
numMCMCIter = 100000,
numBurnIter = 50000,
numThinIter = 999
) Usually, it is advisable to run more than one MCMC chain and combine
the results. eems() allows you to do so either sequentially
or in parallel. This example will run 5 chains in parallel with default
settings, although it will not use more workers than there are cores
available. Note that when run in parallel, the chains will run silently
and console output is redirected to a log file which you can find among
the other output files in the output directory.
eems(
freqs = freqs,
coords = coords,
mcmcpath = "full/path/to/directory/prefix",
nChains = 5,
parallel = TRUE
) The output will then be in 5 separate directories, called
/full/path/to/directory/prefix-chain1 to
full/path/to/directory/prefix-chain5.
Sometimes, one wants to use a more complex habitat which cannot be
described as everything within a single outline. In those cases, we can
use the function grid.make() to create a grid that has both
boundary points and holes. This grid can then be passed to
eems() using the optional parameters demes and
edges. The following code loads an example outline and
places two holes inside it before running an EEMS analysis with the
input.
data_path <- system.file("extdata", package = "reems")
input <- file.path(data_path, "barrier-schemeX-nIndiv300-nSites3000")
outer <- read.table(paste0(input, ".outer"))
hole1 <- data.frame(V1 = c(2., 5., 5., 2., 2.), V2 = c(2., 2., 5., 5., 2.))
hole2 <- data.frame(V1 = c(6.5, 10., 8., 6.5), V2 = c(2.5, 5., 5., 2.5))
# Create the new grid with holes
grid <- grid.make(outer = outer, holes = list(hole1, hole2), nDemes = 300)
eems(
freqs = freqs,
coords = coords,
mcmcpath = "full/path/to/directory/prefix",
demes = grid$demes,
edges = grid$edges,
nChains = 5,
parallel = TRUE
) Various additional parameters can be set with additional arguments to
eems(), documentation of which can be found in the function
documentation for eems(). This includes the various
hyperparameters available for tuning.
Regarding those, in the documentation of the EEMS
command-line utility, available at https://github.com/dipetkov/eems/blob/master/Documentation/EEMS-doc.pdf,
Petkova writes the following:
Choosing values for the proposal variances is something of a dark art– the goal is to choose the parameters so that proposals are accepted about 20% 30% of the time. In practice, it seems to be sufficient to choose the variances so that proposals are not accepted too rarely (less than 10% of the time) or too often (more than 40% of the time).
Suppose that we run EEMS (say, with the default values for all proposal distributions) and at the end EEMS reports the following information about the frequency of accepting proposals of different types.
1 (2050/2593) = 79.06% for type "qTileRate"
## change the rate of one tile in the diversity tessellation
2 (1414/2478) = 57.06% for type "qTileMove"
## move the seed of one tile in the diversity tessellation
3 (1047/2494) = 41.98% for type "qBirthDeath"
## add/remove one tile in the diversity tessellation
4 (20625/47530) = 43.39% for type "mTileRate"
## change the rate of one tile in the migration tessellation
5 (15486/50098) = 30.91% for type "mMeanRate"
## change the overall log10 migration rate
6 (22352/47532) = 47.03% for type "mTileMove"
## move the seed of one tile in the migration tessellation
7 (26305/47275) = 55.64% for type "mBirthDeath"
## add/remove one tile in the migration tessellation
It seems that qTileRate and qTileMove are
accepted too often. So it might be a good idea to increase
qEffctProposalS2 and qSeedsProposalS2. [If the
proposal variance is higher, then bigger changes/moves will be proposed
and hence the updates will be accepted less often.]
Finally, the parameter qVoronoiPr specifies how often to
propose updating the diversity Voronoi tessellation. Its default value
is 0.05, which means that about 5% of the time
the proposals are about the diversity tessellation and about
95% of the time the proposals are about the migration
tessellation.
The output of the EEMS analysis performed with eems()
can be plotted using eems.plots(), which produces several
figures:
mrates01 is the effective migration surface. This
contour plot visualizes the estimated effective migration rates \(m\), on the log10 scale after mean
centering.
mrates02 shows posterior probabilities \(P(m > 0 | \texttt{diffs})\) and \(P(m < 0 | \texttt{diffs})\) for each
location in the habitat. Since migration rates are visualized on the
log10 scale after mean centering, 0 corresponds to the overall mean
migration rate. This contour plot emphasizes regions with effective
migration that is significantly higher/lower than the overall
average.
qrates01 plots the effective diversity surface. This
contour plot visualizes the estimated effective diversity rates \(q\), on the log10 scale after mean
centering.
qrates02 shows the posterior probabilities \(P(q > 0 | \texttt{diffs})\) and \(P(q < 0 | \texttt{diffs})\). Similar to
mrates02 but applied to the effective diversity
rates.
rdist01 is a scatter plot of the observed vs the
fitted between-deme component of genetic dissimilarity, where one point
represents a pair of sampled demes.
rdist02 is a scatter plot of the observed vs the
fitted between-deme component of genetic dissimilarity, where one point
represents a sampled deme.
rdist03 is a scatter plot of observed genetic
dissimilarities between demes vs observed geographic distances between
demes.
pilogl01 is the posterior probability trace
plot.
The mrates and qrates figures visualize
(properties of) the effective migration and diversity rates across the
habitat. The other figures can help to check that the MCMC sampler has
converged (the trace plot pilogl) and that the EEMS model
fits the data well (the scatter plots of genetic dissimilarities
rdist).
The function eems.plots() will work given the results
from a single EEMS run (one directory in mcmcpath) but it
is better to run EEMS several times, randomly initializing the MCMC
chain each time.
Hence, eems.plots() takes as input a vector of EEMS
output directories. If the chains have been obtained from a single call
of eems() with nChains greater than 1, or if
only a single run is to be plotted, one can use the return value of
eems(), which is precisely this vector of output
directories. The other required arguments for eems.plots()
are plotpath, which is the full path to a file prefix that
will then be completed by the names of the plots given above, and
longlat, a logical value that is set TRUE if
the X coordinate is the longitude and the Y coordinate the latitude, and
FALSE otherwise.
eems_results <- eems(
freqs = freqs,
coords = coords,
mcmcpath = "/full/path/to/directory/prefix",
nChains = 5,
parallel = TRUE
)
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/prefix",
longlat = TRUE
)Various optional parameters add additional plotting functionality, which we will illustrate with a series of examples.
We can add the original sampling locations on top of the contour plot.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/with-sampling",
longlat = TRUE,
m.plot.xy = {
points(coord, col = "purple")
},
q.plot.xy = {
points(coord, col = "purple")
}
)We can flip the x and y axis, i.e., assume that the x coordinate is the latitude and the y coordinate is the longitude.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/axes-flipped",
longlat = FALSE
)We can also control the output size and format. The code below
generates all figures as PDF, with the mrates and
qrates plots of height 12 inches and width 10 inches.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/output-PDFs",
longlat = TRUE,
plot.height = 12,
plot.width = 10
)The following code will generate all figures as PNG, with the
mrates and qrates plots of height 9 inches,
width 8 inches and resolution 600 dots per inch.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/output-PNGs",
longlat = TRUE,
plot.height = 9,
plot.width = 8,
res = 600,
out.png = TRUE
)We can set various colors and shapes for the outline, the grid and the demes.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/demes-and-edges",
longlat = TRUE,
add.grid = TRUE,
col.grid = "gray90",
lwd.grid = 2,
add.outline = TRUE,
col.outline = "blue",
lwd.outline = 5,
add.demes = TRUE,
col.demes = "red",
pch.demes = 5,
min.cex.demes = 0.5,
max.cex.demes = 1.5
)If we have the RColorBrewer package, we can also use a
divergent Red to Blue color scheme instead of the default DarkOrange to
Blue color scheme.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/new-eems-colors",
longlat = TRUE,
eems.colors = RColorBrewer::Rbrewer.pal(11, "RdBu")
)In a similar vein, we can specify the color scales for the migration and diversity rates.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/fix-colscales",
longlat = TRUE,
add.outline = TRUE,
col.outline = "gray",
m.colscale = c(-3, 3),
q.colscale = c(-0.3, +0.3)
)Often, especially when dealing with larger spatial scales, it is helpful to produce contour plots that follow a certain cartographic projection, in this case the Mercator projection.
projection_none <- "+proj=longlat +datum=WGS84"
projection_mercator <- "+proj=merc +datum=WGS84"
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/cartographic-projections",
longlat = TRUE,
projection.in = projection_none,
projection.out = projection_mercator
)Using the additional packages rworldmap and
rworldxtra, we can even add a high-resolution geographic
map as a backdrop to our plots. Setting the corresponding option
add.map = TRUE requires both those packages, and also
requires that projection.in is specificed.
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/geographic-map",
longlat = TRUE,
projection.in = projection_none,
projection.out = projection_mercator,
add.map = TRUE,
col.map = "black",
lwd.map = 5
)We can explicitly add a specific map by passing the shape file.
map_world <- rworldmap::getMap()
map_africa <- map_world[which(map_world@data$continent == "Africa"), ]
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/shapefile",
longlat = TRUE,
m.plot.xy = {
plot(map_africa, col = NA, add = TRUE)
},
q.plot.xy = {
plot(map_africa, col = NA, add = TRUE)
}
)Combining both, we can apply the Mercator projection and add the map of Africa. We must be careful to apply the same projection to the map as well.
map_africa <- sp::spTransform(map_africa, sp::CRS(projection_mercator))
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/shapefile-projected",
longlat = TRUE,
projection.in = projection_none,
projection.out = projection_mercator,
m.plot.xy = {
plot(map_africa, col = NA, add = TRUE)
},
q.plot.xy = {
plot(map_africa, col = NA, add = TRUE)
}
)Similarly, we can add points, lines, labels, etc. Here is how to add a set of colored “labels” on top of the migration/diversity rates and the Africa map. Note that we also need to project any points we want to plot.
coords <- matrix(c(
-10, 10,
10, 10,
30, 0,
40, -10,
30, -20
), ncol = 2, byrow = TRUE)
colors <- c("red", "green", "blue", "purple", "orange")
labels <- LETTERS[1:5]
coords_merc <- sp::spTransform(
sp::SpatialPoints(coords, sp::CRS(projection_none)),
sp::CRS(projection_mercator)
)
coords_merc <- coords_merc@coords
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/labels-projected",
longlat = TRUE,
projection.in = projection_none,
projection.out = projection_mercator,
m.plot.xy = {
text(coords_merc, col = colors, pch = labels, font = 2)
},
q.plot.xy = {
text(coords_merc, col = colors, pch = labels, font = 2)
}
)Lastly, we can also use eems.plots to compute the
migration and diversity rates at specific points. They will then be
saved in an RData file together with the figures.
xy.coords <- matrix(c(
13.70, 3.20,
37.10, -7.20,
36.10, -4.10,
34.58, -5.67
), ncol = 2, byrow = TRUE)
eems.plots(
mcmcpath = eems_results,
plotpath = "/full/path/to/file/default",
longlat = TRUE,
xy.coords = xy.coords
)
load(paste0("/full/path/to/file/default", "-rdist.RData"))The RData file contains xym.values and
xyq.values. xym.values store the migration
rate for each coordinate, on the log10 scale and after mean-centering.
Similarly, xyq.values stores the diversity rate for each
coordinate, on the log10 scale and after mean-centering.
An overview of all the parameters to eems.plots() can be
found in its function documentation.
Besides the two main user-facing functions eems() and
eems.plots(), which should suffice for the more common use
cases, this package also provides some additional functions, which we
will just describe very briefly here. More information about them can be
found in their respective function documentation.
eems.from.files() runs an EEMS analysis from input
data available as tabular files on disk, and corresponds precisely to
the original EEMS utility available at https://github.com/dipetkov/eems.
eems.population.grid() plots the population grid,
with all edges in the same color, to visualize the grid before
estimating migration and diversity rates. This is sometimes useful for
troubleshooting issues with grid connectivity.
eems.voronoi.samples() and
eems.posterior.draws() both take random samples from the
posterior distribution of the migration and diversity rates and plot
them to visualize the posterior variance (in a slightly different
way).
grid.make() constructs a regular triangular grid
with circa nDemes vertices within a habitat outline. It can also be used
to make a more complex habitat with holes.
outer.buffered() computes the boundary polygon of a
habitat encompassing a set of points, while keeping a specific buffer
distance from the supplied points. This is used internally by
eems() to create a default habitat, but can also be used
directly to make the outer boundary of a more complex habitat.