Most real-world ecological studies are characterized by imperfect detectability, i. e. the inability to detect a species or taxon despite its presence in a location. Imperfect detectability is a potential source of bias that must be avoided or at least estimated, particularly since it influences estimates of colonization and extinction. Unfortunately, it is not always possible to avoid or estimate the effects of imperfect detectability. We should be cautious in interpreting estimates derived from the methods that assume perfect detectability. However, when we have a replicated sampling design we can account for detectability while estimating colonization and extinction rates (MacKenzie et al. 2003).
MacKenzie (2003) presents a likelihood function to estimate site occupancy, colonization, and local extinction when a species is detected imperfectly. The method relies on replicate observations per sampling time. The implementation of this likelihood is not trivial because there might be several underlying colonization-extinction trajectories that are compatible with the same observed detection history. For example, a detection history such as \(\lbrace 101 \ 100 \rbrace\) means that in the first sampling time we have three replicates, \(101\), where we detected our hypothetical species twice, and a second sampling time, where we observed \(100\), this is, we detected the species only once. Since we detected it at least once at both time 0 and time 1, there is only one underlying colonization-extinction trajectory compatible with it, which, we take the convention of collapsing it into \(( 1 \ 1 )\). However, imagine we fail to detect the species at time 1, being then our detection history \(\lbrace 101 \ 000 \rbrace\). In this case, there are two underlying trajectories that are both compatible with this observation, since the species could have or could have not gone extinct at time 1. These are \(( 1 \ 1 )\) and \(( 1 \ 0 )\). Therefore, the probability of the observed detection history \(\lbrace 101 \ 000 \rbrace\) should sum over the two ways in which that detection history could have been observed, either through the trajectory \((1 \ 1 )\) or \((1 \ 0)\). For simplicity, let us analyse first what is the probability for the observed detection history \(\lbrace 101 \ 100 \rbrace\). The first sampling time always considers the probability of the species being present at the site, \(P_0\), as the fourth model parameter, and given that, the probability of making two out of three possible detections, \(d^2·(1-d)\). The probability of being also present at the time 1 given that the species was present at time 0 is given by \(T_{11}\), and given that, the probability of making only one out three possible detections is \(d · (1-d)^2\), where \(d\) is the detectability per replicate or probability of detecting a species when is present per observation. Taking all together, this leads us to the following probability for the full detection history: \[Pr(\lbrace 101 \ 100 \rbrace) = P_0 · d^2·(1-d)·T_{11} · d · (1-d)^2\] Now, let us examine the detection history \(\lbrace 101 \ 000 \rbrace\). As mentioned, we have two possibilities for the second sampling time: the species could be present and have not been detected or could have been truly absent. Notice then that the probability of the full detection history should sum over the two underlying colonization-extinction histories, \(\lbrace 1 \ 1 \rbrace\) and \(\lbrace 1 \ 0 \rbrace\). It would be: \[Pr(\lbrace 101 \ 000 \rbrace) = P_0 · d^2·(1-d) · T_{11} · (1-d)^3 + P_0 · d^2·(1-d) · T_{10} \] where \(T_{10}\) is the probability of colonization.
As a final example, consider the detection history \(\lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace\). This detection history can be produced by four underlying colonization-extinction trajectories. These are: \((1 \ 1 \ 1\ 1\ 1)\), \((1 \ 0 \ 1\ 1\ 1)\), \((1 \ 1 \ 1\ 0\ 1)\), \((1 \ 0 \ 1\ 0\ 1)\). The probability of this detection history should sum over these four possible underlying colonization-extinction trajectories because all are compatible with it. Below we detailed the four conditional probabilities:
\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 1 \ 1\ 1\ 1 ) ) = P_0·d·(1-d)^2 · T_{11}·(1-d)^3 · T_{11}·d^2·(1-d) · T_{11}·(1-d)^3 · T_{11}·d^3 \]
\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 0 \ 1\ 1\ 1 ) ) = P_0·d·(1-d)^2 · T_{01} · T_{10}·d^2·(1-d) · T_{11}·(1-d)^3 · T_{11}·d^3 \]
\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 1 \ 1\ 0\ 1 ) ) = P_0·d·(1-d)^2 · T_{11}·(1-d)^3 · T_{11}·d^2·(1-d) · T_{01} · T_{10}·d^3 \]
\[Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace | ( 1 \ 0 \ 1\ 0\ 1 ) ) = P_0·d·(1-d)^2 · T_{01} · T_{10}·d^2·(1-d) · T_{01} · T_{10}·d^3 \]
The algorithm implemented in island would sum over these four conditional probabilities to calculate the total probability for the initial detection history,\(Pr( \lbrace 001 \ 000 \ 101 \ 000 \ 111 \rbrace )\). Please note that, for real-life examples, when a species goes fully undetected for many sampling times, the full total sum becomes unfeasible because the number of compatible trajectories undergoes rapidly a combinatorial explosion. This may happen in practice if detectability per replicate is very low. In this case, only approximated likelihoods can be given. Alternatively, one could get around this problem by redesigning the full survey and taking more replicates per sampling time. As we have discussed in the main vignette of the package, transition probabilities \(T_{00}, \ T_{10}, \ T_{01}, \ T_{11}\) are functions of the rates \(c\) and \(e\) for a given time interval \(dt\) between observations. Therefore, we have all the elements required to estimate the likelihood of any detection history, even if time intervals between observations vary, which allows to find maximum likelihood estimates for the four model parameters, colonization and extinction rates, \(c\) and \(e\), along with the detectability, \(d\), and the probability of initial presence, \(P_0\).
In order to estimate detectability, we need to provide
presence-absence data with replicated samples for the same sampling
time, as in the example below extracted from data set
lakshadweepPLUS
, where column X2000 and X2000.1 correspond
to two replicate transects sampled in the same year. In addition, the
data can have groups that can be treated as levels of a factor, as in
column “Guild”.
Species | Atoll | Guild | X2000 | X2000.1 | X2001 | X2001.1 | X2001.2 | X2001.3 |
---|---|---|---|---|---|---|---|---|
Acanthurus_auranticavus | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 1 | 0.1 |
Acanthurus_leucosternon | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 0 | 0.1 |
Acanthurus_lineatus | AGATHI | Algal_Feeder | 1 | 1 | 1 | 0 | 0 | 0.1 |
Acanthurus_nigrofuscus | AGATHI | Algal_Feeder | 0 | 0 | 0 | 0 | 0 | 0.1 |
Acanthurus_thompsoni | AGATHI | Zooplanktivore | 0 | 0 | 0 | 0 | 0 | 0.1 |
Acanthurus_triostegus | AGATHI | Algal_Feeder | 1 | 1 | 0 | 1 | 1 | 0.1 |
Functions sss_cedp
, mss_cedp
allow the
estimation of colonization and extinction rates with imperfect
detectability with simple and multiple sampling schemes, respectively.
The function sss_cedp
allows estimation for a single
sampling scheme with repeated measures that has to be specified with
arguments Time
, that contains the unique sampling times,
and argument Transects
that specifies the number of of
transects per sampling time. By contrast, mss_cedp
allows
the estimation of rates with perfect or imperfect detectability for
multiple sampling schemes, via the use of flags for missing values
specified by argument MV_FLAG
, for the whole data set or
groups of factors. A full sampling scheme should be specified with
argument Time
, which is internally used to calculate the
particular sampling schemes associated to each separate row with the
help of the missing value flags on the columns that have not been
sampled. In the next example, we use data sets lakshadweep
and lakshadweepPLUS
to demonstrate the use of the previous
functions. These data sets are extensions of data set
alonso15
, and include raw data (information of up to 4
transects per atoll at each sampling time, and additional samples for
2012 and 2013). Transects are considered as replicates.
lakshadweepPLUS
differs in marking missing data with a
flag, combining the data for the three atolls in a single
data.frame
.
### Using sss_cedp
Data1 <- lakshadweep[[1]]
Name_of_Factors <- c("Species","Atoll","Guild")
Factors <- Filter(is.factor, Data1)
No_of_Factors <- length(Factors[1,])
n <- No_of_Factors + 1
D1 <- as.matrix(Data1[1:nrow(Data1),n:ncol(Data1)])
Time <- as.double(D1[1,])
P1 <- as.matrix(D1[2:nrow(D1),1:ncol(D1)])
Time_Vector <- as.numeric(names(table(Time)))
Transects <- as.numeric((table(Time)))
R1 <- sss_cedp(P1, Time_Vector, Transects,
Colonization=0.5, Extinction=0.5, Detectability=0.5,
Phi_Time_0=0.5,
Tol=1.0e-8, Verbose = F)
knitr::kable(unlist(R1))
x | |
---|---|
C | 0.3212542 |
E | 0.2007582 |
D | 0.4980380 |
P | 0.5082734 |
NLL | 2200.7687093 |
### Using mss_cedp
Data <- lakshadweepPLUS[[1]]
Guild_Tag = c("Alg","Cor","Mac","Mic","Omn","Pis","Zoo") # In alphabetical order.
Time <- as.vector(c(2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002,
2002, 2003, 2003, 2003, 2003, 2010, 2010, 2011, 2011, 2011, 2011, 2012,
2012, 2012, 2012, 2013, 2013, 2013, 2013))
R2 <- mss_cedp(Data, Time, Factor=3, Tags=Guild_Tag, PerfectDetectability=FALSE, z=4)
#> Group 0 (Alg): NLL (Col = 0.544756, Ext = 0.167592, Dtc = 0.598818, P_0 = 0.662212) = 1399.03
#> Group 1 (Cor): NLL (Col = 0.366281, Ext = 0.216484, Dtc = 0.49887, P_0 = 0.529949) = 817.113
#> Group 2 (Mac): NLL (Col = 0.314384, Ext = 0.244689, Dtc = 0.4636, P_0 = 0.596814) = 1591.12
#> Group 3 (Mic): NLL (Col = 0.332609, Ext = 0.202035, Dtc = 0.501711, P_0 = 0.588815) = 849.614
#> Group 4 (Omn): NLL (Col = 0.184111, Ext = 0.167086, Dtc = 0.525903, P_0 = 0.426777) = 323.018
#> Group 5 (Pis): NLL (Col = 0.364651, Ext = 0.31265, Dtc = 0.371796, P_0 = 0.450578) = 712.661
#> Group 6 (Zoo): NLL (Col = 0.404582, Ext = 0.184582, Dtc = 0.558295, P_0 = 0.64373) = 619.948
Model selection aims to select the best model for a given phenomenon
with a reasonable number of parameters describing it and avoiding
over-fitting. Our procedure is intended to distinguish, for example,
guilds or islands with different colonization and extinction dynamics.
The function upgma_model_selection
incorporates an UPGMA
algorithm based model selection procedure intended to find an optimal
partition that minimizes AIC values. The algorithm needs a vector of
tags in order to estimate the partition. This function allows the
estimation of colonization and extinction rates with or without
imperfect detectability.
The following example (using lakshadweepPLUS
) shows the
best model describing the dynamics of coral reef fishes in the
Lakshadweep Archipelago, based on their guilds.
Data <- lakshadweepPLUS[[1]]
Guild_Tag = c("Alg", "Cor", "Mac", "Mic", "Omn", "Pis", "Zoo")
Time <- as.vector(c(2000, 2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002,
2002, 2003, 2003, 2003, 2003, 2010, 2010, 2011, 2011, 2011, 2011, 2012,
2012, 2012, 2012, 2013, 2013, 2013, 2013))
R3 <- upgma_model_selection(Data, Time, Factor = 3, Tags = Guild_Tag,
PerfectDetectability = FALSE, z = 4)
#> Number of Columns: 28
#> Group 0 (Alg): NLL (Col = 0.544756, Ext = 0.167592, Dtc = 0.598818, P_0 = 0.662212) = 1399.03
#> Group 1 (Cor): NLL (Col = 0.366281, Ext = 0.216484, Dtc = 0.49887, P_0 = 0.529949) = 817.113
#> Group 2 (Mac): NLL (Col = 0.314384, Ext = 0.244689, Dtc = 0.4636, P_0 = 0.596814) = 1591.12
#> Group 3 (Mic): NLL (Col = 0.332609, Ext = 0.202035, Dtc = 0.501711, P_0 = 0.588815) = 849.614
#> Group 4 (Omn): NLL (Col = 0.184111, Ext = 0.167086, Dtc = 0.525903, P_0 = 0.426777) = 323.018
#> Group 5 (Pis): NLL (Col = 0.364651, Ext = 0.31265, Dtc = 0.371796, P_0 = 0.450578) = 712.661
#> Group 6 (Zoo): NLL (Col = 0.404582, Ext = 0.184582, Dtc = 0.558295, P_0 = 0.64373) = 619.948
#> Partition 0-th: Number of estimated parameters: 2
#> { Alg Cor Mac Mic Omn Pis Zoo }
#> NLL = 6395.98 AIC = 12800 AIC (corrected) = 12800 AIC_d = 127.573 AIC_w = 1.80675e-28
#> Partition 1-th: Number of estimated parameters: 4
#> { Omn Pis Zoo Cor Mic Mac } { Alg }
#> NLL = 6348.51 AIC = 12713 AIC (corrected) = 12713 AIC_d = 40.641 AIC_w = 1.36121e-09
#> Partition 2-th: Number of estimated parameters: 6
#> { Pis Zoo Cor Mic Mac } { Omn } { Alg }
#> NLL = 6345.48 AIC = 12715 AIC (corrected) = 12715 AIC_d = 42.6136 AIC_w = 5.07662e-10
#> Partition 3-th: Number of estimated parameters: 8
#> { Zoo Cor Mic Mac } { Pis } { Omn } { Alg }
#> NLL = 6324.1 AIC = 12680.2 AIC (corrected) = 12680.3 AIC_d = 7.87612 AIC_w = 0.0177308
#> Partition 4-th: Number of estimated parameters: 10
#> { Cor Mic Mac } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6316.15 AIC = 12672.3 AIC (corrected) = 12672.4 AIC_d = 0 AIC_w = 0.909928
#> Partition 5-th: Number of estimated parameters: 12
#> { Mic Mac } { Cor } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6314.83 AIC = 12677.7 AIC (corrected) = 12677.8 AIC_d = 5.39963 AIC_w = 0.0611635
#> Partition 6-th: Number of estimated parameters: 14
#> { Mac } { Mic } { Cor } { Zoo } { Pis } { Omn } { Alg }
#> NLL = 6312.5 AIC = 12681 AIC (corrected) = 12681.2 AIC_d = 8.79881 AIC_w = 0.0111781
#> \begin{table}
#> \centering
#> \begin{tabular}{lccccc}
#> Model& NLL& AIC& AIC corrected& AIC difference& AIC weights\\
#> \hline
#> 2-parameter model& 6395.98& 12800& 12800& 127.573& 1.80675e-28\\
#> 4-parameter model& 6348.51& 12713& 12713& 40.641& 1.36121e-09\\
#> 6-parameter model& 6345.48& 12715& 12715& 42.6136& 5.07662e-10\\
#> 8-parameter model& 6324.1& 12680.2& 12680.3& 7.87612& 0.0177308\\
#> 10-parameter model& 6316.15& 12672.3& 12672.4& 0& 0.909928\\
#> 12-parameter model& 6314.83& 12677.7& 12677.8& 5.39963& 0.0611635\\
#> 14-parameter model& 6312.5& 12681& 12681.2& 8.79881& 0.0111781\\
#> \end{tabular}
#> \caption{Caption goes here}
#> \label{tab:myfirsttable}
#> \end{table}
#> \begin{table}
#> \centering
#> \begin{tabular}{lcc}
#> Species Group& Extinction Rate& Colonization Rate\\
#> \hline
#> { Cor Mic Mac }& 0.22788& 0.334553\\
#> { Zoo }& 0.184582& 0.404582\\
#> { Pis }& 0.31265& 0.364651\\
#> { Omn }& 0.167086& 0.184111\\
#> { Alg }& 0.167592& 0.544756\\
#> \end{tabular}
#> \caption{Caption goes here}
#> \label{tab:myfirsttable}
#> \end{table}
The function upgma_model_selection
also generates two
output files in latex format (.tex) with: a) the parameters of the best
model found under the model selection procedure and b) the summary of
the procedure. Rmarkdown equivalent tables are included below.
Species Group | Extinction Rate | Colonization Rate |
---|---|---|
Cor Mic Mac | 0.22788 | 0.334553 |
Zoo | 0.184582 | 0.404582 |
Pis | 0.31265 | 0.364651 |
Omn | 0.167086 | 0.184111 |
Alg | 0.167592 | 0.544756 |
In table 3 (a), we find that corallivores, microinvertivores and macroinvertivores group together while the other guilds have their own estimates.
Model | NLL | AIC | AIC corrected | AIC difference | AIC weights |
---|---|---|---|---|---|
2-parameter model | 6395.98 | 12800 | 12800 | 127.573 | 1.80675e-28 |
4-parameter model | 6348.51 | 12713 | 12713 | 40.641 | 1.36121e-09 |
6-parameter model | 6345.48 | 12715 | 12715 | 42.6136 | 5.07662e-10 |
8-parameter model | 6324.1 | 12680.2 | 12680.3 | 7.87612 | 0.0177308 |
10-parameter model | 6316.15 | 12672.3 | 12672.4 | 0 | 0.909928 |
12-parameter model | 6314.83 | 12677.7 | 12677.8 | 5.39963 | 0.0611635 |
14-parameter model | 6312.5 | 12681 | 12681.2 | 8.79881 | 0.0111781 |
Table 4 (b) shows the Negative Log-Likelihood, Akaike Information Criterion and associated measures for the models considered in the UPGMA-based model selection procedure.