---
title: "Markov Stability Analysis"
author:
  - name: Mohammed Saqr
    url: https://saqr.me
    affiliation: University of Eastern Finland
  - name: Sonsoles López-Pernas
    url: https://sonsoles.me
    affiliation: University of Eastern Finland
  - name: Kamila Misiejuk
    url: https://kamilamisiejuk.com
    affiliation: FernUniversität in Hagen, Germany
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Markov Stability Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  out.width = "100%",
  fig.width = 7,
  fig.height = 5,
  dpi       = 96,
  warning   = FALSE,
  message   = FALSE
)
```

```{r}
library(Nestimate)
```

Nestimate provides two functions for Markov-chain stability analysis of
transition networks:

- **`passage_time()`** -- computes the full matrix of mean first passage times
  (MFPT). Entry M[i, j] is the expected number of steps to travel from state
  *i* to state *j* for the first time. The diagonal equals the mean recurrence
  time 1/pi.
- **`markov_stability()`** -- computes per-state stability metrics: persistence,
  stationary probability, return time, sojourn time, and mean accessibility.

Both functions accept a `netobject`, `cograph_network`, `tna` object,
row-stochastic matrix, or a raw wide sequence data frame.

---

## Data

`trajectories` contains 138 learners recorded at 15 time-steps each with three
engagement states: Active, Average, and Disengaged.

```{r}
dim(trajectories)
table(as.vector(trajectories), useNA = "always")
```

---

## Selecting mostly-active learners

We keep learners who were Active **more than half the time** (> 7 of 15 steps).

```{r}
sub <- trajectories[rowSums(trajectories == "Active", na.rm = TRUE) > 7, ]
nrow(sub)
```

---

## Sequence plots

```{r fig.height=7}
sequence_plot(trajectories, type = "heatmap")
```

```{r fig.height=5}
sequence_plot(sub, type = "heatmap")
```

---

## Transition networks

```{r}
net_all <- build_network(trajectories, method = "relative")
net_sub <- build_network(sub, method = "relative")

round(net_all$weights, 3)
round(net_sub$weights, 3)
```

In the mostly-active group nearly every state transitions predominantly back
to Active, and the probability of entering Disengaged is almost zero.

---

## Mean First Passage Times

```{r}
pt_all <- passage_time(net_all)
pt_sub <- passage_time(net_sub)
```

```{r}
print(pt_all, digits = 2)
```

```{r}
print(pt_sub, digits = 2)
```

**Active to Disengaged** rises from 10.4 to ~40 steps -- mostly-active learners
are nearly four times harder to disengage. The diagonal equals the mean
recurrence time (how many steps between consecutive visits to the same state).

```{r fig.height=4}
plot(pt_all, title = "Full sample (n = 138)")
```

```{r fig.height=4}
plot(pt_sub, title = "Mostly-active learners (n = 42)")
```

In the mostly-active heatmap the Active column is uniformly dark (quickly
reachable from any state) and the Disengaged column is uniformly pale
(nearly unreachable).

---

## Stationary distribution

```{r echo=FALSE}
data.frame(
  State           = names(pt_all$stationary),
  `Full sample`   = paste0(round(pt_all$stationary * 100, 1), "%"),
  `Mostly active` = paste0(round(pt_sub$stationary * 100, 1), "%"),
  check.names     = FALSE
)
```

In the mostly-active group ~79% of long-run time is spent Active (vs 37% in
the full sample).

---

## Markov Stability

```{r}
ms_all <- markov_stability(net_all)
ms_sub <- markov_stability(net_sub)

print(ms_all)
print(ms_sub)
```

The `stability` data frame contains:

| Column | Meaning |
|--------|---------|
| `persistence` | P[i,i] -- probability of staying at the next step |
| `stationary_prob` | Long-run proportion of time in this state |
| `return_time` | Expected steps between consecutive visits (1/pi) |
| `sojourn_time` | Expected consecutive steps before leaving (1/(1-P[i,i])) |
| `avg_time_to_others` | Mean MFPT leaving this state to all others |
| `avg_time_from_others` | Mean MFPT from all other states arriving here |

```{r fig.height=6}
plot(ms_all, title = "Full sample")
```

```{r fig.height=6}
plot(ms_sub, title = "Mostly-active learners")
```

Active leads on persistence (0.81) and sojourn time (5.3 steps) -- once a
learner is active they stay active. Average has the lowest
`avg_time_from_others` (1.6 steps) -- the most accessible hub state.
Disengaged has `avg_time_from_others` of ~39 steps -- effectively unreachable.

---