2 - Data Analysis

Marina Papadopoulou

swaRmverse provides a pipeline to extract metrics of collective motion from the trajectories of grouping individuals. Metrics describe either global (group-level) or pairwise (individual-level) characteristics of the group. After calculating the timeseries of these metrics, the package estimates their averages over each ‘event’ of collective motion. More details about how an event is defined are given below. Let’s start with the velocity estimations.

2.1 Velocity estimations

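We start by converting the raw trajectories into the package's data format with set_data_format (here, each set corresponds to a date and a context, e.g. 2020-02-01_ctx1). We then add headings and speeds to the trajectory data and split the whole dataframe into a list of dataframes, one per set. For this, we need to specify whether the data are geographic coordinates (lon-lat) or not.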

library(swaRmverse)
## Example data: a video-tracking file shipped with the trackdf package
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
raw <- raw[!raw$ignore, ] # drop rows flagged to be ignored

## Add a fake context: first half of the rows "ctx1", second half "ctx2"
raw$context <- ifelse(seq_len(nrow(raw)) <= nrow(raw) / 2, "ctx1", "ctx2")

data_df <- set_data_format(raw_x = raw$x,
                          raw_y = raw$y,
                          raw_t = raw$frame,
                          raw_id = raw$id,
                          origin = "2020-02-1 12:00:21",
                          period = "0.04S",
                          tz = "America/New_York",
                          raw_context = raw$context
                          )

is_geo <- FALSE
data_dfs <- add_velocities(data_df,
                           geo = is_geo,
                           verbose = TRUE,
                           parallelize = FALSE
                           ) ## A list of dataframes
## Adding velocity info to every set of the dataset..
## Done!
#head(data_dfs[[1]])
print(paste("Velocity information added for", length(data_dfs), "sets."))
## [1] "Velocity information added for 2 sets."

If the dataset contains many sets, the function can be parallelized by setting the parallelize argument to TRUE. This is not recommended for small to intermediate datasets, for which the overhead of parallelization is likely to outweigh its benefits.
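For planar (non-geo) data, headings and speeds can be approximated from consecutive positions. The base-R sketch below only illustrates that idea for a single individual: the heading is the direction of the displacement (atan2) and the speed is the step length divided by the sampling interval. The helper name planar_velocities is hypothetical, and add_velocities may use different conventions (for example, which timepoint a value is assigned to).

```r
# Hypothetical helper (not part of swaRmverse): headings and speeds for one
# individual's planar trajectory sampled every `dt` seconds.
planar_velocities <- function(x, y, dt) {
  dx <- diff(x)
  dy <- diff(y)
  heading <- atan2(dy, dx)          # radians, measured from the x-axis
  speed <- sqrt(dx^2 + dy^2) / dt   # distance units per second
  # The first timepoint has no previous position, so it gets NA
  data.frame(x = x, y = y,
             heading = c(NA, heading),
             speed = c(NA, speed))
}

# Usage: an individual moving 3 units right and 4 units up every 0.04 s
traj <- planar_velocities(x = c(0, 3, 6), y = c(0, 4, 8), dt = 0.04)
print(traj)
```

Each step covers 5 distance units in 0.04 s, so both speeds equal 125 units per second.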

2.2 Group characteristics

Based on the list of positional data and calculated velocities, we can now calculate the timeseries of group polarization, average speed, and shape. As a proxy for group shape, we use the angle between the object-oriented bounding box that includes the positions of all group members and the average heading of the group. Angles close to 0 rad represent oblong groups, while angles close to pi/2 rad represent wide groups. The group_metrics_per_set function calculates the timeseries of each measurement across sets. To reduce noise, it also calculates the smoothed timeseries of speed and polarization over a given time window, using a moving average.
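The shape proxy can be illustrated with a minimal base-R sketch. It assumes that the bounding box is the minimum-area rectangle around the group (found by testing the direction of each convex-hull edge) and that the angle is measured between the rectangle's long side and the mean heading, folded into [0, pi/2]. The function name shape_angle is hypothetical, and the package's exact construction may differ.

```r
# Hypothetical helper (not part of swaRmverse): angle between the long side of
# a minimum-area bounding rectangle and the group's mean heading.
shape_angle <- function(x, y, heading) {
  hull <- grDevices::chull(x, y)        # indices of convex-hull vertices
  hx <- x[hull]
  hy <- y[hull]
  n <- length(hull)
  best_area <- Inf
  best_theta <- 0
  for (i in seq_len(n)) {
    j <- if (i == n) 1 else i + 1
    theta <- atan2(hy[j] - hy[i], hx[j] - hx[i])  # direction of this hull edge
    # Rotate the hull by -theta so that this edge is parallel to the u-axis
    u <- cos(theta) * hx + sin(theta) * hy
    v <- -sin(theta) * hx + cos(theta) * hy
    w <- diff(range(u))
    h <- diff(range(v))
    if (w * h < best_area) {
      best_area <- w * h
      # Keep the orientation of the rectangle's long side
      best_theta <- if (w >= h) theta else theta + pi / 2
    }
  }
  # Mean heading as the direction of the average unit vector
  mean_heading <- atan2(mean(sin(heading)), mean(cos(heading)))
  d <- abs(mean_heading - best_theta) %% pi  # the box axis has no direction
  min(d, pi - d)                             # fold into [0, pi/2]
}

# Usage: four individuals spread over a 10 x 1 area
gx <- c(0, 10, 10, 0)
gy <- c(0, 0, 1, 1)
along <- shape_angle(gx, gy, heading = rep(0, 4))        # moving along the long side
across <- shape_angle(gx, gy, heading = rep(pi / 2, 4))  # moving sideways
```

The same spatial arrangement gives an angle near 0 (oblong) when the group moves along its long side and near pi/2 (wide) when it moves sideways.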

sampling_timestep <- 0.04 # seconds between observations (period = "0.04S")
time_window <- 1 # seconds
smoothing_time_window <- time_window / sampling_timestep # window length in timesteps (25)

g_metr <- group_metrics_per_set(data_list = data_dfs,
                               mov_av_time_window = smoothing_time_window,
                               step2time = sampling_timestep,
                               geo = is_geo,
                               parallelize = FALSE
                               )
summary(g_metr)
##      set                  t                               pol         
##  Length:2802        Min.   :2020-02-01 12:00:21.03   Min.   :0.01027  
##  Class :character   1st Qu.:2020-02-01 12:00:49.04   1st Qu.:0.20701  
##  Mode  :character   Median :2020-02-01 12:01:17.05   Median :0.32532  
##                     Mean   :2020-02-01 12:01:17.03   Mean   :0.33785  
##                     3rd Qu.:2020-02-01 12:01:45.02   3rd Qu.:0.44768  
##                     Max.   :2020-02-01 12:02:13.03   Max.   :0.97476  
##                                                      NA's   :2        
##      speed              shape                 N          missing_ind    
##  Min.   :   35.42   Min.   :0.0002811   Min.   :3.000   Min.   :0.0000  
##  1st Qu.:  132.02   1st Qu.:0.4259205   1st Qu.:7.000   1st Qu.:0.0000  
##  Median :  175.98   Median :0.8333063   Median :7.000   Median :1.0000  
##  Mean   :  742.80   Mean   :0.8132966   Mean   :7.291   Mean   :0.5543  
##  3rd Qu.:  243.84   3rd Qu.:1.1968371   3rd Qu.:8.000   3rd Qu.:1.0000  
##  Max.   :12232.96   Max.   :1.5706044   Max.   :9.000   Max.   :5.0000  
##  NA's   :2          NA's   :2                           NA's   :2       
##     speed_av          pol_av      
##  Min.   : 111.5   Min.   :0.1696  
##  1st Qu.: 426.3   1st Qu.:0.2852  
##  Median : 670.1   Median :0.3259  
##  Mean   : 746.7   Mean   :0.3379  
##  3rd Qu.:1005.7   3rd Qu.:0.3812  
##  Max.   :2241.2   Max.   :0.5599  
##  NA's   :50       NA's   :50

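As before, the function can be parallelized if the data come from many different days/sets. Two columns, N and missing_ind, are added to the dataframe: N gives the group size at each time point and missing_ind the number of individuals with missing (NA) data at that time point. The smoothed speed and polarization timeseries are stored in the speed_av and pol_av columns.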
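The polarization and smoothing steps can be sketched in base R. This assumes the common definition of polarization as the length of the mean unit heading vector (1 when all individuals move in the same direction, close to 0 when their headings cancel out) and a centered moving average over smoothing_time_window timesteps. The helper names are hypothetical, and group_metrics_per_set may treat window edges and missing values differently.

```r
# Hypothetical helpers (not part of swaRmverse).

# Polarization: length of the mean unit heading vector (between 0 and 1)
polarization <- function(heading) {
  heading <- heading[!is.na(heading)]
  sqrt(mean(cos(heading))^2 + mean(sin(heading))^2)
}

# Centered moving average over `k` timesteps (symmetric for odd k);
# timesteps at the edges without a full window are left as NA
moving_average <- function(v, k) {
  as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))
}

pol_aligned <- polarization(c(0.1, 0.1, 0.1))  # all headings identical
pol_opposed <- polarization(c(0, pi))          # two opposite headings
smoothed <- moving_average(c(1, 2, 3, 4, 5), k = 3)
print(smoothed)
```

stats::filter is called with its namespace because dplyr, if loaded, masks filter with an unrelated function.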

2.3 Pairwise measurements

From the timeseries of positions and velocities, we can calculate information about the nearest neighbor of each group member. Here we estimate the distance and the bearing angle (the angle between the focal individual’s heading and the direction to its neighbor) to the nearest neighbor of each individual. These, along with the id of the nearest neighbor, are added as columns to the positional timeseries dataframe:

data_df <- pairwise_metrics(data_list = data_dfs,
                            geo = is_geo,
                            verbose = TRUE,
                            parallelize = FALSE,
                            add_coords = FALSE # could be set to TRUE if the relative positions of neighbors are needed 
                            )
## Pairwise analysis started..
#tail(data_df)
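For a single timepoint, the nearest-neighbor distance and bearing angle can be sketched in base R as follows. The bearing is taken here as the direction to the nearest neighbor minus the focal heading, wrapped to [-pi, pi). The helper name and this sign convention are assumptions, and pairwise_metrics may report the angle differently (for example, as an absolute value).

```r
# Hypothetical helper (not part of swaRmverse): nearest neighbor of every
# individual at a single timepoint, for planar coordinates.
nearest_neighbours <- function(id, x, y, heading) {
  d <- as.matrix(stats::dist(cbind(x, y)))  # pairwise Euclidean distances
  diag(d) <- Inf                            # an individual is not its own neighbor
  nn <- apply(d, 1, which.min)              # index of each nearest neighbor
  # Direction from the focal individual to its nearest neighbor
  to_nn <- atan2(y[nn] - y, x[nn] - x)
  # Bearing: that direction relative to the focal heading, wrapped to [-pi, pi)
  bearing <- (to_nn - heading + pi) %% (2 * pi) - pi
  data.frame(id = id,
             nn_id = id[nn],
             nnd = d[cbind(seq_along(nn), nn)],
             bearing = bearing,
             stringsAsFactors = FALSE)
}

# Usage: three individuals on a line; "c" faces up, its neighbor is to its left
snap <- nearest_neighbours(id = c("a", "b", "c"),
                           x = c(0, 1, 5), y = c(0, 0, 0),
                           heading = c(0, pi, pi / 2))
print(snap)
```

Here "a" and "b" face each other (bearing 0), while "b" lies 90 degrees counterclockwise from the heading of "c" (bearing pi/2).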

2.4 Metrics of collective motion

Based on the global and local measurements, we then calculate a series of metrics that aim to capture the dynamics of the collective motion of the group. These metrics are calculated over the parts of the trajectories during which the group performs coordinated collective motion, that is, when the group is moving (its average speed is higher than a given threshold) and is somewhat polarized (its polarization is higher than a given threshold). These parts are defined as ‘events’. If the thresholds are unknown, speed_lim and pol_lim can be left as NA (interactive mode): the quantiles of average speed and polarization across all data are printed and the user is asked for the thresholds at run time. Otherwise, the thresholds should be given as inputs. If both limits are set to 0, each set is taken as a complete event. The time between observations (step2time) is needed as input to identify which timepoints form a continuous event. Once the group and pairwise timeseries are calculated, the metrics can be computed per event:
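The event definition can be illustrated with a short sketch: timepoints are kept when both smoothed speed and smoothed polarization exceed their limits, and a new event starts whenever consecutive kept timepoints are more than one sampling step apart. This is a simplification under stated assumptions: the helper name is hypothetical, it uses strict inequalities, and it ignores the per-set handling and metric calculations that col_motion_metrics performs.

```r
# Hypothetical helper (not part of swaRmverse): label events within one set.
# `tt` is time in seconds; `speed_av` and `pol_av` are smoothed timeseries.
label_events <- function(tt, speed_av, pol_av, speed_lim, pol_lim, step2time) {
  keep <- !is.na(speed_av) & !is.na(pol_av) &
    speed_av > speed_lim & pol_av > pol_lim
  event <- rep(NA_integer_, length(tt))
  tk <- tt[keep]
  if (length(tk) == 0) return(event)
  # A gap larger than one sampling step starts a new event; the 1.5 factor
  # only guards against floating-point noise in the time differences
  new_event <- c(TRUE, diff(tk) > step2time * 1.5)
  event[keep] <- cumsum(new_event)
  event
}

# Usage: 8 timepoints at 0.04 s, with speed dropping below 150 twice
tt <- seq(0, by = 0.04, length.out = 8)
ev <- label_events(tt,
                   speed_av = c(200, 200, 100, 200, 200, 200, 100, 200),
                   pol_av = rep(0.5, 8),
                   speed_lim = 150, pol_lim = 0.3, step2time = 0.04)
print(ev)
```

The two slow timepoints split the series into three events, and they themselves belong to no event (NA).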

### Interactive mode, if the limits of speed and polarization are unknown
# new_species_metrics <- col_motion_metrics(data_df,
#                                            global_metrics = g_metr,
#                                            step2time = sampling_timestep,
#                                            verbose = TRUE,
#                                            speed_lim = NA,
#                                            pol_lim = NA
#                                             
# )

new_species_metrics <- col_motion_metrics(data_df,
                                           global_metrics = g_metr,
                                           step2time = sampling_timestep,
                                           verbose = TRUE,
                                           speed_lim = 150,
                                           pol_lim = 0.3
)

# summary(new_species_metrics)

The number of events and their total duration given the input thresholds are also printed. If we are not interested in inspecting the timeseries of the measurements, one can calculate the metrics directly from the formatted dataset:

new_species_metrics <- col_motion_metrics_from_raw(data_df,
                                mov_av_time_window = smoothing_time_window,
                                step2time = sampling_timestep,
                                geo = is_geo,
                                verbose = TRUE,
                                speed_lim = 150,
                                pol_lim = 0.3,
                                parallelize_all = FALSE
                                )
## Adding velocity info to every set of the dataset..
## Done!
# summary(new_species_metrics)

Since we are interested in comparing different datasets across species or contexts, a new species id column should be added:

new_species_metrics$species <- "new_species_1"

head(new_species_metrics)
##   event N             set          start_time mean_mean_nnd mean_sd_nnd
## 1     1 8 2020-02-01_ctx1 2020-02-01 12:00:21      261.3625   202.00913
## 2     2 8 2020-02-01_ctx1 2020-02-01 12:00:23      188.0053   123.32601
## 3     3 7 2020-02-01_ctx1 2020-02-01 12:00:25      184.0563    72.31109
## 4     4 8 2020-02-01_ctx1 2020-02-01 12:00:26      199.2923    73.48385
## 5     5 8 2020-02-01_ctx1 2020-02-01 12:00:27      156.2709   132.57649
## 6     6 7 2020-02-01_ctx1 2020-02-01 12:00:28      158.0017    96.51129
##   sd_mean_nnd  mean_pol    sd_pol  cv_speed mean_sd_front mean_mean_bangl
## 1    3.542036 0.3400859 0.1434671 1.8786978     0.2875128        1.637330
## 2   24.690932 0.3337442 0.1693607 1.6556076     0.2743454        1.238517
## 3   27.403410 0.3341782 0.1962730 1.8533976     0.3199358        1.778552
## 4   11.876152 0.4152229 0.0000000 0.0000000     0.3290209        1.754450
## 5    4.002438 0.2857535 0.1386557 0.8965682     0.2441024        1.513099
## 6   26.039637 0.4056536 0.1758167 1.9728170     0.2924780        1.693242
##   mean_shape  sd_shape event_dur       species
## 1  0.7355342 0.4642304      1.28 new_species_1
## 2  0.7145053 0.3854631      1.12 new_species_1
## 3  0.9832798 0.3983144      1.08 new_species_1
## 4  1.1002695 0.0000000      0.04 new_species_1
## 5  1.0568951 0.3513711      0.32 new_species_1
## 6  0.9961263 0.3764097      7.80 new_species_1
## Uncomment below to save the output in order to combine it with other datasets (replace 'path2file' with an appropriate local path and file name).
# write.csv(new_species_metrics, file = "path2file.csv", row.names = FALSE)
# save(new_species_metrics, file = "path2file.rda") # or as an R object

The duration, starting time and group size (N) of each event are also added to the result dataframe. We suggest filtering out events of very short duration (such as event 4 above, which lasts a single 0.04 s timestep) and events with fewer than 3 individuals (singletons and pairs). The calculated metrics are: