Introduction to the capl R package

Joel D. Barnes, M.Sc. and Michelle D. Guerrero, Ph.D.

March 25, 2022

Introduction

The Canadian Assessment of Physical Literacy (CAPL) is the first comprehensive protocol that can accurately and reliably assess a broad spectrum of skills and abilities that contribute to and characterize the physical literacy level of a participating child.

Physical literacy moves beyond just fitness, motor skill or motivation in isolation. The CAPL is unique in that it can assess the multiple aspects of physical literacy: physical competence, daily behaviour, motivation and confidence, and knowledge and understanding.

The domains of physical literacy are summarized in figure 1 of the CAPL-2 manual on page 6:

[Figure 1: Domains of physical literacy]

The Healthy Active Living and Obesity Research Group (HALO) has been responsible for the systematic development of the CAPL since 2008. HALO’s test development efforts have been informed by the assessment of more than 10,000 children and with input from well over 100 researchers and practitioners within related fields of study.

The capl package contains tools enabling users to compute and visualize CAPL-2 (Canadian Assessment of Physical Literacy, Second Edition) scores and interpretations from raw data, all within the R environment without having to use the CAPL-2 website.

Installation

GitHub

Users can download and install the most recent version of the capl package directly from GitHub (www.github.com/barnzilla/capl) using the devtools R package.

devtools::install_github("barnzilla/capl", upgrade = "never", build_vignettes = TRUE, force = TRUE)
library(capl)

Once the capl package is loaded, any available tutorials for the package, such as this vignette, can be accessed by calling the browseVignettes() function.

browseVignettes("capl")

Getting started

Importing raw data

Users must first import their raw data before using the capl package to compute CAPL-2 scores and interpretations. The import_capl_data() function enables users to import data from an Excel workbook into the R global environment.

data <- import_capl_data(
  file_path = "c:/path/to/raw-data.xlsx",
  sheet_name = "Sheet1"
)

Required variables

The capl package requires 60 variables in order to compute CAPL-2 scores and interpretations. Users can use the get_missing_capl_variables() function to retrieve a list of the required variables. The required variables are outlined in the Details section of the documentation.

?get_missing_capl_variables

The capl package looks for these 60 variables by name. The names appear in the str() and colnames() output for the demo dataset in the next section.

Loading the pre-installed dataset

The capl package comes with a demo (fake) dataset of raw data, capl_demo_data, which contains 500 rows of participant data on the 60 variables that are required by the capl package. Users can load the demo dataset and start exploring.

data("capl_demo_data")

The base R str() function allows users to get a sense of how the CAPL-2 raw data should be structured and named for downstream use in the capl package.

str(capl_demo_data)
#> 'data.frame':    500 obs. of  60 variables:
#>  $ age               : int  8 9 9 8 12 10 12 10 12 9 ...
#>  $ gender            : chr  "Male" "Female" "Male" "f" ...
#>  $ pacer_lap_distance: num  15 20 20 15 20 15 15 15 15 NA ...
#>  $ pacer_laps        : int  23 31 169 50 63 15 32 143 43 182 ...
#>  $ plank_time        : int  274 282 9 228 252 110 21 185 6 41 ...
#>  $ camsa_skill_score1: int  14 5 6 13 2 9 4 11 5 11 ...
#>  $ camsa_time1       : int  34 27 13 35 21 NA NA 16 20 14 ...
#>  $ camsa_skill_score2: int  14 5 13 11 14 14 0 4 0 4 ...
#>  $ camsa_time2       : int  35 23 14 35 23 23 33 30 29 18 ...
#>  $ steps1            : int  30627 27788 8457 8769 14169 9610 29459 17112 30008 18270 ...
#>  $ time_on1          : chr  "5:13am" "6:13" "6:07" "6:13" ...
#>  $ time_off1         : chr  "22:00" NA "21:00" "22:00" ...
#>  $ non_wear_time1    : int  25 31 33 25 83 67 20 10 49 64 ...
#>  $ steps2            : int  14905 24750 30111 21077 15786 23828 24735 2621 20690 19652 ...
#>  $ time_on2          : chr  "06:00" "5:13am" "6:13" "6:13" ...
#>  $ time_off2         : chr  "21:00" "23:00" "11:13pm" "23:00" ...
#>  $ non_wear_time2    : int  20 82 4 55 1 53 65 47 82 79 ...
#>  $ steps3            : int  21972 15827 14130 13132 18022 12817 14065 26352 27090 10226 ...
#>  $ time_on3          : chr  "07:00" "05:00" "07:48am" NA ...
#>  $ time_off3         : chr  "11:57pm" NA "08:30pm" NA ...
#>  $ non_wear_time3    : int  6 79 23 65 34 15 72 76 60 40 ...
#>  $ steps4            : int  28084 27369 14315 9963 6993 10092 10774 3208 2878 9055 ...
#>  $ time_on4          : chr  "05:00" "6:13" "6:07" NA ...
#>  $ time_off4         : chr  "08:30pm" "10:57 pm" "22:00" "11:13pm" ...
#>  $ non_wear_time4    : int  32 38 74 20 75 22 84 59 42 22 ...
#>  $ steps5            : int  14858 21112 16880 11707 20917 30200 20220 17995 18712 25336 ...
#>  $ time_on5          : chr  "6:07" "6:13" "06:00" "05:00" ...
#>  $ time_off5         : chr  "11:57pm" "23:00" "8:17pm" "8:17pm" ...
#>  $ non_wear_time5    : int  61 64 73 23 82 42 66 38 55 18 ...
#>  $ steps6            : int  17705 5564 16459 12235 27766 26099 15763 7202 2746 3895 ...
#>  $ time_on6          : chr  "06:00" "06:00" NA "6:07" ...
#>  $ time_off6         : chr  "21:00" NA "10:57 pm" "08:30pm" ...
#>  $ non_wear_time6    : int  33 24 89 8 27 56 66 21 14 7 ...
#>  $ steps7            : int  11067 13540 12106 18795 15039 9082 3733 4029 20791 28499 ...
#>  $ time_on7          : chr  "6:07" "6:07" "8:00am" "06:00" ...
#>  $ time_off7         : chr  "08:30pm" "11:13pm" "8:17pm" "10:57 pm" ...
#>  $ non_wear_time7    : int  8 72 4 38 9 32 49 36 34 43 ...
#>  $ self_report_pa    : int  NA 2 2 4 3 5 NA 7 6 7 ...
#>  $ csappa1           : int  1 2 4 2 2 2 3 2 2 3 ...
#>  $ csappa2           : int  3 2 1 1 1 1 4 1 4 3 ...
#>  $ csappa3           : int  2 3 2 1 NA 1 3 3 4 4 ...
#>  $ csappa4           : int  4 1 1 3 4 4 4 4 4 1 ...
#>  $ csappa5           : int  4 2 3 2 1 2 2 2 4 1 ...
#>  $ csappa6           : int  3 4 1 4 2 2 2 3 4 4 ...
#>  $ why_active1       : int  4 3 5 3 1 5 4 1 1 2 ...
#>  $ why_active2       : int  5 3 4 2 5 3 5 NA 5 NA ...
#>  $ why_active3       : int  3 3 1 4 2 3 4 4 5 3 ...
#>  $ feelings_about_pa1: int  4 3 2 2 1 1 3 4 4 2 ...
#>  $ feelings_about_pa2: int  5 2 2 3 4 2 4 4 2 5 ...
#>  $ feelings_about_pa3: int  2 5 2 5 3 2 2 1 3 5 ...
#>  $ pa_guideline      : int  2 3 4 1 2 4 3 2 2 2 ...
#>  $ crf_means         : int  1 4 4 2 2 1 2 1 4 1 ...
#>  $ ms_means          : int  3 2 1 2 3 1 1 2 4 2 ...
#>  $ sports_skill      : int  2 4 4 1 3 1 3 1 4 3 ...
#>  $ pa_is             : int  10 1 1 1 1 1 2 1 3 1 ...
#>  $ pa_is_also        : int  5 1 4 4 1 7 2 7 2 8 ...
#>  $ improve           : int  3 3 9 3 9 9 3 3 3 6 ...
#>  $ increase          : int  2 8 3 8 8 1 3 3 8 8 ...
#>  $ when_cooling_down : int  4 2 4 2 2 2 2 5 2 2 ...
#>  $ heart_rate        : int  5 6 4 4 4 9 4 8 7 4 ...

The 60 required variables can also be quickly accessed by calling the base R colnames() function.

colnames(capl_demo_data)
#>  [1] "age"                "gender"             "pacer_lap_distance"
#>  [4] "pacer_laps"         "plank_time"         "camsa_skill_score1"
#>  [7] "camsa_time1"        "camsa_skill_score2" "camsa_time2"       
#> [10] "steps1"             "time_on1"           "time_off1"         
#> [13] "non_wear_time1"     "steps2"             "time_on2"          
#> [16] "time_off2"          "non_wear_time2"     "steps3"            
#> [19] "time_on3"           "time_off3"          "non_wear_time3"    
#> [22] "steps4"             "time_on4"           "time_off4"         
#> [25] "non_wear_time4"     "steps5"             "time_on5"          
#> [28] "time_off5"          "non_wear_time5"     "steps6"            
#> [31] "time_on6"           "time_off6"          "non_wear_time6"    
#> [34] "steps7"             "time_on7"           "time_off7"         
#> [37] "non_wear_time7"     "self_report_pa"     "csappa1"           
#> [40] "csappa2"            "csappa3"            "csappa4"           
#> [43] "csappa5"            "csappa6"            "why_active1"       
#> [46] "why_active2"        "why_active3"        "feelings_about_pa1"
#> [49] "feelings_about_pa2" "feelings_about_pa3" "pa_guideline"      
#> [52] "crf_means"          "ms_means"           "sports_skill"      
#> [55] "pa_is"              "pa_is_also"         "improve"           
#> [58] "increase"           "when_cooling_down"  "heart_rate"
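Before calling any scoring function, it can help to see which required names a dataset lacks. The following is a minimal base R sketch, not the package's own implementation: required_subset holds only a handful of the 60 names shown above, and my_data is a hypothetical data frame invented for illustration.

```r
# A few of the 60 required names, copied from the colnames() output above
required_subset <- c("age", "gender", "pacer_laps", "plank_time", "steps1")

# Hypothetical raw data that contains only some of those columns
my_data <- data.frame(
  age = c(9, 11),
  gender = c("girl", "boy"),
  steps1 = c(12000, 8500)
)

# Names that are required but absent from the data
missing_names <- setdiff(required_subset, colnames(my_data))
print(missing_names)

# Add each absent column and fill it with NA values
my_data[missing_names] <- NA
str(my_data)
```

Filling absent columns with NA mirrors the behaviour the vignette describes for get_missing_capl_variables(), but the details of that function may differ from this sketch.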

Generating demo raw data

The capl package also provides the get_capl_demo_data() function, which randomly generates demo raw data. Its parameter, n (set to 500 by default), specifies how many rows of demo raw data to generate and must therefore be an integer greater than zero. For example, users can randomly generate demo raw data for 10,000 participants by executing a single line of code:

capl_demo_data2 <- get_capl_demo_data(n = 10000)

The base R str() function can be called to verify how many rows and columns of data were created.

str(capl_demo_data2)
#> 'data.frame':    10000 obs. of  60 variables:
#>  $ age               : int  12 11 11 11 7 11 10 10 12 10 ...
#>  $ gender            : chr  "f" "Girl" "Boy" "Female" ...
#>  $ pacer_lap_distance: num  20 20 15 15 20 15 NA 15 20 15 ...
#>  $ pacer_laps        : int  194 194 6 136 108 141 175 55 115 33 ...
#>  $ plank_time        : int  15 24 97 77 243 261 285 119 35 232 ...
#>  $ camsa_skill_score1: int  9 9 5 7 9 8 4 4 14 6 ...
#>  $ camsa_time1       : int  30 18 11 22 20 16 27 17 12 26 ...
#>  $ camsa_skill_score2: int  5 NA 9 6 7 13 9 3 10 8 ...
#>  $ camsa_time2       : int  26 20 27 30 15 31 18 26 24 12 ...
#>  $ steps1            : int  8956 6296 28534 13681 7921 27379 3682 11291 14083 6010 ...
#>  $ time_on1          : chr  "07:00" NA "6:07" "6:13" ...
#>  $ time_off1         : chr  "11:57pm" "21:00" "22:00" "11:57pm" ...
#>  $ non_wear_time1    : int  64 25 47 21 66 47 1 90 36 79 ...
#>  $ steps2            : int  13288 24332 8247 29195 18082 26848 13019 28029 2290 8498 ...
#>  $ time_on2          : chr  "06:00" "5:13am" "05:00" "07:48am" ...
#>  $ time_off2         : chr  "22:00" "21:00" "08:30pm" "11:13pm" ...
#>  $ non_wear_time2    : int  82 25 41 63 28 14 48 65 4 54 ...
#>  $ steps3            : int  30337 16486 19251 24376 6181 2794 22925 7834 3685 14526 ...
#>  $ time_on3          : chr  "6:07" "06:00" "07:48am" "8:00am" ...
#>  $ time_off3         : chr  "8:17pm" "22:00" "08:30pm" "10:57 pm" ...
#>  $ non_wear_time3    : int  50 0 42 77 22 44 28 31 13 20 ...
#>  $ steps4            : int  12491 28605 18313 2136 19644 26878 7157 2182 21941 9769 ...
#>  $ time_on4          : chr  "5:13am" "8:00am" "5:13am" "05:00" ...
#>  $ time_off4         : chr  "22:00" NA "10:57 pm" "23:00" ...
#>  $ non_wear_time4    : int  72 71 19 52 81 40 82 20 36 3 ...
#>  $ steps5            : int  9149 1599 17696 16316 14286 30330 20326 24706 11135 17383 ...
#>  $ time_on5          : chr  "06:00" "6:07" "8:00am" "5:13am" ...
#>  $ time_off5         : chr  "21:00" "11:57pm" "10:57 pm" "10:57 pm" ...
#>  $ non_wear_time5    : int  6 29 7 71 76 79 82 4 1 18 ...
#>  $ steps6            : int  15240 22719 14439 20137 22746 22500 7315 8295 17142 12305 ...
#>  $ time_on6          : chr  "07:48am" "5:13am" "07:00" "05:00" ...
#>  $ time_off6         : chr  "11:13pm" "21:00" NA NA ...
#>  $ non_wear_time6    : int  18 48 60 66 15 41 9 NA 22 8 ...
#>  $ steps7            : int  10810 11607 28089 6269 29530 13824 23614 25367 25992 25336 ...
#>  $ time_on7          : chr  "6:07" "8:00am" "07:48am" "8:00am" ...
#>  $ time_off7         : chr  "8:17pm" "22:00" "23:00" "11:13pm" ...
#>  $ non_wear_time7    : int  13 39 65 90 12 1 61 55 58 68 ...
#>  $ self_report_pa    : int  2 6 5 NA 6 6 2 1 2 1 ...
#>  $ csappa1           : int  2 4 2 2 4 4 3 4 2 2 ...
#>  $ csappa2           : int  3 3 2 1 4 3 2 4 2 4 ...
#>  $ csappa3           : int  1 4 4 2 2 2 4 2 4 3 ...
#>  $ csappa4           : int  3 3 1 4 3 1 3 3 3 4 ...
#>  $ csappa5           : int  1 1 2 2 3 4 1 1 3 1 ...
#>  $ csappa6           : int  2 1 3 1 3 2 2 2 4 1 ...
#>  $ why_active1       : int  3 2 1 5 5 1 2 3 3 4 ...
#>  $ why_active2       : int  5 2 2 3 1 2 1 1 5 1 ...
#>  $ why_active3       : int  3 1 1 4 NA NA 4 NA 3 1 ...
#>  $ feelings_about_pa1: int  4 1 1 2 4 3 2 5 2 3 ...
#>  $ feelings_about_pa2: int  2 2 3 5 3 3 4 1 1 4 ...
#>  $ feelings_about_pa3: int  1 3 1 NA 4 3 3 5 4 2 ...
#>  $ pa_guideline      : int  1 1 4 4 1 4 1 3 1 1 ...
#>  $ crf_means         : int  3 1 3 1 3 2 1 4 4 3 ...
#>  $ ms_means          : int  1 2 4 1 2 3 3 3 4 1 ...
#>  $ sports_skill      : int  1 1 3 4 3 2 3 4 1 2 ...
#>  $ pa_is             : int  2 7 3 7 8 3 1 4 1 5 ...
#>  $ pa_is_also        : int  5 7 2 7 7 2 9 10 10 5 ...
#>  $ improve           : int  3 3 1 5 3 2 9 2 3 1 ...
#>  $ increase          : int  2 8 8 1 4 2 8 4 8 1 ...
#>  $ when_cooling_down : int  2 4 2 3 2 2 2 2 10 10 ...
#>  $ heart_rate        : int  4 3 8 7 4 10 9 4 4 9 ...

Exporting data to Excel

If users prefer to examine the CAPL demo raw data in a workbook, the export_capl_data() function allows them to export data objects to Excel.

export_capl_data(capl_demo_data2, "c:/path/to/store/capl_demo_data2.xlsx")
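Where an Excel workbook is not required, a similar export can be sketched with base R's CSV functions. This is a swapped-in technique, not what export_capl_data() does internally; the toy data frame and the temporary file path are assumptions made for illustration.

```r
# Toy data frame standing in for a CAPL raw data object (hypothetical values)
toy <- data.frame(
  age = c(8L, 10L, 12L),
  gender = c("girl", "boy", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file rather than a fixed path
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the file back and confirm the round trip preserved the data
toy_back <- read.csv(csv_path, stringsAsFactors = FALSE)
round_trip_ok <- isTRUE(all.equal(toy, toy_back))
print(round_trip_ok)
```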

Renaming variables

If users have imported their own raw data and plan to use the main function in the capl package, get_capl(), to compute CAPL-2 scores and interpretations, they must ensure that their variable names match the names of the 60 required variables. Users can rename their variables by calling the rename_variable() function. This function takes three parameters: x, search, and replace. The x parameter must be the raw data object, the search parameter must be a character vector of the variable name(s) to be renamed, and the replace parameter must be a character vector of the new names for the variables specified in the search parameter. Below we show how to rename variables using a fake dataset called raw_data.

# Create fake data
raw_data <- data.frame(
  age_years = sample(8:12, 100, replace = TRUE),
  genders = sample(c("girl", "boy"), 100, replace = TRUE, prob = c(0.51, 0.49)),
  step_counts1 = sample(1000:30000, 100, replace = TRUE),
  step_counts2 = sample(1000:30000, 100, replace = TRUE),
  step_counts3 = sample(1000:30000, 100, replace = TRUE),
  step_counts4 = sample(1000:30000, 100, replace = TRUE),
  step_counts5 = sample(1000:30000, 100, replace = TRUE),
  step_counts6 = sample(1000:30000, 100, replace = TRUE),
  step_counts7 = sample(1000:30000, 100, replace = TRUE)
)

# Examine the structure of this data
str(raw_data)
#> 'data.frame':    100 obs. of  9 variables:
#>  $ age_years   : int  10 11 10 12 11 10 9 11 9 10 ...
#>  $ genders     : chr  "boy" "girl" "girl" "girl" ...
#>  $ step_counts1: int  11270 28657 15265 20530 4297 27950 29183 7587 26347 23758 ...
#>  $ step_counts2: int  21344 1555 26582 28398 19696 25712 22505 14109 11511 28814 ...
#>  $ step_counts3: int  19846 27190 28220 11657 13947 23189 6460 4110 16202 27480 ...
#>  $ step_counts4: int  28702 18124 6937 26191 22156 19665 5524 7143 23614 21583 ...
#>  $ step_counts5: int  23579 8113 4736 24895 8246 21498 18466 6378 28749 11644 ...
#>  $ step_counts6: int  22961 3050 10305 15144 5867 21254 5790 28853 26437 29569 ...
#>  $ step_counts7: int  9582 21427 15006 11178 3369 9065 17739 10447 13078 18389 ...

# Rename the variables
raw_data <- rename_variable(
  x = raw_data,
  search = c(
    "age_years", 
    "genders", 
    "step_counts1", 
    "step_counts2", 
    "step_counts3", 
    "step_counts4", 
    "step_counts5", 
    "step_counts6", 
    "step_counts7"
  ),
  replace = c(
    "age", 
    "gender", 
    "steps1", 
    "steps2", 
    "steps3", 
    "steps4", 
    "steps5", 
    "steps6", 
    "steps7"
  )
)

# Examine the structure of this data
str(raw_data)
#> 'data.frame':    100 obs. of  9 variables:
#>  $ age   : int  10 11 10 12 11 10 9 11 9 10 ...
#>  $ gender: chr  "boy" "girl" "girl" "girl" ...
#>  $ steps1: int  11270 28657 15265 20530 4297 27950 29183 7587 26347 23758 ...
#>  $ steps2: int  21344 1555 26582 28398 19696 25712 22505 14109 11511 28814 ...
#>  $ steps3: int  19846 27190 28220 11657 13947 23189 6460 4110 16202 27480 ...
#>  $ steps4: int  28702 18124 6937 26191 22156 19665 5524 7143 23614 21583 ...
#>  $ steps5: int  23579 8113 4736 24895 8246 21498 18466 6378 28749 11644 ...
#>  $ steps6: int  22961 3050 10305 15144 5867 21254 5790 28853 26437 29569 ...
#>  $ steps7: int  9582 21427 15006 11178 3369 9065 17739 10447 13078 18389 ...
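The search-and-replace renaming above can be approximated in base R. The helper below, rename_columns(), is a hypothetical name and a minimal sketch of the idea; it is not the source code of rename_variable(), and the real function may treat unmatched names differently.

```r
# Hypothetical helper: rename columns listed in `search` to the
# corresponding entries of `replace`, skipping names that are not found
rename_columns <- function(x, search, replace) {
  stopifnot(length(search) == length(replace))
  idx <- match(search, names(x))   # position of each search name, NA if absent
  found <- !is.na(idx)
  names(x)[idx[found]] <- replace[found]
  x
}

toy <- data.frame(age_years = c(8, 9), step_counts1 = c(1000, 2000))
toy <- rename_columns(
  toy,
  search = c("age_years", "step_counts1", "not_there"),
  replace = c("age", "steps1", "ignored")
)
print(names(toy))
```

Names that are absent from the data (here, "not_there") are left alone rather than raising an error, which is one way to keep renaming scripts reusable across datasets.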

Eliminating noisy errors with validation

One of the coding philosophies behind the capl package is to create a “quiet” user experience by suppressing “noisy” error and warning messages via validation. That is, the capl package returns missing or invalid values as NA values instead of throwing “noisy” errors that halt code execution. If any variable is missing, for example, the get_capl() function will continue to execute without throwing error or warning messages. The get_missing_capl_variables() function will create required variables that are missing and populate these variables with NA values. In order to implement the validation philosophy, every capl function enlists helper functions to validate the data. If a given value is not of the correct class or out of range, an NA will be returned.

Validation functions in the capl package

There are eight functions included in the capl package (displayed in alphabetical order) to help provide a “quiet” user experience:

  • validate_age()
  • validate_character()
  • validate_domain_score()
  • validate_gender()
  • validate_integer()
  • validate_number()
  • validate_scale()
  • validate_steps()

Users can learn more about these functions by accessing the documentation within the R environment.

?validate_age
?validate_character
?validate_domain_score
?validate_gender
?validate_integer
?validate_number
?validate_scale
?validate_steps

Validation of age

The CAPL-2 is currently validated with 8- to 12-year-old children. Accordingly, when a function requires the age variable to execute a computation (e.g., get_capl_interpretation()), the age variable is first validated via the validate_age() function.

validated_age <- validate_age(c(7, 8, 9, 10, 11, 12, 13, "", NA, "12", 8.5))

Notice the NA values in the results.

validated_age
#>  [1] NA  8  9 10 11 12 NA NA NA 12  8

The first element is NA because the original value, 7, is below the validated age range. The next five elements are identical to their original values because they are integers between 8 and 12. The seventh element is NA because the original value, 13, is above the range. The next two elements are NA because the original values ("" and NA) are obviously invalid. The tenth element is 12: the original value, "12", was supplied as a character string but represents a valid age. The last element is 8, even though the original value is a decimal. Because 8.5 is between 8 and 12, it is considered valid, but the floor of the value is returned since CAPL performs age-specific computations based on integer age.
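The rules just described can be sketched in base R. The function sketch_validate_age() is a hypothetical stand-in written for illustration, not the package's implementation. It assumes the floor is taken before the range check, a detail the text does not state (it matters only for values such as 12.5).

```r
# Hypothetical stand-in for validate_age(); not the package's source code
sketch_validate_age <- function(x) {
  # Convert to numeric; anything that cannot be converted becomes NA
  num <- suppressWarnings(as.numeric(x))
  # Work with integer age, as described for the value 8.5
  whole <- floor(num)
  # Keep only ages inside the validated range of 8 to 12
  valid <- !is.na(whole) & whole >= 8 & whole <= 12
  result <- rep(NA_integer_, length(x))
  result[valid] <- as.integer(whole[valid])
  result
}

ages <- sketch_validate_age(c(7, 8, 9, 10, 11, 12, 13, "", NA, "12", 8.5))
print(ages)
#>  [1] NA  8  9 10 11 12 NA NA NA 12  8
```

Invalid input yields NA instead of an error, which is the "quiet" behaviour the package aims for.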

Validation of gender

The CAPL-2 is currently validated for children who identify as boys or girls. When a function requires the gender variable to execute a computation (e.g., get_capl_interpretation()), the gender variable is validated via the validate_gender() function.

validated_gender <- validate_gender(c("Girl", "GIRL", "g", "G", "Female", "f", "F", "", NA, 1))

validated_gender
#>  [1] "girl" "girl" "girl" "girl" "girl" "girl" "girl" NA     NA     "girl"

Notice the results again. This function accepts a number of case-insensitive options (e.g., “Girl”, “G”, “Female”, “F”, 1) for the female gender and returns a standardized “girl” value. The only two elements that are returned as NA have original values that are obviously invalid ("" and NA). The validate_gender() function behaves in a similar fashion for the male gender; it also accepts a number of case-insensitive options and returns a standardized “boy” value.

validated_gender <- validate_gender(c("Boy", "BOY", "b", "B", "Male", "m", "M", "", NA, 0))

validated_gender
#>  [1] "boy" "boy" "boy" "boy" "boy" "boy" "boy" NA    NA    "boy"
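The standardization shown in both examples can be approximated with a case-insensitive lookup. The function sketch_validate_gender() is a hypothetical name, and its accepted values are only those that appear in the examples above; the real validate_gender() may accept others.

```r
# Hypothetical stand-in for validate_gender(); not the package's source code
sketch_validate_gender <- function(x) {
  # Normalize case and surrounding whitespace before matching
  key <- tolower(trimws(as.character(x)))
  result <- rep(NA_character_, length(key))
  # Accepted values are taken from the examples in this vignette
  result[key %in% c("girl", "g", "female", "f", "1")] <- "girl"
  result[key %in% c("boy", "b", "male", "m", "0")] <- "boy"
  result
}

girls <- sketch_validate_gender(c("Girl", "GIRL", "g", "G", "Female", "f", "F", "", NA, 1))
boys <- sketch_validate_gender(c("Boy", "BOY", "b", "B", "Male", "m", "M", "", NA, 0))
print(girls)
print(boys)
```

Anything outside the two lookup sets, including empty strings and NA, falls through to NA rather than raising an error.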

Computing CAPL-2 scores and interpretations

The CAPL-2 scoring system is nicely summarized in figure 2 of the CAPL-2 manual on page 7: