The goal of the CINmetrics package is to provide different methods, drawn from the literature, for calculating chromosomal instability (CIN) metrics that can be applied to any cancer data set, including The Cancer Genome Atlas (TCGA).
library(CINmetrics)
The dataset provided with the CINmetrics package is masked copy number variation data for breast cancer, covering 10 unique samples selected randomly from TCGA.
dim(maskCNV_BRCA)
#> [1] 1650 7
Alternatively, you can download the entire dataset from TCGA using the TCGAbiolinks package:
## Not run:
#library(TCGAbiolinks)
#query.maskCNV.hg39.BRCA <- GDCquery(project = "TCGA-BRCA",
# data.category = "Copy Number Variation",
# data.type = "Masked Copy Number Segment", legacy=FALSE)
#GDCdownload(query = query.maskCNV.hg39.BRCA)
#maskCNV.BRCA <- GDCprepare(query = query.maskCNV.hg39.BRCA, summarizedExperiment = FALSE)
#maskCNV.BRCA <- data.frame(maskCNV.BRCA, stringsAsFactors = FALSE)
#tai.test <- tai(cnvData = maskCNV.BRCA)
## End(Not run)
tai calculates the Total Aberration Index (TAI; Baumbusch LO, et al.), “a measure of the abundance of genomic size of copy number changes in a tumour”. It is defined as the average of the absolute segment means (\(|\bar{y}_{S_i}|\)), weighted by segment length (\(d_i\)).
Biologically, it can also be interpreted as the absolute deviation from the normal copy number state averaged over all genomic locations.
\[ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot |{\bar{y}_{S_i}}|} {\sum^{R}_{i = 1} {d_i}}\ \ where\ |\bar{y}_{S_i}| \ge |\log_2 1.7| \]
tai.test <- tai(cnvData = maskCNV_BRCA)
head(tai.test)
#> sample_id tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 0.4574789
#> 2 TCGA-E2-A153-11A-31D-A12A-01 1.4916264
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 0.9886191
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 0.3706400
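To make the TAI formula concrete, here is a minimal base R sketch on a toy sample. It does not call the package: the vectors `seg_length` and `seg_mean` are hypothetical values, and the sketch assumes the threshold condition applies to both the numerator and the denominator.

```r
# Toy segments for a single sample (hypothetical values, not from maskCNV_BRCA)
seg_length <- c(100, 200, 50, 150)     # d_i: segment lengths
seg_mean   <- c(0.9, -1.2, 0.1, 0.3)   # y_i: segment means

# Keep only segments whose absolute mean reaches |log2(1.7)| (about 0.766)
threshold <- abs(log2(1.7))
aberrant  <- abs(seg_mean) >= threshold

# Length-weighted average of the absolute segment means
# (assumption: the same filter is applied to numerator and denominator)
tai_toy <- sum(seg_length[aberrant] * abs(seg_mean[aberrant])) /
  sum(seg_length[aberrant])
tai_toy
```

Only the first two segments pass the threshold, so the result is (100 × 0.9 + 200 × 1.2) / 300 = 1.1.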
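taiModified calculates a modified Total Aberration Index using all segment means of a sample rather than only those in an aberrant copy number state. Because it uses the signed segment means instead of their absolute values, the score retains the direction (gain or loss) of the copy number changes.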
\[ Modified\ Total\ Aberration\ Index = \frac {\sum^{R}_{i = 1} {d_i} \cdot {\bar{y}_{S_i}}} {\sum^{R}_{i = 1} {d_i}} \]
modified.tai.test <- taiModified(cnvData = maskCNV_BRCA)
head(modified.tai.test)
#> sample_id modified_tai
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 0.014579640
#> 2 TCGA-E2-A153-11A-31D-A12A-01 0.012139011
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 0.015385256
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 0.006692841
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 0.004983911
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 0.014940306
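The effect of keeping the sign can be seen in a small base R sketch. The values are hypothetical and the code follows the formula as written rather than the package internals: a gain and a loss of equal size and length cancel in the modified score, whereas they would add up if absolute values were used.

```r
# Toy sample (hypothetical values): one gain, one loss, one neutral segment
seg_length <- c(100, 100, 300)   # d_i
seg_mean   <- c(1.0, -1.0, 0.0)  # y_i, signed

# Modified TAI: length-weighted average of the signed means, no threshold
modified_tai_toy <- sum(seg_length * seg_mean) / sum(seg_length)

# For comparison only: the same average using absolute values
abs_weighted_toy <- sum(seg_length * abs(seg_mean)) / sum(seg_length)

c(modified = modified_tai_toy, absolute = abs_weighted_toy)
```

Here the modified score is 0 while the absolute-value average is 0.4, so a value near zero can reflect either a quiet genome or balanced gains and losses.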
cna calculates the total number of copy number aberrations (CNA; Davidson JM, et al.), where a CNA is defined as a segment with copy number outside the pre-defined range of 1.7-2.3 (that is, a segment that does not satisfy \((\log_2 1.7 -1) \le \bar{y}_{S_i} \le (\log_2 2.3 -1)\)) and that is not contiguous with an adjacent independent CNA of identical copy number. For our purposes, we have adapted the range to be \(|\bar{y}_{S_i}| \ge |\log_2 1.7|\), which is only slightly larger than the original.
This metric is very similar to the number of break points, with the caveat that adjacent segments must differ in segment mean to be counted separately.
\[ Total\ Copy\ Number\ Aberration = \sum^{R}_{i = 1} n_i \ \ where\ \ \begin{aligned} &|\bar{y}_{S_i}| \ge |\log_2{1.7}|, \\ &|\bar{y}_{S_{i-1}} - \bar{y}_{S_i}| \ge 0.2, \\ &d_i \ge 10 \end{aligned} \]
cna.test <- cna(cnvData = maskCNV_BRCA)
head(cna.test)
#> sample_id cna
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 33
#> 2 TCGA-E2-A153-11A-31D-A12A-01 14
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 7
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 14
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 212
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 31
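The three conditions in the CNA formula can be illustrated with a base R sketch on hypothetical, ordered segments from one chromosome of one sample. This follows the formula as written, not the package source; in particular, treating the first segment (which has no predecessor) as passing the adjacency test is an assumption made here.

```r
# Toy ordered segments (hypothetical values)
seg_length <- c(500, 300, 5, 1000, 40)      # d_i
seg_mean   <- c(0.9, 0.95, -1.0, 0.1, 1.5)  # y_i

# Condition 1: absolute segment mean reaches |log2(1.7)|
is_aberrant <- abs(seg_mean) >= abs(log2(1.7))

# Condition 2: segment mean differs from the previous segment by at least 0.2
# (assumption: the first segment has no predecessor and is allowed through)
differs_from_previous <- c(TRUE, abs(diff(seg_mean)) >= 0.2)

# Condition 3: segment is at least 10 units long
long_enough <- seg_length >= 10

# A segment counts as one CNA only if all three conditions hold
cna_toy <- sum(is_aberrant & differs_from_previous & long_enough)
cna_toy
```

Of the five toy segments, the second is too similar to the first, the third is too short, and the fourth is not aberrant, which leaves a count of 2.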
countingBaseSegments calculates the number of altered bases, defined as the sum of the lengths of segments (\(d_i\)) with an absolute segment mean (\(|\bar{y}_{S_i}|\)) greater than or equal to 0.2.
Biologically, this value can be thought to quantify numerical chromosomal instability. This is also a simpler representation of how much of the genome has been altered, and it does not run into the issue of sequencing coverage affecting the fraction of the genome altered.
\[ Number\ of\ Altered\ Bases = \sum^{R}_{i = 1} d_i\ where\ |\bar{y}_{S_i}| \ge 0.2 \]
base.seg.test <- countingBaseSegments(cnvData = maskCNV_BRCA)
head(base.seg.test)
#> sample_id base_segments
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 55853059
#> 2 TCGA-E2-A153-11A-31D-A12A-01 131157
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 80000
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 271941966
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 1314597331
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 536984944
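As a rough illustration of the per-sample sum, the following base R sketch filters a toy segment table and adds up the lengths of the altered segments. The data frame and its column names (`sample_id`, `seg_length`, `seg_mean`) are hypothetical and do not reflect the layout of maskCNV_BRCA.

```r
# Toy segment table for two samples (hypothetical values and column names)
segments <- data.frame(
  sample_id  = c("S1", "S1", "S1", "S2", "S2"),
  seg_length = c(1000, 4000, 4000, 2000, 500),   # d_i
  seg_mean   = c(0.05, 0.45, -0.30, 0.25, -0.10) # y_i
)

# Keep segments with an absolute segment mean of at least 0.2
altered <- segments[abs(segments$seg_mean) >= 0.2, ]

# Sum the lengths of the altered segments within each sample
base_segments_toy <- aggregate(seg_length ~ sample_id, data = altered, FUN = sum)
base_segments_toy
```

In this toy table S1 has 8000 altered bases and S2 has 2000. Note that a sample with no altered segments would drop out of the result with this approach.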
countingBreakPoints calculates the number of break points, defined as the number of segments (\(n_i\)) with an absolute segment mean greater than or equal to 0.2. This count is then doubled to account for the 5’ and 3’ break points of each segment.
Biologically, this value can be thought to quantify structural chromosomal instability.
\[ Number\ of \ Break\ Points = \sum^{R}_{i = 1} (n_i \cdot 2)\ where\ |\bar{y}_{S_i}| \ge 0.2 \]
break.points.test <- countingBreakPoints(cnvData = maskCNV_BRCA)
head(break.points.test)
#> sample_id break_points
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 104
#> 2 TCGA-E2-A153-11A-31D-A12A-01 40
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 22
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 40
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 626
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 102
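The doubling step is easy to check by hand. The sketch below uses hypothetical segment means for a single sample and follows the formula as written, with an inclusive threshold of 0.2.

```r
# Toy segment means for one sample (hypothetical values)
seg_mean <- c(0.05, 0.45, -0.30, 0.10, 0.20)

# Count segments with an absolute segment mean of at least 0.2
n_altered <- sum(abs(seg_mean) >= 0.2)

# Each altered segment contributes two break points (its two ends)
break_points_toy <- 2 * n_altered
break_points_toy
```

Three segments pass the threshold, giving 6 break points. Because every qualifying segment contributes two, the metric is always an even number, as in the output above.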
fga calculates the fraction of the genome altered (FGA; Chin SF, et al.), measured by taking the sum of the number of bases altered and dividing it by the genome length covered (\(G\)). Genome length covered was calculated by summing the lengths of each probe on the Affymetrix 6.0 array. This calculation excludes sex chromosomes.
\[ Fraction\ Genome\ Altered = \frac {\sum^{R}_{i = 1} d_i} {G} \ \ where\ |\bar{y}_{S_i}| \ge 0.2 \]
fraction.genome.test <- fga(cnvData = maskCNV_BRCA)
head(fraction.genome.test)
#> sample_id fga
#> 1 TCGA-E2-A1B1-01A-21D-A12N-01 1.943930e-02
#> 2 TCGA-E2-A153-11A-31D-A12A-01 4.564835e-05
#> 3 TCGA-D8-A1XW-10A-01D-A14J-01 2.784349e-05
#> 4 TCGA-A2-A0YD-01A-11D-A107-01 9.464765e-02
#> 5 TCGA-BH-A0BR-01A-21D-A111-01 4.126128e-01
#> 6 TCGA-D8-A27T-01A-11D-A16C-01 1.868942e-01
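FGA is the altered-base count rescaled by a fixed genome length. A minimal base R sketch with hypothetical segments and a toy value for \(G\) (`genome_length` here is purely illustrative, not the value used by the package):

```r
# Toy segments for one sample (hypothetical values)
seg_length <- c(1000, 4000, 4000, 1000)   # d_i
seg_mean   <- c(0.05, 0.45, -0.30, 0.10)  # y_i

# Toy genome length covered, G (illustrative only)
genome_length <- 10000

# Sum of altered segment lengths divided by G
altered_bases <- sum(seg_length[abs(seg_mean) >= 0.2])
fga_toy <- altered_bases / genome_length
fga_toy
```

The toy result is 8000 / 10000 = 0.8. In the outputs shown earlier, dividing base_segments by fga gives roughly 2.87 × 10^9 for each sample, which is consistent with a single fixed \(G\) being applied to all samples.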
CINmetrics calculates tai, cna, number of altered base segments, number of break points, and fraction of genome altered and returns them as a single data frame.
cinmetrics.test <- CINmetrics(cnvData = maskCNV_BRCA)
head(cinmetrics.test)
#> sample_id tai cna base_segments break_points
#> 1 TCGA-A2-A0YD-01A-11D-A107-01 0.4944296 14 271941966 40
#> 2 TCGA-A8-A086-01A-11D-A011-01 0.6721224 70 805881366 214
#> 3 TCGA-AO-A0J5-10A-01D-A037-01 0.8889885 12 41816 34
#> 4 TCGA-AR-A0TV-01A-21D-A087-01 0.5861162 187 1099228749 624
#> 5 TCGA-B6-A0RP-01A-21D-A087-01 0.3184316 41 1291153635 142
#> 6 TCGA-BH-A0BR-01A-21D-A111-01 0.3531782 212 1314597331 626
#> fga
#> 1 9.464765e-02
#> 2 2.804818e-01
#> 3 1.455379e-05
#> 4 3.825795e-01
#> 5 4.493777e-01
#> 6 4.126128e-01
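If you compute the metrics one at a time, a similar combined table can be assembled by joining the individual results on `sample_id`. The sketch below uses small hypothetical data frames in place of real results and is not a description of how CINmetrics builds its output.

```r
# Toy per-metric results (hypothetical values), deliberately in different row orders
tai_df <- data.frame(sample_id = c("S2", "S1"), tai = c(0.5, 1.1))
cna_df <- data.frame(sample_id = c("S1", "S2"), cna = c(2, 7))
fga_df <- data.frame(sample_id = c("S2", "S1"), fga = c(0.4, 0.8))

# Join all tables on sample_id; merge() matches rows by key, not by position
combined <- Reduce(
  function(x, y) merge(x, y, by = "sample_id"),
  list(tai_df, cna_df, fga_df)
)
combined
```

Joining by key rather than binding columns side by side matters here, because the row order of the individual results is not guaranteed to match.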