Introduction to FracFixR

Alice Cleynen, Agin Ravindran, Nikolay Shirokikh

2025-10-16

Introduction

FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.

This vignette demonstrates:

  1. Basic usage with polysome profiling data
  2. Differential proportion testing
  3. Visualization of results
  4. Advanced applications

Installation

# From CRAN
install.packages("FracFixR")

# From GitHub (development version)
devtools::install_github("Arnaroo/FracFixR")

Quick Start Example

Let’s simulate a simple polysome profiling experiment:

library(FracFixR)

# Set seed for reproducibility
set.seed(123)

Creating Example Data

# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12

# Generate count matrix with varying expression levels
total_counts <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("Gene", 1:n_genes),
    paste0("Sample", 1:4)
  )
)
prob=c(1/4,1/4,1/2)

# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1))  # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)

counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])

# Create annotation data frame
annotation <- data.frame(
  Sample = colnames(counts),
  Condition = rep(c("Control", "Treatment"), each = 6),
  Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
  Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)

print(head(annotation))
#>    Sample Condition           Type Replicate
#> 1 Sample1   Control          Total      Rep1
#> 2 Sample2   Control          Total      Rep2
#> 3   V1_p2   Control Light_Polysome      Rep1
#> 4   V1_p3   Control Heavy_Polysome      Rep1
#> 5   V2_p2   Control Light_Polysome      Rep2
#> 6   V2_p3   Control Heavy_Polysome      Rep2

Running FracFixR

# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)
#> Setting up parallel processing...
#> Filtering transcripts based on Total samples...
#> Retained 100 transcripts present in Total samples
#> 
#> Processing condition 1/2: Control
#>   Selecting transcripts with 70-96% quantiles (range: 214.3 - 296.1)
#>   Selected 26 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Processing condition 2/2: Treatment
#>   Selecting transcripts with 70-96% quantiles (range: 217.3 - 280.0)
#>   Selected 26 transcripts for regression
#>   Processing replicate 1/2: Rep1
#>   Processing replicate 2/2: Rep2
#> 
#> Finalizing results...
#> FracFixR analysis complete!

# View the structure of results
names(results)
#> [1] "OriginalData"  "Annotation"    "Propestimates" "Coefficients" 
#> [5] "Fractions"     "plots"

Understanding the Output

The FracFixR output contains several components:

# 1. Fraction proportions for each replicate
print(results$Fractions)
#>   Light_Polysome Heavy_Polysome      Lost      Replicate
#> 1     0.08560270     0.00123615 0.9131612   Control_Rep1
#> 2     0.04790483     0.06823600 0.8838592   Control_Rep2
#> 3     0.03838147     0.05640819 0.9052103 Treatment_Rep1
#> 4     0.08793944     0.10989656 0.8021640 Treatment_Rep2

# 2. Regression coefficients
print(results$Coefficients)
#>       Lost Light_Polysome Heavy_Polysome      Replicate
#> 1 226.1183      0.6616193    0.009933183   Control_Rep1
#> 2 219.9686      0.3981329    0.547867870   Control_Rep2
#> 3 223.3712      0.3011167    0.234843211 Treatment_Rep1
#> 4 201.5765      0.3582424    0.424722648 Treatment_Rep2

# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
#>       Sample1 Sample2      V1_p2        V1_p3      V2_p2      V2_p3
#> Gene1     253     250 0.17673392 0.0021771360 0.05453876 0.15010079
#> Gene2     237     236 0.05319552 0.0006988169 0.04201403 0.03579036
#> Gene3     249     238 0.07548677 0.0007666551 0.02538431 0.03309269
#> Gene4     243     248 0.10875934 0.0031976685 0.05999263 0.12758567
#> Gene5     241     237 0.06175113 0.0011478345 0.02123376 0.05113433

Visualizing Fraction Proportions

# Create fraction plot
frac_plot <- PlotFractions(results)
print(frac_plot)
Fraction proportions across replicates
Fraction proportions across replicates

The plot shows: - The proportion of RNA in each fraction - The “Lost” fraction (grey) representing unrecoverable material - Consistency across replicates

Differential Proportion Testing

Now let’s identify genes with differential polysome association between conditions:

# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
  NormObject = results,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  Test = "Logit"
)
#> Performing Logit test comparing Conditions Control and Treatment in the Heavy_Polysome fraction
#> Applying FDR correction...
#> Analysis complete. Found 14 transcripts with padj < 0.01

# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
#>    transcript mean_success_cond1 mean_success_cond2  mean_diff   log2FC
#> 72     Gene72         0.02617524          0.1390912 0.11291599 2.828417
#> 82     Gene82         0.04533465          0.1535843 0.10824963 2.352886
#> 50     Gene50         0.06573491          0.1267624 0.06102752 2.633829
#> 6       Gene6         0.02014019          0.1492147 0.12907447 2.603087
#> 11     Gene11         0.01350989          0.1079133 0.09440345 3.669058
#> 29     Gene29         0.02035831          0.1017006 0.08134232 2.841302
#> 33     Gene33         0.01628909          0.1063028 0.09001371 3.070389
#> 35     Gene35         0.02986100          0.1431196 0.11325857 2.242306
#> 40     Gene40         0.03166690          0.1083089 0.07664199 2.710986
#> 46     Gene46         0.02526712          0.1347122 0.10944504 2.256062
#>            pval        padj
#> 72 3.099748e-05 0.001549874
#> 82 1.675630e-05 0.001549874
#> 50 5.916326e-05 0.001972109
#> 6  8.567553e-05 0.002141888
#> 11 7.595966e-04 0.006124418
#> 29 6.455671e-04 0.006124418
#> 33 7.840268e-04 0.006124418
#> 35 7.961743e-04 0.006124418
#> 40 6.586108e-04 0.006124418
#> 46 7.268264e-04 0.006124418

Interpreting Results

Positive mean_diff indicates higher proportion in Treatment.

Creating a Volcano Plot

volcano <- PlotComparison(
  diff_heavy,
  Conditions = c("Control", "Treatment"),
  Types = "Heavy_Polysome",
  cutoff = 20
)
print(volcano)
Volcano plot of differential polysome association
Volcano plot of differential polysome association

Data Requirements

For real experiments, ensure your data meets these requirements:

  1. Count Matrix: Raw counts (not normalized)
  2. Annotation: Must include:
    • Sample names matching count matrix columns
    • At least one “Total” sample per condition-replicate
    • Consistent fraction names across replicates

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
#>  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Australia/Sydney
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] FracFixR_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] future.apply_1.20.0 gtable_0.3.6        jsonlite_2.0.0     
#>  [4] dplyr_1.1.4         compiler_4.4.2      tidyselect_1.2.1   
#>  [7] parallel_4.4.2      tidyr_1.3.1         jquerylib_0.1.4    
#> [10] globals_0.18.0      scales_1.4.0        yaml_2.3.10        
#> [13] fastmap_1.2.0       ggplot2_3.5.2       R6_2.6.1           
#> [16] labeling_0.4.3      generics_0.1.4      knitr_1.50         
#> [19] future_1.67.0       tibble_3.3.0        bslib_0.9.0        
#> [22] pillar_1.11.0       RColorBrewer_1.1-3  rlang_1.1.6        
#> [25] cachem_1.1.0        xfun_0.52           nnls_1.6           
#> [28] sass_0.4.10         aod_1.3.3           cli_3.6.5          
#> [31] withr_3.0.2         magrittr_2.0.3      digest_0.6.37      
#> [34] grid_4.4.2          lifecycle_1.0.4     vctrs_0.6.5        
#> [37] evaluate_1.0.4      glue_1.8.0          farver_2.1.2       
#> [40] listenv_0.9.1       codetools_0.2-20    parallelly_1.45.1  
#> [43] purrr_1.1.0         rmarkdown_2.29      matrixStats_1.5.0  
#> [46] tools_4.4.2         pkgconfig_2.0.3     htmltools_0.5.8.1

References

Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.

For more examples and advanced usage, see the FracFixR GitHub repository.