| Type: | Package | 
| Title: | Statistical Methods for Microbiome Compositional Data | 
| Version: | 1.2 | 
| Date: | 2024-03-13 | 
| Author: | Xianyang Zhang [aut], Jun Chen [aut, cre], Huijuan Zhou [ctb] | 
| Maintainer: | Jun Chen <chen.jun2@mayo.edu> | 
| Description: | A suite of methods for powerful and robust microbiome data analysis addressing zero-inflation, phylogenetic structure and compositional effects (Zhou et al. (2022)<doi:10.1186/s13059-022-02655-5>). The methods can be applied to the analysis of other (high-dimensional) compositional data arising from sequencing experiments. | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | ggplot2, matrixStats, parallel, stats, utils, Matrix, statmod, MASS, ggrepel, lmerTest, foreach, modeest | 
| NeedsCompilation: | yes | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| Packaged: | 2024-04-01 22:00:40 UTC; m123485 | 
| Repository: | CRAN | 
| Date/Publication: | 2024-04-01 22:30:02 UTC | 
Linear (Lin) Model for Differential Abundance (DA) Analysis of High-dimensional Compositional Data
Description
The function implements a simple, robust and highly scalable approach to tackle the compositional effects in differential abundance analysis of high-dimensional compositional data. It fits linear regression models on the centered log2-ratio transformed data, identifies a bias term due to the transformation and compositional effect, and corrects the bias using the mode of the regression coefficients. It could fit mixed-effect models for analysis of correlated data.
Usage
linda(
  feature.dat,
  meta.dat,
  formula,
  feature.dat.type = c('count', 'proportion'),
  prev.filter = 0,
  mean.abund.filter = 0, 
  max.abund.filter = 0,
  is.winsor = TRUE,
  outlier.pct = 0.03,
  adaptive = TRUE,
  zero.handling = c('pseudo-count', 'imputation'),
  pseudo.cnt = 0.5,
  corr.cut = 0.1,
  p.adj.method = "BH",
  alpha = 0.05,
  n.cores = 1, 
  verbose = TRUE
)
Arguments
| feature.dat | a matrix of counts/proportions, row - features (OTUs, genes, etc) , column - samples. | 
| meta.dat | a data frame containing the sample meta data. If there are NAs, the corresponding samples will be removed in the analysis. | 
| formula | a character string for the formula. The formula should conform to that used by  | 
| feature.dat.type | the type of the feature data. It could be "count" or "proportion". | 
| prev.filter | the prevalence (percentage of non-zeros) cutoff, under which the features will be filtered. The default is 0. | 
| mean.abund.filter | the mean relative abundance cutoff, under which the features will be filtered. The default is 0. | 
| max.abund.filter | the max relative abundance cutoff, under which the features will be filtered. The default is 0. | 
| is.winsor | a logical value indicating whether winsorization should be performed to replace outliers (high values). The default is TRUE. | 
| outlier.pct | the expected percentage of outliers. These outliers will be winsorized. The default is 0.03. | 
| adaptive | a logical value indicating whether the approach to handle zeros (pseudo-count or imputation)
will be determined based on the correlations between the log(sequencing depth) and the explanatory variables
in  | 
| zero.handling | a character string of 'pseudo-count' or 'imputation' indicating the zero handling method
used when  | 
| pseudo.cnt | a positive numeric value for the pseudo-count to be added if  | 
| corr.cut | a numerical value between 0 and 1, indicating the significance level used for determining
the zero-handling approach when  | 
| p.adj.method | a character string indicating the p-value adjustment approach for 
addressing multiple testing. See R function  | 
| alpha | a numerical value between 0 and 1 indicating the significance level for declaring differential features. Default is 0.05. | 
| n.cores | a positive integer. If  | 
| verbose | a logical value indicating whether the trace information should be printed out. | 
Value
A list with the elements
| variables | A vector of variable names of all fixed effects in  | 
| bias | numeric vector; each element corresponds to one variable in  | 
| output | a list of data frames with columns 'baseMean', 'log2FoldChange', 'lfcSE', 'stat', 'pvalue', 'padj', 'reject',
'df';  
 | 
| covariance | a list of data frames; the data frame records the covariances between a regression coefficient with other coefficients;
 | 
| otu.tab.use | the OTU table used in the abundance analysis (the  | 
| meta.use | the meta data used in the abundance analysis (only variables in  | 
| wald | a list for use in Wald test. If the fitting model is a linear model, then it includes 
 If the fitting model is a linear mixed-effect model, then it includes 
 | 
Author(s)
Huijuan Zhou, Jun Chen, Xianyang Zhang
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
# Differential abundance analysis using the left throat data                       
ind1 <- meta$Site == 'Left' & depth >= 1000
linda.obj  <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex', 
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
           
           
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'), 
           alpha = 0.1, lfc.cut = 1, legend = TRUE, directory = NULL,
            width = 11, height = 8)
            
rownames(linda.obj $output[[1]])[which(linda.obj $output[[1]]$reject)]
# Differential abundance analysis pooling both the left and right throat data 
# Mixed effects model is used 
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex+(1|SubjectID)',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
       
    
# For proportion data   
otu.tab.p <- t(t(otu.tab) / colSums(otu.tab))
ind1 <- meta$Site == 'Left' & depth >= 1000
lind.obj  <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex', 
           feature.dat.type = 'proportion', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
Plot LinDA Results
Description
The function produces the effect size plot of the differential features and volcano plot based on the output from linda.
Usage
linda.plot(
  linda.obj,
  variables.plot,
  titles = NULL,
  alpha = 0.05,
  lfc.cut = 1,
  legend = FALSE,
  directory = NULL,
  width = 11,
  height = 8
)
Arguments
| linda.obj | return from function  | 
| variables.plot | vector; variables whose results are to be plotted. For example, suppose the return
value  | 
| titles | vector; titles of the effect size plot and volcano plot for each variable in  | 
| alpha | a numerical value between 0 and 1; cutoff for  | 
| lfc.cut | a positive numerical value; cutoff for  | 
| legend | TRUE or FALSE; whether to show the legends of the effect size plot and volcano plot. | 
| directory | character; the directory to save the figures, e.g.,  | 
| width | the width of the graphics region in inches. See R function  | 
| height | the height of the graphics region in inches. See R function  | 
Value
A list of ggplot2 objects.
| plot.lfc | a list of effect size plots. Each plot corresponds to one variable in  | 
| plot.volcano | a list of volcano plots. Each plot corresponds to one variable in  | 
Author(s)
Huijuan Zhou, Jun Chen, Xianyang Zhang
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat' & smokers$meta$SIDEOFBODY == 'Left'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]))
                         
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
           
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'), 
           alpha = 0.1, lfc.cut = 1, legend = TRUE, directory = NULL,
            width = 11, height = 8)
Wald test for bias-corrected regression coefficients
Description
The function implements Wald test for bias-corrected regression coefficients generated from the linda function.
One can utilize the function to perform ANOVA-type analyses. For example, if you have a cateogrical variable with three levels, you can test whether all levels have the same effect.
Usage
linda.wald.test(
  linda.obj,
  L,
  model = c("LM", "LMM"),
  alpha = 0.05,
  p.adj.method = "BH"
)
Arguments
| linda.obj | return from the  | 
| L | A matrix for testing  | 
| model | 
 | 
| alpha | significance level for testing  | 
| p.adj.method | P-value adjustment approach. See R function  | 
Value
A data frame with columns
| Fstat | Wald statistics for each taxon | 
| df1 | The numerator degrees of freedom | 
| df2 | The denominator degrees of freedom | 
| pvalue | 
 | 
| padj | 
 | 
| reject | 
 | 
Author(s)
Huijuan Zhou huijuanzhou2019@gmail.com Jun Chen Chen.Jun2@mayo.edu Xianyang Zhang zhangxiany@stat.tamu.edu
References
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome biology, 23(1), 95.
Examples
data(smokers)
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
ind  <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex+(1|SubjectID)',
           feature.dat.type = 'count', 
           prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
           p.adj.method = "BH", alpha = 0.1)
#  L matrix (2x3) is designed to test the second (Smoke) and the third (Sex) coefficient to be 0.
# For a categorical variable > two levels, similar L can be designed to do ANOVA-type test. 
L <- matrix(c(0, 1, 0, 0, 0, 1), nrow = 2, byrow = TRUE)
result <- linda.wald.test(linda.obj, L, 'LMM', alpha = 0.1)
Microbiome data from the human upper respiratory tract (left and right throat)
Description
A dataset containing "otu", "tax", meta", "genus", family"
Usage
data(smokers)
Format
A list with elements
- otu
- otu table, 2156 taxa by 290 samples 
- tax
- taxonomy table, 2156 taxa by 7 taxonomic ranks 
- meta
- meta table, 290 samples by 53 sample variables 
- genus
- 304 by 290 
- family
- 113 by 290 
Source
https://qiita.ucsd.edu/ study ID:524 Reference: Charlson ES, Chen J, Custers-Allen R, Bittinger K, Li H, et al. (2010) Disordered Microbial Communities in the Upper Respiratory Tract of Cigarette Smokers. PLoS ONE 5(12): e15216.