The purpose of tab is to make it easier to create tables for papers, including "Table 1" summaries of sample characteristics and summary tables for fitted regression models. Currently, the following functions are included:

- tabmeans compares means in two or more groups.
- tabmedians compares medians in two or more groups.
- tabfreq compares frequencies in two or more groups.
- tabmulti compares multiple variables in two or more groups.
- tabmeans.svy, tabmedians.svy, tabfreq.svy, and tabmulti.svy serve the same purposes as the above functions, but for complex survey data.
- tabglm summarizes generalized linear models (GLMs) fit via glm or survey::svyglm.
- tabgee summarizes generalized estimating equation models (GEEs) fit via gee::gee.
- tabcoxph summarizes Cox proportional hazards models fit via survival::coxph or survey::svycoxph.

You can use tabmulti to compare characteristics across levels of a factor variable, e.g. here comparing age, sex, and race by treatment group in the toy dataset tabdata:
tabmulti(Age + Sex + Race ~ Group, data = tabdata) %>% kable()

| Variable | Control | Treatment | P |
|---|---|---|---|
| Age, M (SD) | 70.5 (5.3) | 69.5 (5.9) | 0.15 |
| Sex, n (%) | | | <0.001 |
| Female | 93 (68.4) | 62 (38.5) | |
| Male | 43 (31.6) | 99 (61.5) | |
| Race, n (%) | | | 0.29 |
| White | 46 (34.1) | 65 (39.6) | |
| Black | 36 (26.7) | 52 (31.7) | |
| Mexican American | 21 (15.6) | 19 (11.6) | |
| Other | 32 (23.7) | 28 (17.1) | |

To illustrate some options, we can request that Age and Race print as "Age (years)" and "Race/ethnicity", compare medians rather than means for age, apply listwise deletion, and include the sample sizes in the column headings:
tabmulti(Age + Sex + Race ~ Group, data = tabdata, 
         yvarlabels = list(Age = "Age (years)", Race = "Race/ethnicity"), 
         ymeasures = c("median", "freq", "freq"), 
         listwise.deletion = TRUE, 
         n.headings = TRUE) %>% kable()

| Variable | Control (n = 134) | Treatment (n = 158) | P |
|---|---|---|---|
| Age (years), Median (IQR) | 70.0 (9.8) | 69.0 (11.0) | 0.19 |
| Sex, n (%) | | | <0.001 |
| Female | 92 (68.7) | 60 (38.0) | |
| Male | 42 (31.3) | 98 (62.0) | |
| Race/ethnicity, n (%) | | | 0.26 |
| White | 46 (34.3) | 64 (40.5) | |
| Black | 36 (26.9) | 50 (31.6) | |
| Mexican American | 21 (15.7) | 17 (10.8) | |
| Other | 31 (23.1) | 27 (17.1) | |

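
The heading sample sizes (134 and 158) are smaller than the Sex totals in the first table (136 and 161), which is consistent with listwise.deletion = TRUE dropping anyone with a missing value on any of the three variables. The base R sketch below illustrates that idea on a hypothetical toy data frame (`toy` and its values are made up, and this is not necessarily how tab implements the option internally):

```r
# Hypothetical toy data (not the package's tabdata); NA marks a missing value
toy <- data.frame(
  Group = c("Control", "Control", "Treatment", "Treatment", "Treatment"),
  Age   = c(70, NA, 68, 72, 65),
  Sex   = c("Female", "Male", NA, "Male", "Female")
)

# Listwise deletion: keep only rows with no missing value in any column
complete <- toy[complete.cases(toy), ]

# Group sizes before and after dropping incomplete rows
n_all      <- table(toy$Group)
n_complete <- table(complete$Group)
print(n_all)
print(n_complete)
```

Without listwise deletion, each variable would presumably be summarized on whichever rows are non-missing for that variable, so the effective sample size could differ from row to row of the table.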
Logistic regression for 1-year mortality vs. age, sex, and treatment, with the binary factor variables displayed in a “compressed” format:
fit <- glm(death_1yr ~ Age + Sex + Group, data = tabdata, family = binomial)
fit %>% tabglm(factor.compression = 5) %>% kable()

| Variable | Beta (SE) | OR (95% CI) | P |
|---|---|---|---|
| Intercept | -2.02 (1.76) | - | 0.25 | 
| Age | 0.02 (0.02) | 1.02 (0.97, 1.07) | 0.50 | 
| Male | 0.11 (0.29) | 1.12 (0.63, 1.97) | 0.70 | 
| Treatment | -0.04 (0.29) | 0.96 (0.54, 1.69) | 0.88 | 
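
To see how the OR (95% CI) column relates to the Beta (SE) column, the sketch below recomputes the Male row by hand. It assumes a Wald-type interval (estimate plus or minus a normal quantile times the standard error, then exponentiated); whether tabglm uses exactly this method is an assumption, and the inputs are the rounded values printed in the table rather than the fitted model itself.

```r
# Values copied from the Male row of the table above (already rounded)
beta <- 0.11
se   <- 0.29

# Wald-type interval on the log-odds scale, then exponentiate to the OR scale
z      <- qnorm(0.975)  # about 1.96
or     <- exp(beta)
ci_low <- exp(beta - z * se)
ci_upp <- exp(beta + z * se)

out <- sprintf("%.2f (%.2f, %.2f)", or, ci_low, ci_upp)
print(out)  # "1.12 (0.63, 1.97)"
```

The result agrees with the table entry for Male, which suggests the displayed odds ratios are simply exponentiated coefficients.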
GEE for high blood pressure (measured at 3 time points longitudinally) vs. various predictors, with some higher-order terms:
tabdata2 <- reshape(data = tabdata,
                    varying = c("bp.1", "bp.2", "bp.3", "highbp.1", "highbp.2", "highbp.3"),
                    timevar = "bp.visit", direction = "long")
tabdata2 <- tabdata2[order(tabdata2$id), ]
library("gee")
fit <- gee(highbp ~ poly(Age, 2, raw = TRUE) + Sex + Race + Race*Sex,
           id = id, data = tabdata2, family = "binomial", corstr = "unstructured")
fit %>% tabgee(data = tabdata2) %>% kable()

| Variable | Beta (SE) | OR (95% CI) | P |
|---|---|---|---|
| Intercept | -3.10 (14.84) | - | 0.83 |
| Age | 0.06 (0.43) | 1.06 (0.46, 2.45) | 0.89 |
| Age squared | -0.00 (0.00) | 1.00 (0.99, 1.01) | 0.88 |
| Sex | | | |
| Female (ref) | - | - | - |
| Male | 0.48 (0.29) | 1.61 (0.91, 2.84) | 0.10 |
| Race | | | |
| White (ref) | - | - | - |
| Black | 0.04 (0.32) | 1.04 (0.56, 1.95) | 0.90 |
| Mexican American | 0.13 (0.38) | 1.14 (0.55, 2.39) | 0.72 |
| Other | -0.83 (0.37) | 0.43 (0.21, 0.89) | 0.02 |
| Sex by Race | | | |
| Male, Black | 0.23 (0.42) | 1.26 (0.55, 2.87) | 0.58 |
| Male, Mexican American | 0.27 (0.54) | 1.31 (0.46, 3.75) | 0.61 |
| Male, Other | 1.11 (0.51) | 3.05 (1.12, 8.25) | 0.03 |

Note that we had to set data = tabdata2 here, because gee objects don’t store all of the information on factor variables (unlike glm objects).
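
For the glm side of that comparison, the sketch below shows where a fitted glm object keeps its factor information: the `xlevels` component records the levels of each factor used in fitting. The toy data frame is hypothetical, and the sketch does not inspect gee objects; it only illustrates the kind of information a table function can read from a glm fit without being handed the data again.

```r
# Hypothetical toy data, not the package's tabdata
toy <- data.frame(
  y   = c(0, 1, 0, 1, 1, 0, 1, 0),
  Sex = factor(c("Female", "Male", "Female", "Male",
                 "Female", "Male", "Male", "Female"))
)

fit <- glm(y ~ Sex, data = toy, family = binomial)

# glm objects record the levels of each factor used in fitting
print(fit$xlevels)

# Under the default treatment contrasts, the first level is the reference group
ref_level <- fit$xlevels$Sex[1]
print(ref_level)
```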
Survival model for mortality vs. predictors, again compressing the factor variables, and requesting slightly different columns (i.e. no p-values):
library("survival")
fit <- coxph(Surv(time = time, event = delta) ~ Age + Sex + Group, data = tabdata)
fit %>% tabcoxph(factor.compression = 5, columns = c("beta", "hr.ci")) %>% kable()

| Variable | Beta | HR (95% CI) |
|---|---|---|
| Age | 0.03 | 1.03 (1.00, 1.06) | 
| Male | 0.01 | 1.01 (0.74, 1.39) | 
| Treatment | -0.05 | 0.95 (0.69, 1.30) | 
The functions in tab can also accommodate complex survey data. To illustrate with the included dataset tabsvydata (which is data from NHANES 2003-2004, except for the made-up variables time and event), here’s a Table 1:
library("survey")
design <- svydesign(
  data = tabsvydata,
  ids = ~sdmvpsu,
  strata = ~sdmvstra,
  weights = ~wtmec2yr,
  nest = TRUE
)
tabmulti.svy(Age + Race + BMI ~ Sex, design = design) %>% kable()

| Variable | Female | Male | P |
|---|---|---|---|
| Age, M (SD) | 37.0 (22.5) | 34.8 (21.7) | <0.001 |
| Race, % (SE) | | | 0.08 |
| Non-Hispanic White | 69.7 (3.7) | 69.6 (3.8) | |
| Non-Hispanic Black | 13.2 (2.0) | 11.9 (1.9) | |
| Mexican American | 8.6 (2.1) | 9.8 (2.2) | |
| Other | 8.4 (1.0) | 8.8 (1.3) | |
| BMI, M (SD) | 26.4 (7.5) | 26.0 (6.4) | 0.11 | 
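
The design object above passes sampling weights (wtmec2yr) to the survey package, so the means in this table are weighted estimates rather than plain sample averages. As a rough illustration of why that matters, the base R sketch below compares an unweighted and a weighted mean on made-up numbers. It is not how the survey package computes its estimates, and it ignores strata and PSUs, which affect the standard errors and p-values.

```r
# Hypothetical toy sample: each weight is the number of people in the
# population that the sampled person is taken to represent
bmi <- c(22, 31, 27)
wt  <- c(1000, 4000, 5000)

unweighted <- mean(bmi)                   # treats every respondent equally
weighted   <- weighted.mean(bmi, w = wt)  # respondents with larger weights count more

print(c(unweighted = unweighted, weighted = weighted))
```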
And here’s a linear regression:
fit <- svyglm(BMI ~ Age + Sex + Race, design = design)
fit %>% tabglm(factor.compression = 3) %>% kable()

| Variable | Beta (SE) | 95% CI | P |
|---|---|---|---|
| Intercept | 20.95 (0.34) | (20.27, 21.62) | <0.001 | 
| Age | 0.14 (0.00) | (0.13, 0.15) | <0.001 | 
| Female (ref) | - | - | - | 
| Male | -0.07 (0.23) | (-0.51, 0.37) | 0.76 | 
| Non-Hispanic White (ref) | - | - | - | 
| Non-Hispanic Black | 1.91 (0.23) | (1.46, 2.35) | <0.001 | 
| Mexican American | 1.06 (0.30) | (0.47, 1.66) | 0.006 | 
| Other | -1.09 (0.33) | (-1.73, -0.45) | 0.007 | 
All of the functions in tab have an argument called print.html, which can be used to export tables to word processors. Setting print.html = TRUE writes an HTML table to your current working directory. You can open the table (e.g. in Chrome) and copy/paste it into your report.
I used knitr’s kable function for the examples here, but other approaches should also work (e.g. xtable’s xtable or pander’s pandoc.table).
Lumley, Thomas. 2019. Survey: Analysis of Complex Survey Samples. https://CRAN.R-project.org/package=survey.
Lumley, Thomas, and others. 2004. “Analysis of Complex Survey Samples.” Journal of Statistical Software 9 (1): 1–19.
Carey, Vincent J. Ported to R by Thomas Lumley and Brian Ripley. 2015. Gee: Generalized Estimation Equation Solver. https://CRAN.R-project.org/package=gee.
Therneau, Terry M., and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer.
Therneau, Terry M. 2015. A Package for Survival Analysis in S. https://CRAN.R-project.org/package=survival.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman; Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC. https://yihui.name/knitr/.
———. 2018. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.