| Title: | Functions to Facilitate Exploratory Data Analysis | 
| Version: | 1.0.3 | 
| Description: | Functions for descriptive statistics, data management, and data visualization. | 
| Depends: | R (≥ 3.5.0) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.2 | 
| License: | MIT + file LICENSE | 
| VignetteBuilder: | knitr | 
| BugReports: | https://github.com/rkabacoff/qacBase/issues | 
| URL: | https://github.com/rkabacoff/qacBase | 
| Suggests: | rmarkdown, knitr, kableExtra | 
| Imports: | ggplot2, dplyr, tidyr, ggcorrplot, multcompView, PMCMRplus, crayon, purrr, haven, rlang, ggExtra, patchwork | 
| NeedsCompilation: | no | 
| Packaged: | 2022-02-09 21:57:41 UTC; rkaba | 
| Author: | Kabacoff Robert [aut, cre], Barich Griffen [ctb], Jamrog Kelly [ctb], Kravchenko Elizaveta [ctb], Kuruvilla Jacob [ctb], Liu Lex [ctb], Nakamura Shota [ctb], Pham Kim [ctb], Rodriguez Belen [ctb], Ross Shane [ctb], Russo Chris [ctb], Corpuz Frederick [ctb], Juradat Nurah [ctb], Karp Harrison [ctb], Koech Kevin [ctb], Peters Anna [ctb], Shah Dhhyey [ctb], Stevenson Kenneth [ctb], Thomas-Franz Kaitlyn [ctb], Zheng Jiner [ctb], Aldarmaki Ahmed [ctb], Alneyadi Mohammed [ctb], Altai Chossis [ctb], Colorado Sofia [ctb], Northrop Blake [ctb], Peretz Shea [ctb], Qin Cher [ctb], Tuhabonye Emma [ctb], Wong Phillip [ctb] | 
| Maintainer: | Kabacoff Robert <rkabacoff@wesleyan.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2022-02-09 22:20:02 UTC | 
qacBase: Functions to Facilitate Exploratory Data Analysis
Description
Functions for descriptive statistics, data management, and data visualization.
Author(s)
Maintainer: Kabacoff Robert rkabacoff@wesleyan.edu
Other contributors:
- Barich Griffen [contributor] 
- Jamrog Kelly [contributor] 
- Kravchenko Elizaveta [contributor] 
- Kuruvilla Jacob [contributor] 
- Liu Lex [contributor] 
- Nakamura Shota [contributor] 
- Pham Kim [contributor] 
- Rodriguez Belen [contributor] 
- Ross Shane [contributor] 
- Russo Chris [contributor] 
- Corpuz Frederick [contributor] 
- Juradat Nurah [contributor] 
- Karp Harrison [contributor] 
- Koech Kevin [contributor] 
- Peters Anna [contributor] 
- Shah Dhhyey [contributor] 
- Stevenson Kenneth [contributor] 
- Thomas-Franz Kaitlyn [contributor] 
- Zheng Jiner [contributor] 
- Aldarmaki Ahmed [contributor] 
- Alneyadi Mohammed [contributor] 
- Altai Chossis [contributor] 
- Colorado Sofia [contributor] 
- Northrop Blake [contributor] 
- Peretz Shea [contributor] 
- Qin Cher [contributor] 
- Tuhabonye Emma [contributor] 
- Wong Phillip [contributor] 
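See Also
Useful links:
- https://github.com/rkabacoff/qacBase
- Report bugs at https://github.com/rkabacoff/qacBase/issues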
Barcharts
Description
Create barcharts for all categorical variables in a data frame.
Usage
barcharts(
  data,
  fill = "deepskyblue2",
  color = "grey30",
  labels = TRUE,
  sort = TRUE,
  maxcat = 20,
  abbrev = 20
)
Arguments
| data | data frame | 
| fill | fill color for bars | 
| color | color for bar labels | 
| labels | if  | 
| sort | if  | 
| maxcat | numeric. barcharts with more than this number of bars will not be plotted. | 
| abbrev | numeric. abbreviate bar labels to at most this character length. | 
Value
a ggplot graph
Examples
barcharts(cars74)
Automobile characteristics
Description
Cars dataset with features including make, model, year, engine, and other properties of the car used to predict its price.
Usage
cardata
Format
A data frame with 11914 rows and 16 variables. The variables are as follows:
- make: car brand
- model: model given by its brand
- year: year of manufacture
- engine_fuel_type: type of fuel required by its manufacturer
- engine_hp: engine horsepower
- engine_cylinders: number of cylinders
- transmission_type: automatic vs. manual
- driven_wheels: AWD, FWD, RWD
- number_of_doors: number of doors
- market_category: Luxury, Performance, Hatchback, etc.
- vehicle_size: Compact, Midsize, Large
- vehicle_style: type of vehicle: Sedan, SUV, Coupe, etc.
- highway_mpg: highway miles per gallon
- city_mpg: city miles per gallon
- popularity: popularity index
- msrp: manufacturer's suggested retail price
Details
This package contains a detailed car dataset.
Source
Taken from Kaggle https://www.kaggle.com/CooperUnion/cardataset.
Examples
summary(cardata)
Motor Trend car road tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Usage
cars74
Format
A data frame with 32 rows and 12 variables. The variables are as follows:
- auto: automobile name (taken from the row names of mtcars)
- mpg: Miles/(US) gallon
- cyl: Number of cylinders
- disp: Displacement (cu.in.)
- hp: Gross horsepower
- drat: Rear axle ratio
- wt: Weight (1000 lbs)
- qsec: 1/4 mile time
- vs: Engine cylinder configuration
- am: Transmission type
- gear: Number of forward gears
- carb: Number of carburetors
Details
This dataset is the mtcars dataset that comes
with base R. However, cyl, vs, am, gear
and carb have been converted
to factors and rownames have been converted to the variable auto.
A description of the variables by Soren Heitmann can be found
here.
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Examples
summary(cars74)
Detailed description of a data frame
Description
contents provides a comprehensive description of a data
frame, including summary statistics for both quantitative and
categorical variables.
Usage
contents(data, digits = 2, maxcat = 10, label_length = 20)
Arguments
| data | a data frame | 
| digits | number of decimal digits for statistics. | 
| maxcat | maximum number of levels of a character/factor variable to print. | 
| label_length | maximum length of factor level label to print. Longer labels will be truncated. | 
Details
Prints a comprehensive description of a data frame via several tables: a general summary table, plus tables that break down the quantitative and the categorical variables.
Value
a list with 6 components:
- dfname: name of data frame
- nrow: number of rows
- ncol: number of columns
- overall: data frame of overall dataset characteristics
- qvars: data frame with summary statistics for quantitative variables
- cvars: data frame with summary statistics for categorical variables
Examples
contents(cars74)
Correlation matrix plot
Description
Create a correlation matrix for all quantitative variables in a data frame.
Usage
cor_plot(
  data,
  method = c("pearson", "kendall", "spearman"),
  sort = FALSE,
  axis_text_size = 12,
  number_text_size = 3,
  legend = FALSE
)
Arguments
| data | data frame | 
| method | a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". | 
| sort | logical. If  | 
| axis_text_size | size for axis labels (default=12). | 
| number_text_size | size for correlation coefficient labels (default=3). | 
| legend | logical, if TRUE the legend is displayed. (default=FALSE) | 
Details
The cor_plot function will only select quantitative variables from
a data frame. Categorical variables are ignored.
The correlation matrix is presented as a lower triangle matrix.
Missing values are deleted in listwise fashion.
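The selection and missing-value rules above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name lower_cor is hypothetical, and it relies solely on base R's sapply, cor, and upper.tri.

```r
# Hypothetical helper: correlation matrix for the numeric columns only,
# using listwise deletion, shown as a lower triangle.
lower_cor <- function(data, method = "pearson") {
  num <- data[sapply(data, is.numeric)]                  # drop categorical columns
  r <- cor(num, use = "complete.obs", method = method)   # listwise deletion of NAs
  r[upper.tri(r)] <- NA                                  # keep the lower triangle
  round(r, 2)
}

df <- data.frame(a = c(1, 2, 3, 4, NA),
                 b = c(2, 4, 6, 8, 10),
                 g = factor(c("x", "y", "x", "y", "x")))
m <- lower_cor(df)
print(m)
```

The factor g is ignored, and the row with a missing value in a is dropped before the correlation is computed.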
Value
a ggplot graph
Note
This function is a wrapper for the ggcorrplot function.
Examples
cor_plot(cars74)
cor_plot(cars74, sort=TRUE)
Two-way frequency table
Description
This function creates a two-way frequency table.
Usage
crosstab(
  data,
  rowvar,
  colvar,
  type = c("freq", "percent", "rowpercent", "colpercent"),
  total = TRUE,
  na.rm = TRUE,
  digits = 2,
  chisquare = FALSE,
  plot = FALSE
)
Arguments
| data | data frame | 
| rowvar | row factor (unquoted) | 
| colvar | column factor (unquoted) | 
| type | statistics to print. Options are "freq", "percent", "rowpercent", or "colpercent". | 
| total | logical. if TRUE, includes total percents. | 
| na.rm | logical. if TRUE, deletes cases with missing values. | 
| digits | number of decimal digits to report for percents. | 
| chisquare | logical. If TRUE, a chi-square test of independence is performed. | 
| plot | logical. If TRUE, a ggplot2 graph is returned. | 
Details
Given a data frame, a row factor, a column factor, and a type (frequencies, cell percents, row percents, or column percents) the function provides the requested cross-tabulation.
If na.rm = FALSE, a level labeled <NA> is added. If
total = TRUE, a level labeled Total is added. If
chisquare = TRUE, a chi-square test of independence is
performed.
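For orientation, the four table types and the chi-square test have direct base R counterparts. The sketch below uses table, prop.table, addmargins, and chisq.test on the built-in mtcars data; it is a comparison aid and makes no claim about how crosstab is implemented.

```r
# Base R counterparts of the four table types, using the built-in mtcars data.
tbl <- table(cyl = mtcars$cyl, gear = mtcars$gear, useNA = "no")

freq    <- addmargins(tbl)                              # frequencies with totals
cellpct <- round(100 * prop.table(tbl), 2)              # cell percents
rowpct  <- round(100 * prop.table(tbl, margin = 1), 2)  # row percents
colpct  <- round(100 * prop.table(tbl, margin = 2), 2)  # column percents

# Chi-square test of independence on the raw frequencies.
# Warnings are suppressed because expected counts are small for this table.
chi <- suppressWarnings(chisq.test(tbl))

print(freq)
print(rowpct)
print(chi$p.value)
```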
Value
If plot=TRUE, returns a ggplot2 graph.
Otherwise, the function returns a list with 7 components:
- table (table). Table of frequencies or percents
- type (character). Type of table to print
- total (logical). If TRUE, print row and/or column totals
- digits (numeric). Number of digits to print
- rowname (character). Row variable name
- colname (character). Column variable name
- chisquare (character). If chisquare=TRUE, contains the results of the Chi-square test; NULL otherwise.
Examples
# print frequencies
crosstab(mtcars, cyl, gear)
# print cell percents
crosstab(cardata, vehicle_size, driven_wheels, type = "percent")
# plot the table
crosstab(cardata, vehicle_size, driven_wheels,
         plot = TRUE)
# plot column percents and run a chi-square test
crosstab(cardata, driven_wheels, vehicle_size,
         type = "colpercent", plot = TRUE, chisquare = TRUE)
Density plots
Description
Create density plots for all quantitative variables in a data frame.
Usage
densities(data, fill = "deepskyblue2", adjust = 1)
Arguments
| data | data frame | 
| fill | fill color for density plots | 
| adjust | a factor multiplied by the smoothing bandwidth. See details. | 
Details
The densities function will only plot quantitative variables from
a data frame. Categorical variables are ignored.
The adjust parameter multiplies the smoothing bandwidth. For example,
adjust = 2 will make the density plots twice as smooth, while
adjust = 1/2 will make the density plots half as smooth (i.e., twice as spiky).
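The effect of adjust can be checked with base R's density function, which exposes the same kind of bandwidth multiplier. This sketch assumes that densities passes adjust through to a kernel density estimator in the usual way; it does not call the package itself.

```r
x <- mtcars$mpg

d_default <- density(x)                # adjust = 1
d_smooth  <- density(x, adjust = 2)    # bandwidth doubled: smoother curve
d_spiky   <- density(x, adjust = 1/2)  # bandwidth halved: spikier curve

# The bandwidth actually used scales linearly with adjust.
print(c(default = d_default$bw, smooth = d_smooth$bw, spiky = d_spiky$bw))
```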
Value
a ggplot graph
Examples
densities(cars74)
densities(cars74, adjust=2)
densities(cars74, adjust=1/2)
Visualize a data frame
Description
df_plot visualizes the variables in a data frame.
Usage
df_plot(data)
Arguments
| data | a data frame. | 
Details
For each variable, the plot displays
- type (numeric, integer, factor, ordered factor, logical, or date)
- percent of available (and missing) cases
Variables are sorted by type, and the total numbers of variables and cases are printed in the caption.
Value
a ggplot2 graph
See Also
For more descriptive statistics on a data frame see contents.
Examples
df_plot(cars74)
Test of group differences
Description
One-way analysis (ANOVA or Kruskal-Wallis Test) with post-hoc comparisons and plots
Usage
groupdiff(
  data,
  y,
  x,
  method = c("anova", "kw"),
  digits = 2,
  horizontal = FALSE,
  posthoc = FALSE
)
Arguments
| data | a data frame. | 
| y | a numeric response variable | 
| x | a categorical explanatory variable. It will be coerced to a factor. | 
| method | character. Either "anova" (default) or "kw". | 
| digits | Number of significant digits to print. | 
| horizontal | logical. If  | 
| posthoc | logical. If TRUE, pairwise comparisons are superimposed on the boxplots. | 
Details
The groupdiff function performs one of two analyses:
- anova: A one-way analysis of variance, with TukeyHSD post-hoc comparisons.
- kw: A Kruskal-Wallis Rank Sum Test, with Conover Test post-hoc comparisons.
In each case, summary statistics and grouped boxplots are
provided. In the parametric case, the statistics are n, mean, and
standard deviation. In the nonparametric case, the statistics are
n, median, and median absolute deviation. If posthoc = TRUE,
pairwise comparisons are superimposed on the boxplots.
Groups that share a letter do not differ significantly (at the .05 level),
controlling for multiple comparisons.
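The two analysis routes correspond to standard base R tools, sketched below on mtcars. This is not the package's code: it shows aov with TukeyHSD and kruskal.test only, and leaves out the Conover post-hoc test (from PMCMRplus) and the letter displays.

```r
dat <- transform(mtcars, gear = factor(gear))

# Parametric route: one-way ANOVA, then Tukey HSD pairwise comparisons.
fit   <- aov(hp ~ gear, data = dat)
tukey <- TukeyHSD(fit)
para_stats <- aggregate(hp ~ gear, data = dat,
                        FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))

# Nonparametric route: Kruskal-Wallis rank sum test.
kw <- kruskal.test(hp ~ gear, data = dat)
nonpara_stats <- aggregate(hp ~ gear, data = dat,
                           FUN = function(v) c(n = length(v), median = median(v), mad = mad(v)))

print(summary(fit))
print(tukey)
print(kw)
print(para_stats)
print(nonpara_stats)
```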
Value
a list with 3 components:
- result: omnibus test
- summarystats: summary statistics
- plot: ggplot2 graph
See Also
kwAllPairsConoverTest, multcompLetters.
Examples
# parametric analysis
groupdiff(cars74, hp, gear)
# nonparametric analysis
groupdiff(cardata, popularity, vehicle_style, posthoc=TRUE,
          method="kw", horizontal=TRUE)
Histograms
Description
Create histograms for all quantitative variables in a data frame.
Usage
histograms(data, fill = "deepskyblue2", color = "white", bins = 30)
Arguments
| data | data frame | 
| fill | fill color for histogram bars | 
| color | border color for histogram bars | 
| bins | number of bins (bars) for the histograms | 
Details
The histograms function will only plot quantitative variables from
a data frame. Categorical variables are ignored.
Value
a ggplot graph
Examples
histograms(cars74)
histograms(cars74, bins=15, fill="darkred")
List object sizes and types
Description
lso lists object sizes and types.
Usage
lso(
  pos = 1,
  pattern,
  order.by = "Size",
  decreasing = TRUE,
  head = TRUE,
  n = 10
)
Arguments
| pos | a number specifying the environment as a position in the search list. | 
| pattern | an optional regular expression. Only names matching pattern are returned. glob2rx can be used to convert wildcard patterns to regular expressions. | 
| order.by | column to sort the list by. Values are "Type", "Size", "Rows", and "Columns". | 
| decreasing | logical. If  | 
| head | logical. Should output be limited to n lines? | 
| n | if head = TRUE, the number of lines to print. | 
Details
This function lists the sizes and types of all objects in an environment. By default, the list describes the objects in the current environment, presented in descending order by object size and reported in megabytes (Mb).
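A reduced version of this idea fits in a few lines of base R. The helper object_sizes below is hypothetical and covers only the Type and Size columns; dividing bytes by 1024^2 to obtain Mb is an assumption of the sketch.

```r
# Hypothetical helper: classes and sizes (in Mb) of the objects in an environment,
# sorted from largest to smallest.
object_sizes <- function(env = globalenv()) {
  nms <- ls(envir = env)
  out <- data.frame(
    Type = vapply(nms, function(n) class(get(n, envir = env))[1], character(1)),
    Size = vapply(nms, function(n) {
      as.numeric(utils::object.size(get(n, envir = env))) / 1024^2
    }, numeric(1)),
    row.names = nms
  )
  out[order(out$Size, decreasing = TRUE), , drop = FALSE]
}

# Demonstrate on a scratch environment rather than the global one.
e <- new.env()
assign("small", 1:10, envir = e)
assign("big", numeric(1e5), envir = e)
print(object_sizes(e))
```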
Value
a data.frame with four columns (Type, Size, Rows, Columns) and object names as row names.
Author(s)
Based on postings by Petr Pikal and David Hinds to the r-help list in 2004, and modified by Dirk Eddelbuettel, Patrick McCann, and Rob Kabacoff.
References
https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session/.
Examples
data(cardata)
data(cars74)
lso()
Normalize numeric variables
Description
Normalize the numeric variables in a data frame
Usage
normalize(data, new_min = 0, new_max = 1)
Arguments
| data | a data frame. | 
| new_min | minimum for the transformed variables. | 
| new_max | maximum for the transformed variables. | 
Details
normalize transforms all the numeric variables
in a data frame to have the same minimum and maximum values.
By default, this will be a minimum of 0 and maximum of 1.
Character variables and factors are left unchanged.
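The transformation described here is ordinary min-max rescaling. A minimal base R sketch follows; the helper names rescale_range and normalize_sketch are hypothetical, and the package may handle edge cases (such as constant columns, which produce NaN here) differently.

```r
# Hypothetical helper: min-max rescaling of one numeric vector.
rescale_range <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min) + new_min
}

# Apply to numeric columns only; characters and factors pass through unchanged.
normalize_sketch <- function(data, new_min = 0, new_max = 1) {
  num <- vapply(data, is.numeric, logical(1))
  data[num] <- lapply(data[num], rescale_range,
                      new_min = new_min, new_max = new_max)
  data
}

df <- data.frame(x = c(2, 4, 6), g = c("a", "b", "c"))
print(normalize_sketch(df))          # x becomes 0.0, 0.5, 1.0
print(normalize_sketch(df, -1, 1))   # x becomes -1, 0, 1
```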
Value
a data frame
Note
Use this function to transform variables into a given range. The default is [0, 1], but [-1, 1], [0, 100], or any other range is permissible.
Examples
head(cars74)
cars74_st <- normalize(cars74)
head(cars74_st)
Get help on a package
Description
phelp provides help on an installed package.
Usage
phelp(pckg)
Arguments
| pckg | The name of a package | 
Details
This function provides help on an installed package. The package does not have to be loaded. The package name does not need to be entered with quotes.
Value
No return value, called for side effects.
Examples
phelp(stats)
Plot a crosstab object
Description
This function plots the results of a calculated two-way frequency table.
Usage
## S3 method for class 'crosstab'
plot(x, size = 3.5, ...)
Arguments
| x | An object of class crosstab. | 
| size | numeric. Size of bar text labels. | 
| ... | not currently used. | 
Value
a ggplot2 graph
Examples
tbl <- crosstab(cars74, cyl, gear, type = "freq")
plot(tbl)
tbl <- crosstab(cars74, cyl, gear, type = "colpercent")
plot(tbl)
Plot a tab object
Description
Plot a frequency or cumulative frequency table
Usage
## S3 method for class 'tab'
plot(x, fill = "deepskyblue2", size = 3.5, ...)
Arguments
| x | An object of class tab. | 
| fill | Fill color for bars | 
| size | numeric. Size of bar text labels. | 
| ... | Parameters passed to a function | 
Value
a ggplot2 graph
Examples
tbl1 <- tab(cars74, carb)
plot(tbl1)
tbl2 <- tab(cars74, carb, sort = TRUE)
plot(tbl2)
tbl3 <- tab(cars74, carb, cum=TRUE)
plot(tbl3)
Print a contents object
Description
print.contents prints the results of the contents function.
Usage
## S3 method for class 'contents'
print(x, ...)
Arguments
| x | an object of class contents. | 
| ... | not used. | 
Value
No return value, called for side effects.
Examples
testdata <- data.frame(height=c(4, 5, 3, 2, 100),
                       weight=c(39, 88, NA, 15, -2),
                       names=c("Bill","Dean", "Sam", NA, "Jane"),
                       race=c('b', 'w', 'w', 'o', 'b'))
x <- contents(testdata)
print(x)
Print a crosstab object
Description
This function prints the results of a calculated two-way frequency table.
Usage
## S3 method for class 'crosstab'
print(x, ...)
Arguments
| x | An object of class crosstab. | 
| ... | not currently used. | 
Value
No return value, called for side effects
Examples
mycrosstab <- crosstab(mtcars, cyl, gear, type = "freq", digits = 2)
print(mycrosstab)
mycrosstab <- crosstab(mtcars, cyl, gear, type = "rowpercent", digits = 3)
print(mycrosstab)
Print a tab object
Description
Print the results of calculating a frequency table
Usage
## S3 method for class 'tab'
print(x, ...)
Arguments
| x | An object of class tab. | 
| ... | Parameters passed to the print function | 
Value
No return value, called for side effects
Examples
frequency <- tab(cardata, make, sort = TRUE, na.rm = FALSE)
print(frequency)
Summary statistics for a quantitative variable
Description
This function provides descriptive statistics for a quantitative variable, either overall or separately by groups. Any function that returns a single numeric value can be used.
Usage
qstats(data, x, ..., stats = c("n", "mean", "sd"), na.rm = TRUE, digits = 2)
Arguments
| data | data frame | 
| x | numeric variable in data (unquoted) | 
| ... | list of grouping variables | 
| stats | statistics to calculate (any function that produces a single numeric value), Default: c("n", "mean", "sd") | 
| na.rm | if  | 
| digits | number of decimal digits to print, Default: 2 | 
Value
a data frame, where columns are grouping variables (optional) and statistics
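The idea of passing statistics by name can be sketched in base R with match.fun. The helper group_stats below is hypothetical and, unlike qstats, takes quoted variable names; treating "n" as the count of non-missing values is an assumption of the sketch.

```r
# Hypothetical helper: apply the functions named in `stats` to column x,
# overall or within each combination of the grouping columns in `by`.
group_stats <- function(data, x, by = character(0),
                        stats = c("n", "mean", "sd"), digits = 2) {
  one_group <- function(d) {
    v <- d[[x]]
    vals <- vapply(stats, function(s) {
      if (s == "n") as.numeric(sum(!is.na(v)))      # "n" is not a base R function
      else as.numeric(match.fun(s)(v, na.rm = TRUE))
    }, numeric(1))
    as.data.frame(as.list(round(vals, digits)))
  }
  if (length(by) == 0) return(one_group(data))
  groups <- split(data, data[by], drop = TRUE)      # one data frame per group
  keys <- do.call(rbind, lapply(groups, function(d) d[1, by, drop = FALSE]))
  out <- cbind(keys, do.call(rbind, lapply(groups, one_group)))
  rownames(out) <- NULL
  out
}

print(group_stats(mtcars, "mpg"))
print(group_stats(mtcars, "mpg", by = c("am", "gear"),
                  stats = c("median", "mad", "min", "max")))
```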
Examples
# If no keyword arguments are provided, default values are used
qstats(mtcars, mpg, am, gear)
# You can supply as many (or no) grouping variables as needed
qstats(mtcars, mpg)
qstats(mtcars, mpg, am, cyl)
# You can specify your own functions (e.g., median,
# median absolute deviation, minimum, maximum)
qstats(mtcars, mpg, am, gear,
       stats = c("median", "mad", "min", "max"))
R Colors
Description
Plot a grid of R colors and their associated names
Usage
rcolors(color = NULL, cex = 0.6)
Arguments
| color | character. A text string used to search for specific color variations (see Examples). | 
| cex | numeric. text size for color labels. | 
Details
By default rcolors plots the basic 502 distinct colors provided by the
colors function. If a color name or part of a name is provided, only
colors with matching names are plotted.
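The name lookup behind this can be reproduced with base R alone. The sketch assumes that the distinct colors come from colors(distinct = TRUE) and that matching is a simple partial-name search; the plotting step is omitted.

```r
all_cols <- colors(distinct = TRUE)   # distinct named colors
print(length(all_cols))

# Partial-name search, comparable in spirit to rcolors("blue")
blue_cols <- grep("blue", all_cols, value = TRUE)
print(head(blue_cols))
```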
Value
No return value, called for side effects
References
This function is adapted from code published by Karl W. Broman.
Examples
rcolors()
rcolors("blue")
rcolors("red")
rcolors("dark")
Recode one or more variables
Description
recodes recodes the values of one or more variables in
a data frame
Usage
recodes(data, vars, from, to)
Arguments
| data | a data frame. | 
| vars | character vector of variable names. | 
| from | a vector of values or conditions (see Details). | 
| to | a vector of replacement values. | 
Details
- For each variable in the vars parameter, values are checked against the list of values in the from vector. If a value matches, it is replaced with the corresponding entry in the to vector.
- Once a given observation's value matches a from value, it is recoded. That particular observation will not be recoded again by that recodes() statement (i.e., no chaining).
- One or more values in the from vector can be an expression, using the dollar sign ($) to represent the variable being recoded. If the expression evaluates to TRUE, the corresponding to value is returned.
- If the number of values in the to vector is less than the from vector, the values are recycled. This lets you convert several values to a single outcome value (e.g., NA).
- If the to values are numeric, the resulting recoded variable will be numeric. If the variable being recoded is a factor and the to values are character values, the resulting variable will remain a factor. If the variable being recoded is a character variable and the to values are character values, the resulting variable will remain a character variable.
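The matching, no-chaining, and recycling rules in the list above can be illustrated on a single vector with base R's match. The helper recode_once is hypothetical and covers plain values only; expressions using $ and the factor-preserving behavior are not modeled.

```r
# Hypothetical helper: recode one vector; the first match wins and there is no chaining.
recode_once <- function(x, from, to) {
  to  <- rep_len(to, length(from))   # recycle `to` to the length of `from`
  idx <- match(x, from)              # position of each value's match in `from`
  hit <- !is.na(idx)
  out <- if (is.character(to)) as.character(x) else x
  out[hit] <- to[idx[hit]]           # every replacement is based on the original x
  out
}

# No chaining: the original 1 becomes 2 and is not recoded again to 3.
print(recode_once(c(1, 2, 3), from = c(1, 2), to = c(2, 3)))    # 2 3 3
# Recycling: a single `to` value serves several `from` values.
print(recode_once(c(0, 5, 99), from = c(0, 99), to = NA))       # NA 5 NA
# Character replacements yield a character result.
print(recode_once(c(1, 2, 1), from = c(1, 2), to = c("pass", "fail")))
```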
Value
a data frame
Note
See the vignette for detailed examples.
Examples
df <- data.frame(x = c(1, 5, 7, 3, 0),
                 y = c(9, 0, 5, 9, 2),
                 z = c(1, 1, 2, 2, 1)
                 )
df <- recodes(df, 
              vars = c("x", "y"), 
              from = 0, to = NA)
df <- recodes(df, 
              vars = "z", 
              from = c(1, 2), to = c("pass", "fail"))
Scatterplot
Description
Create a scatter plot between two quantitative variables.
Usage
scatter(
  data,
  x,
  y,
  outlier = 3,
  alpha = 1,
  digits = 3,
  title,
  margin = "none",
  stats = TRUE,
  point_color = "deepskyblue2",
  outlier_color = "violetred1",
  line_color = "grey30",
  margin_color = "deepskyblue2"
)
Arguments
| data | data frame | 
| x | quantitative predictor variable | 
| y | quantitative response variable | 
| outlier | number. Observations with studentized residuals larger than this value are flagged. If set to 0, observations are not flagged. | 
| alpha | Transparency of data points. A numeric value between 0 (completely transparent) and 1 (completely opaque). | 
| digits | Number of significant digits in displayed statistics. | 
| title | Optional title. | 
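| margin | Marginal plots. If specified, variable distributions (histograms, boxplots, violin plots, or density plots) are added to the plot margins, e.g., margin = "histogram". Default is "none". | 
| stats | logical. If TRUE, regression statistics (b, r, r2, p) are printed. | 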
| point_color | Color used for points. | 
| outlier_color | Color used to identify outliers (see the outlier parameter). | 
| line_color | Color for regression line. | 
| margin_color | Fill color for margin boxplots, density plots, or histograms. | 
Details
The scatter function generates a scatterplot between two quantitative
variables, along with a line of best fit and a 95% confidence interval.
By default, regression statistics (b, r, r2, p) are printed and
outliers (observations with studentized residuals > 3) are flagged.
Optionally, variable distributions (histograms, boxplots, violin plots,
density plots) can be added to the plot margins.
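The printed statistics and the outlier rule can be reproduced with base R's lm and rstudent. This sketch is not the package's code; in particular, comparing the absolute value of the studentized residual to the cutoff is an assumption.

```r
fit <- lm(mpg ~ hp, data = mtcars)

b  <- unname(coef(fit)["hp"])                       # slope
r  <- cor(mtcars$hp, mtcars$mpg)                    # Pearson correlation
r2 <- summary(fit)$r.squared                        # proportion of variance explained
p  <- summary(fit)$coefficients["hp", "Pr(>|t|)"]   # p-value for the slope

# Flag outliers; using the absolute value is an assumption of this sketch.
outlier <- 3
flagged <- abs(rstudent(fit)) > outlier

print(signif(c(b = b, r = r, r2 = r2, p = p), 3))
print(rownames(mtcars)[flagged])
```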
Value
a ggplot2 graph
Note
Variable names do not have to be quoted.
Examples
scatter(cars74, hp, mpg)
scatter(cars74, wt, hp)
p <- scatter(ggplot2::mpg, displ, hwy,
        margin="histogram",
        title="Engine Displacement vs. Highway Mileage")
plot(p)
Skewness
Description
Calculate the skewness of a numeric variable
Usage
skewness(x, na.rm = TRUE)
Arguments
| x | numeric vector. | 
| na.rm | if  | 
Value
a number
Examples
skewness(mtcars$mpg)
Standardize numeric variables
Description
Standardize the numeric variables in a data frame
Usage
standardize(data, mean = 0, sd = 1, include_dummy = FALSE)
Arguments
| data | a data frame. | 
| mean | mean of the transformed variables. | 
| sd | standard deviation of the transformed variables. | 
| include_dummy | logical. If TRUE, dummy coded variables are transformed as well. | 
Details
standardize transforms all the numeric variables
in a data frame to have the same mean and standard deviation.
By default, this will be a mean of 0 and standard deviation of 1.
Character variables and factors are left unchanged. By default,
dummy coded variables are also left unchanged. Use
include_dummy=TRUE to transform these variables as well.
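As a rough illustration of this behavior in base R: the helpers below are hypothetical, and treating a "dummy coded" variable as a numeric column containing only 0, 1, or NA is an assumption of the sketch.

```r
# Hypothetical helper: rescale one numeric vector to a target mean and sd.
rescale_z <- function(x, target_mean = 0, target_sd = 1) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  z * target_sd + target_mean
}

# Assumption: a dummy coded variable contains only 0, 1, or NA.
is_dummy <- function(x) all(x %in% c(0, 1, NA))

standardize_sketch <- function(data, target_mean = 0, target_sd = 1,
                               include_dummy = FALSE) {
  num <- vapply(data, is.numeric, logical(1))
  if (!include_dummy) {
    num <- num & !vapply(data, is_dummy, logical(1))   # leave dummies alone
  }
  data[num] <- lapply(data[num], rescale_z,
                      target_mean = target_mean, target_sd = target_sd)
  data
}

df <- data.frame(x = c(10, 20, 30), d = c(0, 1, 0), g = c("a", "b", "c"))
out <- standardize_sketch(df)                            # x becomes -1, 0, 1
out_all <- standardize_sketch(df, include_dummy = TRUE)  # d is rescaled too
print(out)
print(out_all)
```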
Value
a data frame
Examples
head(cars74)
cars74_st <- standardize(cars74)
head(cars74_st)
Frequency distribution for a categorical variable
Description
Function to calculate frequency distributions for categorical variables
Usage
tab(
  data,
  x,
  sort = FALSE,
  maxcat = NULL,
  minp = NULL,
  na.rm = FALSE,
  total = FALSE,
  digits = 2,
  cum = FALSE,
  plot = FALSE
)
Arguments
| data | A dataframe | 
| x | A factor variable in the data frame. | 
| sort | logical. Sort levels from high to low. | 
| maxcat | Maximum number of categories to be included. Smaller categories will be combined into an "Other" category. | 
| minp | Minimum proportion for a category to be included. Categories representing smaller proportions will be combined into an "Other" category. maxcat and minp cannot both be specified. | 
| na.rm | logical. Removes missing values when TRUE. | 
| total | logical. Include a total category when TRUE. | 
| digits | Number of digits the percents should be rounded to. | 
| cum | logical. If  | 
| plot | logical. If TRUE, a ggplot2 bar chart is returned. | 
Details
The function tab will calculate the frequency
distribution for a categorical variable and output a data frame
with three columns: level, n, percent.
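A comparable three-column table can be built from base R's table. The helper freq_table is hypothetical; reporting percent on a 0 to 100 scale is an assumption, and the "Other" lumping options are not covered.

```r
# Hypothetical helper: frequency table with level, n, and percent columns.
freq_table <- function(x, sort = FALSE, digits = 2) {
  counts <- table(x)
  out <- data.frame(level   = names(counts),
                    n       = as.vector(counts),
                    percent = round(100 * as.vector(counts) / sum(counts), digits))
  if (sort) out <- out[order(out$n, decreasing = TRUE), ]   # high to low
  rownames(out) <- NULL
  out
}

ft <- freq_table(mtcars$carb, sort = TRUE)
ft$cum_n <- cumsum(ft$n)   # cumulative frequencies, in the spirit of cum = TRUE
print(ft)
```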
Value
If plot = TRUE, returns a ggplot2 bar chart. Otherwise,
returns a data frame.
Examples
tab(cars74, carb)
tab(cars74, carb, plot=TRUE)
tab(cars74, carb, sort=TRUE)
tab(cars74, carb, sort=TRUE, plot=TRUE)
tab(cars74, carb, cum=TRUE)
tab(cars74, carb, cum=TRUE, plot=TRUE)
Time spent watching television - 2017
Description
This is a data set detailing TV usage on days surveyed as determined by the 2017 American Time Use Survey. The data set includes demographic information, as well as details regarding employment and family makeup, where applicable. Information on days surveyed, as well as whether the day is a holiday, is also included.
Usage
tv
Format
A data frame with 10,223 rows and 21 variables. The variables are as follows:
- id: ID of respondent
- weight: ATUS final weight
- youngest_child: Age of the youngest child in the household that is less than 18 years old (if applicable). Range: 1-17; NA if there is no child in the household
- age: Age of respondent
- sex: Sex of respondent
- job: Status of employment of the respondent. Direct transcription from original codebook: 1 = Employed, at work, 2 = Employed, absent, 3 = Unemployed, on layoff, 4 = Unemployed, looking, 5 = Not in the labor force.
- m_job: The response to the question, “in the last seven days did you have more than one job?” Returns NA if no job.
- f_job: Does the respondent have a full time job or a part time job? (NA if no job)
- educ: Are you enrolled in high school, college, or university? (NA if not currently enrolled)
- educ2: If yes to educ, are you enrolled in high school or upper schooling? (NA if not currently enrolled)
- partner: Presence of the respondent's spouse or unmarried partner in the household: 1 = Spouse present, 2 = Unmarried partner present, 3 = No spouse/unmarried partner present
- pr_job: Answer to the question, “does your partner have a job?” (NA if not applicable)
- salary: Weekly earnings at the respondent’s main job, two decimals implied
- children: Number of children under 18 in the household
- pr_job_f: Part time/full time job status of partner, if applicable (NA if partner unemployed or no partner)
- job_hours: Total hours usually worked per week (-4: Hours vary)
- day: Day of the week about which the respondent was interviewed (Monday through Friday)
- holiday: Notes if the respondent was interviewed on a holiday
- elder_care: Total time spent providing elder care that day by the respondent, in minutes
- child_time: Total time spent during diary day providing secondary childcare for household children younger than 13, in minutes
- tv: Minutes spent watching TV
Details
For more information regarding the key visit https://www.bls.gov/tus/atusintcodebk17.pdf. This data is retrieved from the American Time Use Survey, made available through the Bureau of Labor Statistics https://www.bls.gov/tus/datafiles_2017.htm.
Examples
summary(tv)
hist(tv$tv, col="skyblue")
Univariate plot
Description
Generates a descriptive graph for a quantitative variable.
Usage
univariate_plot(
  data,
  x,
  bins = 30,
  fill = "deepskyblue",
  pointcolor = "black",
  density = TRUE,
  densitycolor = "grey",
  alpha = 0.2,
  seed = 1234
)
Arguments
| data | a data frame. | 
| x | a variable name (without quotes). | 
| bins | number of histogram bins. | 
| fill | fill color for the histogram and boxplot. | 
| pointcolor | point color for the jitter plot. | 
| density | logical. Plot a filled density curve over the histogram. (default=TRUE) | 
| densitycolor | fill color for density curve. | 
| alpha | Alpha transparency (0-1) for the density curve and jittered points. | 
| seed | pseudorandom number seed for jittered plot. | 
Details
univariate_plot generates a plot containing three graphs:
a histogram (with an optional density curve), a horizontal
jittered point plot, and a horizontal box plot. The subtitle
contains descriptive statistics, including the mean, standard
deviation, median, minimum, maximum, and skew.
Value
a ggplot2 graph
Note
The graphs are created with ggplot2 and then assembled into a single plot through the patchwork package. Missing values are deleted.
Examples
univariate_plot(mtcars, mpg)
univariate_plot(cardata, city_mpg, fill="lightsteelblue",
                pointcolor="lightsteelblue", densitycolor="lightpink",
                alpha=.6)