Reading and Writing Data Files
While eg_read() and eg_write() handle raw
file transfers, you probably spend most of your time working with data
files that you want to load directly into R as data frames. egnyte
provides a set of functions that handle the download-read or
write-upload workflow in a single step.
Prerequisites
Before you can read or write files, you need to authenticate. See
vignette("configuration") or
vignette("authorization") if you haven’t set that up
yet.
library(egnyte)
eg_auth()
Reading Data Files
egnyte provides read functions for common data formats. Each
function:
- Downloads the file from Egnyte to a temporary location
- Reads it using the appropriate R package
- Returns the data as a data frame (or tibble)
- Cleans up the temporary file
CSV Files
# Read a CSV file
dat <- eg_read_csv("/Shared/Data/analysis.csv")
This uses readr::read_csv() under the hood. You can pass
any additional arguments that read_csv() accepts:
# Specify column types
dat <- eg_read_csv(
"/Shared/Data/analysis.csv",
col_types = cols(
id = col_integer(),
name = col_character(),
value = col_double()
)
)
# Skip rows, select columns, etc.
dat <- eg_read_csv(
"/Shared/Data/analysis.csv",
skip = 2,
col_select = c(id, name, value)
)
Delimited Files
For files with delimiters other than commas:
# Tab-delimited file (default)
dat <- eg_read_delim("/Shared/Data/data.tsv")
# Pipe-delimited file
dat <- eg_read_delim("/Shared/Data/data.txt", delim = "|")
# Semicolon-delimited (common in European CSVs)
dat <- eg_read_delim("/Shared/Data/european.csv", delim = ";")
Excel Files
# Read an Excel file (first sheet by default)
dat <- eg_read_excel("/Shared/Data/workbook.xlsx")
# Read a specific sheet by name
dat <- eg_read_excel("/Shared/Data/workbook.xlsx", sheet = "Summary")
# Read a specific sheet by position
dat <- eg_read_excel("/Shared/Data/workbook.xlsx", sheet = 3)
Additional readxl::read_excel() arguments work here
too:
# Specify a range
dat <- eg_read_excel(
"/Shared/Data/workbook.xlsx",
range = "B2:F100"
)
# Skip rows
dat <- eg_read_excel(
"/Shared/Data/workbook.xlsx",
skip = 5
)
SAS Files
# Read a SAS7BDAT file
dat <- eg_read_sas("/Shared/Data/dataset.sas7bdat")
# Read a SAS transport file (.xpt)
dat <- eg_read_xpt("/Shared/Data/dataset.xpt")
These use haven::read_sas() and
haven::read_xpt(). Haven preserves SAS attributes like
variable labels and formats.
Stata Files
# Read a Stata file
dat <- eg_read_stata("/Shared/Data/dataset.dta")
SPSS Files
# Read an SPSS file
dat <- eg_read_spss("/Shared/Data/dataset.sav")
R Objects (RDS)
# Read an RDS file
obj <- eg_read_rds("/Shared/Data/model.rds")
RDS files can contain any R object, not just data frames. This is
useful for saving fitted models, lists, or other complex objects.
Writing Data Files
The write functions work in reverse - they take an R object, write it
to a temporary file, and upload it to Egnyte.
CSV Files
# Write a data frame to CSV
eg_write_csv(dat, "/Shared/Data/results.csv")
By default, this will fail if the file already exists:
# Overwrite an existing file
eg_write_csv(dat, "/Shared/Data/results.csv", overwrite = TRUE)
You can pass additional arguments to
readr::write_csv():
# Don't include column names
eg_write_csv(dat, "/Shared/Data/results.csv", col_names = FALSE)
# Use a different NA representation
eg_write_csv(dat, "/Shared/Data/results.csv", na = ".")
Delimited Files
# Write a tab-delimited file
eg_write_delim(dat, "/Shared/Data/results.tsv")
# Write with a different delimiter
eg_write_delim(dat, "/Shared/Data/results.txt", delim = "|")
Excel Files
# Write a single sheet
eg_write_excel(dat, "/Shared/Data/results.xlsx")
# Write multiple sheets by passing a named list
eg_write_excel(
list(
"Summary" = summary_df,
"Details" = details_df,
"Raw" = raw_df
),
"/Shared/Data/workbook.xlsx"
)
SAS Transport Files
# Write a SAS transport file
eg_write_xpt(dat, "/Shared/Data/results.xpt")
Note: egnyte can write XPT files but not native SAS7BDAT files. If
you need SAS7BDAT, you’ll need to use SAS itself or another tool.
Stata Files
# Write a Stata file
eg_write_stata(dat, "/Shared/Data/results.dta")
SPSS Files
# Write an SPSS file
eg_write_spss(dat, "/Shared/Data/results.sav")
R Objects (RDS)
# Save any R object
eg_write_rds(fitted_model, "/Shared/Data/model.rds")
# Control compression
eg_write_rds(large_data, "/Shared/Data/data.rds", compress = "xz")
Optional Dependencies
The format-specific functions require additional packages that aren’t
installed by default with egnyte:
eg_read_csv(), eg_write_csv() |
readr |
eg_read_delim(), eg_write_delim() |
readr |
eg_read_excel() |
readxl |
eg_write_excel() |
writexl |
eg_read_sas(), eg_read_xpt(),
eg_write_xpt() |
haven |
eg_read_stata(), eg_write_stata() |
haven |
eg_read_spss(), eg_write_spss() |
haven |
eg_read_rds(), eg_write_rds() |
(base R - no extra package) |
If you try to use a function without the required package installed,
egnyte will prompt you to install it:
The readr package is required for this function.
Would you like to install it? (yes/no)
A Few Tips
Working with Large Files
The read functions download the entire file before reading it. For
large files, this can take a while and use significant memory. A few
suggestions:
- If you only need specific columns, use the
col_select
argument (for CSV/delim files) to avoid loading unnecessary data
- Consider whether you really need the whole file, or if you can work
with a sample
- For very large datasets, you might want to use
eg_read() to download the file once, then work with it
locally
Passing Arguments Through
All the format-specific functions accept ... arguments
that get passed to the underlying read/write function. If you’re
familiar with readr::read_csv() or
haven::read_sas(), you can use all the same options:
# readr options
dat <- eg_read_csv(
"/Shared/Data/data.csv",
col_types = "ccdd",
locale = locale(decimal_mark = ","),
na = c("", "NA", "N/A", ".")
)
# haven options
dat <- eg_read_sas(
"/Shared/Data/data.sas7bdat",
encoding = "latin1"
)
File Extensions Matter
egnyte uses the file extension to determine the format when
downloading. Make sure your files have the correct extension:
.csv for CSV files
.xlsx or .xls for Excel files
.sas7bdat for SAS data files
.xpt for SAS transport files
.dta for Stata files
.sav for SPSS files
.rds for R objects
Comparison with Base File Transfer
Here’s the difference between using the format-specific functions
vs. the base eg_read()/eg_write():
Using eg_read_csv() (recommended for data
files):
# One step - download and read
dat <- eg_read_csv("/Shared/Data/data.csv")
Using eg_read() (manual approach):
# Two steps - download, then read
temp_file <- eg_read("/Shared/Data/data.csv")
dat <- readr::read_csv(temp_file)
unlink(temp_file) # Clean up
The format-specific functions handle all this for you and clean up
temporary files automatically.
Next Steps
- Learn about authentication options in
vignette("authorization")
- See raw file transfer in
vignette("file-transfer")