Privacy-Preserving Data Anonymization for R
An R package for anonymizing sensitive patient and research data. Helps protect privacy while keeping your data useful for analysis.
# Install from CRAN
install.packages("privacyR")
library(privacyR)
# Anonymize a data frame
patient_data <- data.frame(
  patient_id = c("P001", "P002", "P003"),
  name = c("John Doe", "Jane Smith", "Bob Johnson"),
  dob = as.Date(c("1980-01-15", "1975-03-20", "1990-06-10")),
  location = c("New York, NY", "Los Angeles, CA", "Chicago, IL")
)
anonymized_data <- anonymize_dataframe(patient_data, seed = 123)
print(anonymized_data)
# With UUID anonymization for stronger privacy
anonymized_data_uuid <- anonymize_dataframe(patient_data, use_uuid = TRUE, seed = 123)
print(anonymized_data_uuid)
# Month-year date anonymization
anonymized_data <- anonymize_dataframe(patient_data,
                                       date_method = "round",
                                       date_granularity = "month_year")
print(anonymized_data)
All anonymization functions accept an optional seed parameter (default: NULL).
- When seed = NULL: The package maintains referential integrity using a deterministic hash-based approach. The same inputs always produce the same anonymized outputs, so relationships in your data are preserved.
- When seed is provided: You get explicit control over the anonymization, which makes results reproducible across sessions.
You can use the package without providing a seed, and it will still maintain referential integrity automatically.
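To make the idea of referential integrity concrete, here is a minimal base R sketch. It is not privacyR's implementation: in place of the package's hash-based approach it uses a plain sorted lookup table, and the helper name pseudonymize and its prefix argument are hypothetical. It only illustrates the two behaviours described above: repeated values map to the same pseudonym, and a seed makes a shuffled mapping repeatable.

```r
# Hypothetical helper, NOT part of privacyR: maps each distinct value to a
# stable pseudonym so that repeated values stay linked after anonymization.
pseudonymize <- function(x, prefix = "ID", seed = NULL) {
  x <- as.character(x)
  distinct <- sort(unique(x))      # sorted, so the mapping ignores row order
  codes <- seq_along(distinct)
  if (!is.null(seed)) {
    set.seed(seed)                 # same seed gives the same shuffle every run
    codes <- sample(codes)
  }
  lookup <- setNames(sprintf("%s%03d", prefix, codes), distinct)
  unname(lookup[x])
}

visits <- c("P002", "P001", "P002", "P003")

# Without a seed the mapping is fixed by sort order.
pseudonymize(visits)
# "ID002" "ID001" "ID002" "ID003"

# With a seed the mapping is shuffled but identical across calls.
identical(pseudonymize(visits, seed = 123), pseudonymize(visits, seed = 123))
# TRUE
```

In both cases the first and third entries of visits receive the same pseudonym, which is what keeps joins and repeated measurements intact. Note that set.seed() changes the global random number state, so a real implementation would probably avoid calling it inside a helper like this.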
- anonymize_id(): Anonymize patient identifiers
- anonymize_names(): Anonymize patient names
- anonymize_dates(): Anonymize dates (shift or round)
- anonymize_locations(): Anonymize geographic locations
- anonymize_dataframe(): Anonymize entire data frames
See the package vignette for detailed examples and usage:
vignette("privacyR")
IMPORTANT: While the privacyR package aids in anonymizing patient data, users must ensure compliance with all applicable regulations and guidelines. The author is not liable for any issues arising from the use of this package.
Users should pay close attention to:
- CDC Guidelines: CDC Data Privacy and HIPAA
- California Department of Health Care Services: DHCS List of HIPAA Identifiers
- HIPAA Regulations: HHS De-identification Guidance
This package is provided “as is” without warranty. Users assume full responsibility for ensuring anonymized data meets regulatory requirements. Consult with legal and privacy experts as needed.
License: MIT
If you use this package in your research, please cite it as:
citation("privacyR")
Contributions are welcome! Please feel free to submit issues or pull requests.