TDApplied is an R package for analyzing persistence diagrams using machine learning and statistical inference, and is designed to interface with persistent (co)homology calculations from the R packages TDA and TDAstats. Note that TDA was available on CRAN during the development of TDApplied and was therefore included in package examples and tests. Since TDA is not presently on CRAN, the dependency on it has been removed, and some examples and tests have been modified accordingly. TDApplied still works with TDA-computed persistence diagrams and TDA functions if a working version of TDA is already installed.
R package TDA:
Fasy, Brittany T., Jisu Kim, Fabrizio Lecci, Clement Maria, David L. Millman, and Vincent Rouvreau. 2021. TDA: Statistical Tools for Topological Data Analysis. https://CRAN.R-project.org/package=TDA.
R package TDAstats:
Wadhwa, Raoul R., Drew R. K. Williamson, Andrew Dhawan, and Jacob G. Scott. 2018. TDAstats: R pipeline for computing persistent homology in topological data analysis. https://CRAN.R-project.org/package=TDAstats.
To install the latest version of this R package directly from GitHub:
install.packages("devtools")
library(devtools)
devtools::install_github("shaelebrown/TDApplied")
library(TDApplied)
To install from GitHub you might need:
To install the stable version of this R package from CRAN:
install.packages("TDApplied")
If you use TDApplied, please consider citing as:
If you wish to cite a particular method used in TDApplied, see the REFERENCES.bib file in the vignette directory.
TDApplied has three major modules:
1. Computing and interpreting persistence diagrams. The PyH function connects with Python, providing a fast persistent (co)homology engine compared to alternatives. The plot_diagram function can be used to plot diagrams computed from PyH or the TDA and TDAstats packages. The rips_graphs and plot_rips_graphs functions can be used to visualize dataset structure at the scale of particular topological features. The bootstrap_persistence_thresholds function can be used to identify statistically significant topological features in a dataset.
2. Machine learning. The functions diagram_mds, diagram_kpca and predict_diagram_kpca can be used to project a group of diagrams into a low dimensional space (i.e. dimension reduction). The functions diagram_kkmeans and predict_diagram_kkmeans can be used to cluster a group of diagrams. The functions diagram_ksvm and predict_diagram_ksvm can be used to link, through a prediction function, persistence diagrams and an outcome (i.e. dependent) variable.
3. Statistical inference. The permutation_test function acts like an ANOVA test for identifying group differences of persistence diagrams. The independence_test function can determine if two groups of paired persistence diagrams are likely independent or not.

Not only does TDApplied provide methods for the applied analysis of persistence diagrams which were previously unavailable, but its emphasis on speed and scalability (through parallelization, C code, avoiding redundant slow computations, etc.) also makes it a powerful tool for carrying out applied analyses of persistence diagrams.
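As a rough illustration of the permutation_test function described above, the sketch below compares a group of circle diagrams with a group of sphere diagrams. It assumes TDApplied and TDAstats are installed. The helper sample_diagram and the variable names are illustrative only, and the arguments shown (iterations, dims, num_workers) and the p_values element of the result should be checked against the documentation of your installed version.

```r
library(TDApplied)

# Illustrative helper (not part of TDApplied): compute one diagram from a
# random 50-point subsample of a 100-point TDAstats example dataset.
sample_diagram <- function(data) {
  TDAstats::calculate_homology(data[sample(1:100, size = 50), ],
                               dim = 1, threshold = 2)
}

set.seed(123)
circles <- lapply(1:3, function(i) sample_diagram(TDAstats::circle2d))
spheres <- lapply(1:3, function(i) sample_diagram(TDAstats::sphere3d))

# Test for a group difference in homological dimensions 0 and 1.
# Each group is passed as a list of diagrams.
perm_result <- permutation_test(circles, spheres,
                                iterations = 20, dims = c(0, 1),
                                num_workers = 2)

# One p-value per requested dimension.
perm_result$p_values
```

With only three diagrams per group and 20 permutations, the p-values are coarse. Larger groups and more iterations would be needed for a real analysis.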
This example creates six persistence diagrams, plots one and projects all six into 2D space using multidimensional scaling (MDS) to demonstrate TDApplied functionalities.
library(TDApplied)
# create 6 persistence diagrams
# 3 from circles and 3 from spheres
circ1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
circ2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
circ3 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
sphere1 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
sphere2 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
sphere3 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100, size = 50), ], dim = 1, threshold = 2)
# plot a diagram
plot_diagram(circ1,title = "Circle 1")
# project into 2D and plot
proj_2D <- diagram_mds(list(circ1, circ2, circ3, sphere1, sphere2, sphere3), dim = 1, k = 2)
plot(x = proj_2D[,1], y = proj_2D[,2])
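To make the MDS step above more concrete, here is a minimal base R sketch of classical multidimensional scaling on an ordinary distance matrix. It illustrates the general technique only, not TDApplied's internals. The data are hypothetical: six points in 3D stand in for six diagrams, and Euclidean distances stand in for distances between diagrams.

```r
# Hypothetical data: two well-separated groups of three points in 3D,
# standing in for two groups of persistence diagrams.
set.seed(1)
pts <- rbind(matrix(rnorm(9, mean = 0, sd = 0.1), nrow = 3),
             matrix(rnorm(9, mean = 3, sd = 0.1), nrow = 3))

# Pairwise Euclidean distances between the six points. With diagrams,
# this object would instead hold pairwise diagram distances.
d <- dist(pts)

# Classical MDS: find 2D coordinates whose pairwise distances
# approximate the entries of d.
emb <- cmdscale(d, k = 2)

# One row per object, one column per embedding dimension.
plot(x = emb[, 1], y = emb[, 2], col = rep(c("red", "blue"), each = 3))
```

Objects that are close in the original distance matrix land close together in the 2D embedding, which is why the circle and sphere diagrams in the example above would be expected to form separate groups in the plot.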
TDApplied has five major vignettes:
To contribute to TDApplied you can create issues for any bugs/suggestions on the issues page. You can also fork the TDApplied repository and create pull requests to add features you think will be useful for users.