The BFI package is a powerful tool in R
designed to execute the Bayesian Federated Inference
(BFI) methodology, supporting a wide range of
regression models including linear, logistic, and
Cox regression. While SAS offers robust
statistical capabilities, it currently lacks a dedicated package for
implementing the BFI method. Consequently, this vignette serves
to bridge that gap by illustrating how to utilize the R package
BFI within the SAS environment. By seamlessly integrating
R’s BFI package with the analytical prowess of SAS, users
gain access to a comprehensive suite of statistical techniques,
enhancing their ability to conduct sophisticated data analyses,
particularly when working with small datasets.
In this guide, we’ll explore how you can leverage SAS to effectively
utilize the BFI package for your data analysis needs.
To utilize R within SAS, it’s assumed that you have access to a SAS server. This access allows you to establish a connection to a SAS session by providing the necessary connection parameters, such as the SAS server hostname, port number, and authentication credentials.
More information about configuring the SAS system to call functions in the R language is documented in the SAS Online Help.
To access SAS services, you typically need to connect to a SAS server. Here’s how you can do it using SAS Studio, which is a web-based interface for SAS:
Open a web browser and navigate to the URL provided by your SAS administrator for accessing SAS Studio. Enter your credentials to log in to SAS Studio. Once logged in, you can access SAS services such as analysis and reporting through the SAS Studio interface.
SAS requires two configuration options in order to communicate with R. First the RLANG option must be set when SAS is started. This may be set either in a custom configuration file or on the SAS command line. Second, SAS needs an R_HOME environment variable to point it to the correct, available version of R.
The RLANG system option determines whether you have permission to call R from the SAS system. You can determine the value of the RLANG option by submitting the following SAS statements:
proc options option=RLANG;
run;The result is one of the following statements in the SAS log:
If the SAS log contains this statement, it means that R integration is disabled, and you do not have permission to call R from the SAS system :( You may need to consult with your SAS administrator or IT department to enable it.
If the SAS log contains this statement, it means that R integration is enabled, and you can call R from the SAS system :)
Download and install R from the official R website (https://www.r-project.org/). Follow the installation instructions provided for your operating system.
The SAS/IML Interface to R allows you to call R
functions from within PROC IML (Interactive Matrix
Language). Check if the interface is installed by running the following
code within SAS:
proc options option=R_HOME;
run;If the path to your R installation directory is displayed, the SAS/IML Interface to R is installed. If not, you may need to install or reinstall it.
PROC IML and RsubmitYou can use R inside SAS through
the use of the PROC IML procedure. PROC IML
allows you to execute R code within a SAS session,
enabling integration between SAS and R
for data analysis and statistical modeling.
To install an R package from CRAN and
GitHub, you can use the base,
stats and remotes packages, respectively. It
can be done in R or in SAS. Here’s how you can do it within SAS:
proc iml;
  rsubmit;
    /* First install and load 'base', 'stats' and 'remotes' from CRAN */
    install.packages("base")
    install.packages("stats")
    install.packages("remotes")
    library(base)
    library(stats)
    library(remotes)
  
    /* Then install and load BFI from GitHub */
    remotes::install_github("hassanpazira/BFI", force = TRUE)
    library(BFI)
  endrsubmit;
quit;Now that you have the BFI package installed and
configured, let’s explore its functionality through the following
example.
Now we generate two datasets independently from Gaussian
distribution, and then apply main functions in the BFI
package to these datasets:
First generate 30 samples randomly from Gaussian distribution N(0, 1) with p=3 covariates:
proc iml;
    /****************************************************************/
    /* Center 1: Data simulation for local center 1 with 30 samples */
    /****************************************************************/
    p     = 3;   /* Number of variables */
    n1    = 30;  /* Number of samples for center 1 */
    theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */
    X1  = j(n1, p); /* Initialize matrix X1 */
    mu1 = j(n1, 1); /* Initialize vector mu1 */
    y1  = j(n1, 1); /* Initialize vector y1 */
    /* Generate data for center 1 */
    call randseed(1123);
    X1  = randfun(n1 || p, "Normal", 0, 1);
    mu1 = theta[1] + X1 * theta[2:4];
    y1  = randfun(n1 || 1, "Normal", mu1, sqrt(theta[5]));
    /* Create dataset for center 1 */
    create y1 var {"y1"};
    append;
    close y1;
    create X1 from X1[colname={"X1_1" "X1_2" "X1_3"}];
    append from X1;
    close X1;
    call ExportMatrixToR(X1, "X1");
    call ExportMatrixToR(y1, "y1");
quit;Now generate 50 samples randomly from N(0, 1) with 3 covariates:
proc iml;
    /****************************************************************/
    /* Center 2: Data simulation for local center 2 with 50 samples */
    /****************************************************************/
    p     = 3;   /* Number of variables */
    n2    = 50;   /* Number of samples for center 2 */
    theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */
  
    X2  = j(n2, p);
    mu2 = j(n2, 1);
    y2  = j(n2, 1);
    /* Generate data for center 2 */
    call randseed(1123);
    X2  = randfun(n2 || p, "Normal", 0, 1);
    mu2 = theta[1] + X2 * theta[2:4];
    y2  = randfun(n2 || 1, "Normal", mu2, sqrt(theta[5]));
    /* Create dataset for center 1 */
    create y2 var {"y2"};
    append;
    close y2;
    create X2 from X2[colname={"X2_1" "X2_2" "X2_3"}];
    append from X2;
    close X2;
    call ExportMatrixToR(X2, "X2");
    call ExportMatrixToR(y2, "y2");
quit;We have transferred SAS data to the R session and are currently
initiating an analysis using the BFI method in R. All communications
with R are facilitated through SAS’s PROC IML. It’s
important to note that capitalization matters in R, and character
variables are automatically converted into factors.
The following compute the Maximum A Posterior (MAP) estimators of the parameters for center 1:
proc iml;
  
  rsubmit;
    #---------------------------
    # Inverse Covariance Matrix
    #---------------------------
    # Creating the inverse covariance matrix for the Gaussian prior distribution:
    Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian)
    #--------------------------
    # MAP estimates at center 1
    #--------------------------
    fit1       <- MAP.estimation(y1, X1, family=gaussian, Lambda)
    theta_hat1 <- fit1$theta_hat # intercept and coefficient estimates
    A_hat1     <- fit1$A_hat     # minus the curvature matrix
    summary(fit1, cur_mat=TRUE)
  endrsubmit;
quit;Obtaining the MAP estimators of the parameters for center 2 using the following:
proc iml;
  rsubmit;
    
    # Creating the inverse covariance matrix for the Gaussian prior distribution:
    Lambda <- inv.prior.cov(X2, lambda=0.05, family=gaussian)
    #--------------------------
    # MAP estimates at center 2
    #--------------------------
    fit2       <- MAP.estimation(y2, X2, family=gaussian, Lambda)
    theta_hat2 <- fit2$theta_hat
    A_hat2     <- fit2$A_hat
    summary(fit2, cur_mat=TRUE)
  endrsubmit;
quit;Now, you can utilize the primary function bfi() to
acquire the BFI estimates:
proc iml;
  rsubmit;
  
    # Creating the inverse covariance matrix for central server:
    Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian) # the same as other centers
  
    #----------------------
    # BFI at central center
    #----------------------
    A_hats     <- list(A_hat1, A_hat2)
    theta_hats <- list(theta_hat1, theta_hat2)
    bfi        <- bfi(theta_hats, A_hats, Lambda)
    summary(bfi, cur_mat=TRUE)
  endrsubmit;
  call ImportMatrixFromR(bfi, "bfi");
  
quit;BFI packageIn order to find and use the datasets available from the
BFI package, use the following codes:
proc iml;
  rsubmit;
  
    # To find a list of all datasets included in the package
    print(data(package = "BFI"))  
  
    # To use the 'Nurses' data
    BFI::Nurses
    cat("Dimension of the 'Nurses' data: \n", dim(Nurses))
    cat("Colnames of the 'Nurses' data: \n", colnames(Nurses))
    # To use the 'trauma' data
    BFI::trauma
    cat("Dimension of the 'trauma' data: \n", dim(trauma))
    cat("Colnames of the 'trauma' data: \n", colnames(trauma))
  endrsubmit;
quit;R objects and data may be brought back into SAS as well, for any manipulation you might want to do in SAS. Here, we just grab the bfi object and the Nurses data from R and print the data in SAS.
proc iml;
  submit / R;  * 'rsubmit' is equivalent to 'submit / R' ;
   # Export 'bfi' object
   ExportDataSetToSAS(bfi)
   # Export dataset 'Nurses'
   ExportDataSetToSAS(Nurses)
   
  endsubmit;
run;
  
proc print data=Nurses;
run;In the near future, we will be releasing the SAS/IML package for BFI,
which can be installed by the PACKAGE INSTALL statement in
the SAS environment.
If you find any errors, have any suggestions, or would like to request that something be added, please file an issue at issue report or send an email to: hassan.pazira@radboudumc.nl.