This vignette addresses the usage of the functions involved in statistical inference and power analysis for the direct and spillover effects in two-stage randomized experiments motivated by the JD data set.
In 2007, the ministry in charge of employment in France launched a public employment integration service contract for young graduates seeking employment. A randomized experiment of this job placement assistance program was conducted and the methods in this package can be used to analyze the data. The following examples focus on two specific outcomes: fixed-term contract of six months or more (LTFC) and permanent contract (PC).
The data set is a subset of the original JD data set and includes the following variables:
anonale: local employment agency
tempsc_av: full-time work (at time of assignment)
assigned: 1 if the individual is assigned to treatment, 0 otherwise
pct0: share of the local population treated
cdi: binary variable for whether the individual works on a permanent contract, 8 months after the assignment
cdd6m: binary variable for whether the individual works in CDD (fixed-term contract) for more than 6 months, 8 months after the assignment
emploidur: binary variable for whether the individual works on a permanent or fixed-term contract for more than 6 months, 8 months after the assignment
tempsc: binary variable for whether the individual works full time, 8 months after the assignment
salaire: individual's salary in Euros
The relevant functions for this analysis are the following:
ZSRE: returns a list of Z, the vector of the desired binary treatment assignment variable.
YSRE: returns a list of Y, the vector of the outcomes for a desired variable of interest.
CalAPO: returns a list of point estimates and variances for the average potential outcomes, unit level direct effect, marginal direct effect, and unit level spillover effect.
Test2SRE: performs the desired test. This function takes in the data and the effect type (i.e. direct effect, marginal direct effect, or spillover effect) and reports whether the null hypothesis is rejected at the desired significance level.
calpara: returns a list of the estimated within-cluster variance, between-cluster variance, intra-class correlation coefficient, and average of the assignment vector, which are necessary for Calsamplesize.
Calsamplesize: returns a list of the total number of clusters necessary to achieve a given power level at a given significance level for the three types of effects.
First, import the RCT2 library and load the relevant data set.
library(RCT2)
data(jd)
To calculate point estimates and variances for an effect of interest, run the CalAPO command. It is first necessary to create the vector of assignment mechanisms, A, which depends on the study design. In this experiment, there are three treatment assignment mechanisms, with treatment probabilities of 25%, 50%, and 75% respectively.
Then, run the CalAPO command on a data frame that holds the treatment assignments (Z), the assignment mechanisms (A), the outcomes for the variable of interest (Y), which is the LTFC outcome cdd6m in this case, and the cluster identifier (id). The estimated average potential outcomes for long-term fixed contracts are given by Y.hat. As stated in the paper, we also have the estimated direct effects under the three assignment mechanisms (ADE.est), the estimated marginal direct effect (MDE.est), and the estimated spillover effects (ASE.est), together with the estimated covariance matrices for the average potential outcomes, the direct effects, the marginal direct effect, and the spillover effects.
data_LTFC <- data.frame(jd$assigned, jd$pct0, jd$cdd6m, jd$anonale)
colnames(data_LTFC) <- c("Z", "A", "Y", "id")
test <- CalAPO(data_LTFC)
print(CalAPO(data_LTFC))
## [[1]]
## Potential Outcome Estimates
## treated group 1 estimate 0.2109006
## control group 1 estimate 0.1953872
## treated group 2 estimate 0.2071030
## control group 2 estimate 0.2027447
## treated group 3 estimate 0.2018187
## control group 3 estimate 0.2243082
##
## $Y.covariance
## [,1] [,2] [,3] [,4] [,5]
## [1,] 9.352489e-05 -1.196691e-05 0.000000e+00 0.000000e+00 0.000000e+00
## [2,] -1.196691e-05 1.034387e-04 0.000000e+00 0.000000e+00 0.000000e+00
## [3,] 0.000000e+00 0.000000e+00 1.147296e-04 2.025355e-05 0.000000e+00
## [4,] 0.000000e+00 0.000000e+00 2.025355e-05 7.940618e-05 0.000000e+00
## [5,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 9.680927e-05
## [6,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 -3.197198e-05
## [,6]
## [1,] 0.000000e+00
## [2,] 0.000000e+00
## [3,] 0.000000e+00
## [4,] 0.000000e+00
## [5,] -3.197198e-05
## [6,] 2.276049e-04
##
## [[3]]
## Average Direct Effect
## assignment group 1 0.015513434
## assignment group 2 0.004358247
## assignment group 3 -0.022489545
##
## $ADE.covariance
## [,1] [,2] [,3]
## [1,] 0.0002208974 0.0000000000 0.0000000000
## [2,] 0.0000000000 0.0001536287 0.0000000000
## [3,] 0.0000000000 0.0000000000 0.0003883582
##
## [[5]]
## Average Spillover Effect
## treatment group under assignments 1 2 0.003797605
## treatment group under assignments 2 3 0.005284307
## control group under assignments 1 2 -0.007357582
## control group under assignments 2 3 -0.021563484
##
## $ASE.covariance
## [,1] [,2] [,3] [,4]
## [1,] 2.082545e-04 -1.147296e-04 8.286640e-06 -2.025355e-05
## [2,] -1.147296e-04 2.115389e-04 -2.025355e-05 -1.171843e-05
## [3,] 8.286640e-06 -2.025355e-05 1.828448e-04 -7.940618e-05
## [4,] -2.025355e-05 -1.171843e-05 -7.940618e-05 3.070111e-04
##
## [[7]]
## Marginal Direct Effect
## 1 -0.0008726215
##
## $MDE.covariance
## [,1]
## [1,] 8.476492e-05
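The effect estimates in this output are linear contrasts of the six average potential outcome estimates, so they can be checked by hand. The sketch below copies the printed estimates and $Y.covariance into base R and rebuilds the direct effects, the spillover effects, and the marginal direct effect. The names y_hat, V, C_ade, C_ase, and w are illustrative only, and the equal-weight average used for the marginal direct effect is an assumption that happens to agree with this output; it is not taken from the package source.

```r
# Average potential outcome estimates copied from the output above,
# ordered (treated, control) within assignment groups 1, 2, 3.
y_hat <- c(0.2109006, 0.1953872, 0.2071030, 0.2027447, 0.2018187, 0.2243082)

# $Y.covariance copied from the output above (symmetric, block diagonal).
V <- matrix(0, 6, 6)
V[1:2, 1:2] <- c(9.352489e-05, -1.196691e-05, -1.196691e-05, 1.034387e-04)
V[3:4, 3:4] <- c(1.147296e-04, 2.025355e-05, 2.025355e-05, 7.940618e-05)
V[5:6, 5:6] <- c(9.680927e-05, -3.197198e-05, -3.197198e-05, 2.276049e-04)

# Direct effect contrasts: treated minus control within each group.
C_ade <- rbind(c(1, -1, 0, 0, 0, 0),
               c(0, 0, 1, -1, 0, 0),
               c(0, 0, 0, 0, 1, -1))
ade <- drop(C_ade %*% y_hat)
ade_cov <- C_ade %*% V %*% t(C_ade)

# Spillover contrasts: same treatment status, adjacent assignment groups.
C_ase <- rbind(c(1, 0, -1, 0, 0, 0),   # treated, groups 1 vs 2
               c(0, 0, 1, 0, -1, 0),   # treated, groups 2 vs 3
               c(0, 1, 0, -1, 0, 0),   # control, groups 1 vs 2
               c(0, 0, 0, 1, 0, -1))   # control, groups 2 vs 3
ase <- drop(C_ase %*% y_hat)
ase_cov <- C_ase %*% V %*% t(C_ase)

# Marginal direct effect as an equal-weight average (assumption).
w <- rep(1 / 3, 3)
mde <- sum(w * ade)
mde_var <- drop(t(w) %*% ade_cov %*% w)

print(round(ade, 6))
print(round(ase, 6))
print(c(mde = mde, mde_var = mde_var))
```

Up to rounding of the copied inputs, these values can be compared line by line with the Average Direct Effect, Average Spillover Effect, and Marginal Direct Effect blocks and their covariance matrices printed above.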
Similarly, we can run this on the permanent contracts.
data_perm <- data.frame(jd$assigned, jd$pct0, jd$cdi, jd$anonale)
colnames(data_perm) <- c("Z", "A", "Y", "id")
CalAPO(data_perm)
We can also perform hypothesis tests on this data with the Test2SRE function. Test2SRE takes in the same data frame as before (with columns Z, A, Y, and id), plus an extra argument effect, which specifies the desired effect (ADE for the direct effect, MDE for the marginal direct effect, or ASE for the spillover effect). The function returns TRUE if the null hypothesis should be rejected, and FALSE otherwise. The default significance level is 0.05 and can be changed through the alpha argument.
Test2SRE(data_LTFC, effect="MDE", alpha=0.05)
## [1] FALSE
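To see why this result is plausible, the marginal direct effect and its variance from the CalAPO output can be plugged into a two-sided normal-approximation test. This is only a sketch under the assumption that a Wald-type z-test is a reasonable stand-in; the test that Test2SRE runs internally may differ, and the variable names below are illustrative.

```r
# Estimate and variance copied from the CalAPO output for the LTFC outcome.
mde_est <- -0.0008726215
mde_var <- 8.476492e-05
alpha <- 0.05

# Two-sided normal-approximation test of H0: MDE = 0 (assumed test form).
z <- mde_est / sqrt(mde_var)
critical <- qnorm(1 - alpha / 2)
reject <- abs(z) > critical

print(c(z = z, critical = critical))
print(reject)
```

The standardized estimate is far inside the critical value, which agrees with the FALSE returned above.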
Lastly, we can calculate the sample size needed to reach a given power at a given significance level. First, call the calpara function to calculate the parameters that the sample size calculation requires: the within-cluster and between-cluster variances and the intra-class correlation coefficient. The effect size and the assignment mechanism must also be specified based on the study design. In this case, mu is the effect size and qa is the vector of probabilities of being assigned to each of the three assignment mechanisms.
# calculate variances for permanent contract
var.perm <- calpara(data_perm)
# calculate variances for long term fixed contract
var.LTFC <- calpara(data_LTFC)
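For intuition about the within-cluster variance, between-cluster variance, and intra-class correlation coefficient, the following base R sketch computes them on simulated data with a one-way ANOVA estimator. Everything here is an assumption made for illustration: the data are simulated rather than taken from jd, the clusters are balanced, and calpara may use different estimators.

```r
set.seed(1)

# Simulated clustered outcome: 50 clusters of 20 units each (toy data).
n_clusters <- 50
n_per <- 20
id <- rep(seq_len(n_clusters), each = n_per)
cluster_effect <- rnorm(n_clusters, sd = 0.5)
y <- cluster_effect[id] + rnorm(n_clusters * n_per, sd = 1)

# Cluster means, then within and between mean squares.
cluster_means <- tapply(y, id, mean)
msw <- sum((y - cluster_means[id])^2) / (length(y) - n_clusters)
msb <- n_per * sum((cluster_means - mean(y))^2) / (n_clusters - 1)

# ANOVA estimators of the variance components (between truncated at 0).
sigma_w2 <- msw
sigma_b2 <- max((msb - msw) / n_per, 0)

# Intra-class correlation: share of total variance due to clusters.
icc <- sigma_b2 / (sigma_b2 + sigma_w2)

print(c(within = sigma_w2, between = sigma_b2, icc = icc))
```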
The elements of the output of calpara can be accessed as below. For example, to retrieve the total variance of the potential outcomes for the permanent contracts and long-term fixed contracts, the following code can be run:
sigma.perm <- var.perm$sigma.tot
sigma.LTFC <- var.LTFC$sigma.tot
print(sigma.perm)
## [1] 0.1951648
Then, we specify the effect size and use the Calsamplesize function to calculate the appropriate sample sizes for the permanent contract and the LTFC. The defaults are alpha = 0.05 (significance level) and beta = 0.2 (type II error rate, which corresponds to a power of 1 - beta = 0.8).
### effect size and assignment mechanism
mu <- 0.03
qa <- rep(1/3,3)
# calculate sample size for the long term fixed contract
print("Long Term Fixed Contract:")
## [1] "Long Term Fixed Contract:"
print(Calsamplesize(data_LTFC, 0.03, qa, 0.05, 0.2))
## [,1] [,2] [,3]
## Assignment Mechanism 1.0000 2.00000 3.0000
## Number of Clusters 428.4264 96.59406 511.5405
# calculate sample size for the permanent contract
print("Permanent Contract:")
## [1] "Permanent Contract:"
print(Calsamplesize(data_perm, 0.03, qa, alpha=0.05, beta=0.2))
## [,1] [,2] [,3]
## Assignment Mechanism 1.0000 2.0000 3.0000
## Number of Clusters 515.6595 116.4777 614.2199
From the results, we can see the necessary total number of clusters for each assignment mechanism with size n.avg needed to detect a specific alternative at a certain power and significance level.
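To illustrate how the significance level, the type II error rate, the effect size, and the outcome variance interact in a power calculation, here is a textbook two-sided z-test approximation in base R. It is not the formula used by Calsamplesize: it ignores the two-stage design and the intra-class correlation, and it counts individuals rather than clusters. The helper n_simple is hypothetical and exists only for this sketch.

```r
# Textbook two-sided z-test sample size for detecting a mean shift `mu`
# with outcome variance `sigma2`. NOT the formula used by Calsamplesize.
n_simple <- function(mu, sigma2, alpha = 0.05, beta = 0.2) {
  z_alpha <- qnorm(1 - alpha / 2)  # critical value at significance alpha
  z_beta <- qnorm(1 - beta)        # quantile matching power 1 - beta
  ceiling((z_alpha + z_beta)^2 * sigma2 / mu^2)
}

sigma2 <- 0.1951648  # sigma.perm as printed earlier in this vignette

n_80 <- n_simple(mu = 0.03, sigma2 = sigma2, beta = 0.2)  # 80% power
n_90 <- n_simple(mu = 0.03, sigma2 = sigma2, beta = 0.1)  # 90% power

print(c(power_80 = n_80, power_90 = n_90))
```

Lowering beta (raising the power) or shrinking mu increases the required sample size, which is the same qualitative behavior one would expect from the cluster counts reported above.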