\name{ccems-package}
\alias{ccems-package}
\alias{ccems}
\docType{package}
\title{ Combinatorially Complex Equilibrium Model Selection }
\description{
This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular.
Estimates of the dissociation constants K that best describe a dataset are found by 
systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other K values. 
The automatically generated space of models is then fitted to the data. Automation enables searches of spaces 
too large to be specified by hand, e.g. spaces generated by combinatorially complex equilibriums. 
}

\details{
\tabular{ll}{
Package: \tab ccems\cr
Type: \tab Package\cr
Version: \tab 1.0\cr
Date: \tab 2009-1-2\cr
Depends: \tab odesolve,snow\cr
Suggests: \tab nws\cr
License: \tab GPL-2\cr
LazyLoad: \tab yes\cr
LazyData: \tab yes\cr
URL: \tab http://epbi-radivot.cwru.edu/ccems\cr
Built: \tab R 2.8.1; ; 2009-01-13 17:24:33; windows\cr
}

Index:
\preformatted{
RNR                     Ribonucleotide Reductase Data
TK1                     Thymidine Kinase 1 Data
ems                     Equilibrium Model Selection
fitModel                Fit Model
mkGrids                 Make Grid Model Space
mkKd2Kj                 Make Kd2Kj Mappings
mkModel                 Make Specific Model
mkSpurs                 Make Spur Model Space
mkg                     Make Generic Model
simulateData            Simulate Data
}

This package automatically generates and fits biochemical equilibrium models, using as outputs either average protein mass 
data or enzyme reaction rate data. 
It is currently limited to systems in which one central hub protein mediates all of the interactions and the total 
concentrations of the reactants are known to a good approximation, e.g. as in systems that were reconstituted 
from purified reactants. 
It is further limited in that multiple sites for the same ligand must be filled in a predetermined sequence. 

An equilibrium can be specified by any acyclic spanning subgraph of its nodes, where edges are 
dissociation constants. Here, hub protein
oligomerization is viewed as a curtain rod from which threads 
of ligand-bound states/complexes hang: each notch down a thread 
corresponds to one additional ligand bound to the hub j-mer, where j increases as 
one moves to the right on the curtain rod. At the top of each thread is
a head node that sits on the rod. The head nodes must be specified, as 
some j values may be absent and some ligand sites (other than the 
thread-defining site) may be assumed to be saturated in some j-mers. The last node in 
each thread is referred to as a tail node. If a ligand has more than one binding site, 
the tail of the thread of one site (other than the last one filled) is 
the head of the thread of the site filled next. 
Thus, head nodes must be stated only for the first site filled.  

The example given below, where t is dTTP and R is the large subunit of ribonucleotide reductase, is 
not combinatorially complex, as there is only one ligand binding 
site (the s-site) and the hub protein forms at most a dimer. 
Thus, the thread topology of the acyclic graph used (to explore K equality 
hypotheses) has only two head nodes and two threads, one for the monomer and one for the dimer.  
The head node of the monomer thread is the free hub protein R1t0 and 
the head node of the dimer thread is the ligand-free dimer R2t0.  
Threads contain the names of
only their non-head nodes, since their heads have already been specified. 
This structure is assigned to \code{topology}, which is then passed to the function \code{mkg} 
to produce a generic model object \code{g}. Together with the data, this
generic model object is then passed to the function \code{ems} (equilibrium model selection), which generates the 
model space, fits it to the data, and returns the \code{topN} (typically 10 or 20) best (lowest AIC) models.  

The user must have write privileges in the working directory so that the subdirectories
\code{models} and \code{results} can be created to hold model C code (generated 
by \code{mkg}) and HTML output (generated by \code{ems}), respectively.

The intended use of this package is on a Linux cluster. Its development and use to date have been on a ROCKS cluster. 
 
 
}

\note{ This work was supported by the National Cancer Institute (K25CA104791). }

\author{ Tom Radivoyevitch (txr24@case.edu) }
\references{
Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. \emph{BMC Systems Biology} \bold{2}, 15. 

Radivoyevitch, T.  Automated model generation and selection methods for combinatorially complex biochemical equilibriums. (to be submitted to \emph{Biology Direct}). 
}
\seealso{\code{\link{ems}},  \code{\link{mkg}} }

\keyword{package}
\examples{
library(ccems)
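## As described in the details, ccems writes generated C code to ./models and
## HTML output to ./results, so the working directory must be writable.
## The check below is a minimal pre-flight sketch using base R only;
## it is an illustration and not part of the ccems API.
if (file.access(getwd(), mode = 2) != 0)
    warning("working directory is not writable; mkg and ems may fail")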
## this example corresponds to the reference above: dTTP induced R1 dimerization
topology <- list(  
        heads=c("R1t0","R2t0"),  
        sites=list(       
                s=list(                     # s-site    thread #
                        m=c("R1t1"),        # monomer      1
                        d=c("R2t1","R2t2")  # dimer        2
                )
        )
) 
g <- mkg(topology,TCC=TRUE) 
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. c("RT","tT")
rownames(dd) <- seq_len(nrow(dd)) # lose big number row names of parent dataframe

## Note: This block is for a ROCKS cluster
cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
chnkPs <- list(size=100,n=1,maxnPs=2,extend2maxP=TRUE)
\dontrun{
top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 
}
# The next example gives the cluster a really big job (about 820 minutes on 16 cpus; see the output below)
library(ccems)
topology <- list(
        heads=c("R1X0","R2X2","R4X4","R6X6"), # s-sites are already filled only in (j>1)-mer head nodes 
        sites=list(                    
                a=list(                                                              # a-site       thread #
                        m=c("R1X1"),                                                 # monomer          1
                        d=c("R2X3","R2X4"),                                          # dimer            2
                        t=c("R4X5","R4X6","R4X7","R4X8"),                            # tetramer         3
                        h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")          # hexamer          4
                ),
                h=list( ## tails of a-site threads are heads of h-site threads       # h-site
                        m=c("R1X2"),                                                 # monomer          5
                        d=c("R2X5", "R2X6"),                                         # dimer            6
                        t=c("R4X9", "R4X10","R4X11", "R4X12"),                       # tetramer         7
                        h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")     # hexamer          8
                )
        )
)
g=mkg(topology,TCC=TRUE) 
dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
names(dd)[1:2]=paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. c("RT","XT")

cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
# choose(29,3) = 3654 and choose(29,2) = 406, so 3654 + 406 + 29 + 1 = 4090 spurs 
chnkPs <- list(size=1000,n=1,maxnPs=3,extend2maxP=TRUE)
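## Sanity check of the spur count quoted above. Assumption (taken from that
## comment, not from the ccems source): each spur model corresponds to a subset
## of at most maxnPs of 29 candidate parameters. Uses base R only.
nSpurs <- sum(choose(29, 0:chnkPs$maxnPs)) # 1 + 29 + 406 + 3654
stopifnot(nSpurs == 4090)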
\dontrun{

top10=ems(dd,g,cpusPerHost=cpusPerHost,chunkParams=chnkPs, ptype="SOCK") 

# The following are the last few lines of the output. The first line gives the best AIC for each number of parameters,
# in increasing order of parameter count; it shows that the two-parameter models are the best. 
# The next line shows that the run took 820 minutes on 16 cpus. 
# The block that follows shows that the top-ranked models are all two-parameter spur graph models. The HTML file 
# RXglobSOCK.htm in the results directory contains this information and more (e.g. parameter estimates and CIs).
# Of the 4133 fitted models reported in the HTML file, 4133 - 4090 = 43 are grid models. 
# Grid models are always fitted as one batch before spur model fitting begins. 

# [1]  41.14828  23.95284 -31.11051 -27.29232
# Time difference of 819.9431 mins
#
#  ... making HTML file ... 
#   1 Model 252; nbp= 2; id=IIIIIIIJIIIJIIIIIIIIIIIIIIIII; AIC=-31.1105
#   2 Model 187; nbp= 2; id=IIIIJIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.9837
#   3 Model 186; nbp= 2; id=IIIIJIIIIIIJIIIIIIIIIIIIIIIII; AIC=-30.7098
#   4 Model 163; nbp= 2; id=IIIJIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-30.4086
#   5 Model 232; nbp= 2; id=IIIIIIJIIIIIJIIIIIIIIIIIIIIII; AIC=-30.1868

}
}
