---
title: "Simulating continuous data based on network graph structures"
author: "Tom Kelly"
date: "Thursday 18 June 2020"
output: 
  rmarkdown::html_document:
    keep_html: TRUE
vignette: >
  %\VignetteIndexEntry{Simulating network graph structure in continuous data}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---



# Overview of graphsim

This package is designed to balance user-friendliness (by providing sensible defaults and built-in functions) and flexibility (many options are available to use as needed). It is likely that you will disagree with some of the parameters that I've used to simulate datasets. That's a valid opinion, and there are many options at your disposal to alter the parameters and adapt these functions to your purposes.

If you have issues or feedback, submissions to the GitHub repository are welcome. See the DESCRIPTION and README.md for more details on how to suggest changes to the package.


## Motivations

Pathway and graph structures have a wide array of applications. Here we will consider the simulation of (log-normalised) gene expression data from genomics experiments in a biological pathway. If you have another use for this software, you are welcome to apply it to your problem, but please bear in mind that it was designed with this application in mind. In principle, normally-distributed continuous data can be generated based on any defined relationships. The package uses the graph structure to define a Σ covariance matrix and generates simulated data by sampling from a multivariate normal distribution.

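As a rough illustration of that final sampling step, the sketch below draws multivariate normal samples from a given Σ in base R using its Cholesky factor. It assumes Σ is already positive definite, and the helper `sample_mvn` is hypothetical: it is not part of graphsim and is not necessarily how the package samples internally.

```r
# Hypothetical helper (not part of graphsim): draw n samples from a multivariate
# normal distribution with covariance sigma, using the Cholesky factor of sigma.
sample_mvn <- function(n, sigma, mean = 0) {
  p <- nrow(sigma)
  chol_factor <- chol(sigma)               # upper triangular, t(chol_factor) %*% chol_factor equals sigma
  z <- matrix(rnorm(n * p), nrow = n)      # n x p matrix of independent N(0, 1) draws
  x <- z %*% chol_factor + mean            # each row now has covariance sigma
  t(x)                                     # genes in rows, samples in columns
}

# Example: three genes in a chain A - B - C, correlation falling with distance
genes <- c("A", "B", "C")
sigma <- matrix(c(1.0, 0.8, 0.4,
                  0.8, 1.0, 0.8,
                  0.4, 0.8, 1.0), nrow = 3, dimnames = list(genes, genes))
set.seed(1)
sim_expr <- sample_mvn(5000, sigma)
rownames(sim_expr) <- genes
round(cor(t(sim_expr)), 2)  # close to sigma
```
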
Crucially, this allows the simulation of negative correlations based on inhibitory or repressive relationships, as commonly occur in biology. A custom plotting function `plot_directed` is provided to visualise these relationships with the "state" parameter. This plotting function has a dedicated vignette on [plotting](plots_directed.Rmd).

For more details on the background of this package, see the [paper](../paper/paper.Rmd) included with the package on GitHub. This vignette provides more detail on the code needed to reproduce the figures in this manuscript.
  
# Getting Started
  
## Install dependencies for Demonstration

The package can be installed as follows. Run the following command to install the current release from CRAN (recommended).


```r
#install packages required (once per machine)
install.packages("graphsim")
```

Run the following commands to install from GitHub (advanced users). The development version includes the latest changes ahead of the next release, so its behaviour may be unstable.


```r
#install.packages("remotes") # if not already installed
#install stable release from GitHub
remotes::install_github("TomKellyGenetics/graphsim", ref = "master")
#install development version
remotes::install_github("TomKellyGenetics/graphsim", ref = "dev")
```

Once the required packages are installed, load the packages needed to work with `igraph` objects and generate plots. Here `igraph` is required to create `igraph` objects and `gplots` is required for plotting heatmaps.


```r
library("igraph")
library("gplots")
library("graphsim")
library("scales")
```

## Set up simulated graphs

Here we set up a simple graph to demonstrate how connections in the graph structure lead to correlations in the final output. We create a simple pathway of 9 genes with various branches.



```r
graph_structure_edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"), c("F", "G"), c("F", "I"), c("H", "I"))
graph_structure <- graph.edgelist(graph_structure_edges, directed = T)
plot_directed(graph_structure, layout = layout.kamada.kawai)
```

<img src="Plotunnamed-chunk-5-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

# Generating simulated expression data from graph

## Minimal example 

A simulated dataset can be generated with a single command. This is all that is required to get started.


```r
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0, comm = FALSE, dist = TRUE, absolute = FALSE)
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-6-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

Here we have generated a simulated dataset of 100 samples with expression values for the genes in the graph shown above. The other functions in the package are called internally and are not needed to compute the final dataset in this heatmap. Below, we call them individually to demonstrate the computations performed to generate this data from the given graph structure.

Various arguments are supported to alter how the simulated datasets are computed. See the documentation for details.

# How it works step-by-step

Here we show the intermediate data generated for this graph structure, demonstrating how several of the available options compute the necessary steps.

### Adjacency matrix

The graph can be summarised by an "adjacency matrix", where a one (1) is given in row `i` and column `j` if there is an edge between genes `i` and `j`, and a zero (0) is given for genes that are not connected. For an undirected graph, edges are shown in a symmetrical matrix.


```r
adj_mat <- make_adjmatrix_graph(graph_structure)
heatmap.2(adj_mat, scale = "none", trace = "none", col = colorpanel(3, "grey75", "white", "blue"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-7-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

For a directed graph, each edge is shown only in the direction it points, from gene (row) `i` to gene (column) `j`, so the matrix is no longer symmetrical.


```r
heatmap.2(make_adjmatrix_graph(graph_structure, directed = T), scale = "none", trace = "none", col = colorpanel(3, "grey75", "white", "blue"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-8-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

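The sketch below builds both versions of the adjacency matrix from the edge list in base R, without igraph. The helper `edges_to_adjmat` is hypothetical, and for the directed case it assumes rows are source genes and columns are target genes.

```r
# Edge list for the 9-gene example pathway used in this vignette
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))

# Hypothetical helper: adjacency matrix from a two-column character edge list.
# Genes are ordered by first appearance, which matches the igraph vertex order above.
edges_to_adjmat <- function(edges, directed = FALSE) {
  genes <- unique(as.vector(t(edges)))
  adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
  adj[edges] <- 1                        # row = source gene, column = target gene
  if (!directed) adj[edges[, 2:1]] <- 1  # mirror each edge to make the matrix symmetric
  adj
}

adj_undirected <- edges_to_adjmat(edges)
adj_directed <- edges_to_adjmat(edges, directed = TRUE)
isSymmetric(adj_undirected)  # TRUE
isSymmetric(adj_directed)    # FALSE
```
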
We can compute the common links between each pair of genes. This shows how many genes are connected to both genes `i` and `j`.


```r
comm_mat <- make_commonlink_graph(graph_structure)
heatmap.2(comm_mat, scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-9-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

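Counting common links can be sketched with matrix multiplication: for an undirected adjacency matrix `adj`, entry `[i, j]` of `adj %*% adj` counts the genes adjacent to both `i` and `j`, and the diagonal holds each gene's degree. This is a standard identity rather than graphsim's code, so `make_commonlink_graph` may, for example, treat the diagonal differently.

```r
# Undirected adjacency matrix for the 9-gene example (genes in igraph vertex order)
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
genes <- unique(as.vector(t(edges)))
adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Entry [i, j] counts the genes adjacent to both i and j (paths of length two)
common <- adj %*% adj
common["A", "B"]  # 1: A and B share the neighbour C
diag(common)      # the diagonal holds each gene's degree
```
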
Note that the common-link matrix is weighted towards genes with a higher vertex degree, as is the Laplacian matrix.


```r
laplacian_mat <- make_laplacian_graph(graph_structure)
heatmap.2(laplacian_mat, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-10-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

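For reference, the standard combinatorial Laplacian is the degree matrix minus the adjacency matrix, L = D - A. The base-R sketch below assumes this definition; graphsim's `make_laplacian_graph` may use a different (for example, normalised) form.

```r
# Undirected adjacency matrix for the 9-gene example
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
genes <- unique(as.vector(t(edges)))
adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Combinatorial Laplacian: degree on the diagonal, -1 for each pair of neighbours
laplacian <- diag(rowSums(adj)) - adj
dimnames(laplacian) <- dimnames(adj)
laplacian["C", c("C", "A", "E")]  # 3 (degree of C), -1 (neighbour), 0 (not adjacent)
rowSums(laplacian)                # every row sums to zero
```
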
### Distance matrix

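To compute the relationships between each gene by "distance", we first compute the length of the shortest path between each pair of nodes. By default, `shortest.paths` ignores edge direction, and for an unweighted graph such as this one igraph uses breadth-first search (Dijkstra's algorithm is used for graphs with positive edge weights).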


```r
shortest.paths(graph_structure)
```

```
##   A C B D E F G I H
## A 0 1 2 2 3 3 4 4 5
## C 1 0 1 1 2 2 3 3 4
## B 2 1 0 2 3 3 4 4 5
## D 2 1 2 0 1 1 2 2 3
## E 3 2 3 1 0 2 3 3 4
## F 3 2 3 1 2 0 1 1 2
## G 4 3 4 2 3 1 0 2 3
## I 4 3 4 2 3 1 2 0 1
## H 5 4 5 3 4 2 3 1 0
```
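
Breadth-first search is enough to reproduce these path lengths for an unweighted graph. The sketch below is a hypothetical base-R helper (`bfs_distances`) that treats edges as undirected, to match the matrix above.

```r
# Undirected adjacency matrix for the 9-gene example
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
genes <- unique(as.vector(t(edges)))
adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Hypothetical helper: all-pairs shortest path lengths by breadth-first search.
# BFS visits genes in order of hop count, so the first visit is the shortest path.
bfs_distances <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n, dimnames = dimnames(adj))  # Inf = not (yet) reached
  for (src in seq_len(n)) {
    dist[src, src] <- 0
    queue <- src
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[v, ] != 0)) {
        if (is.infinite(dist[src, w])) {
          dist[src, w] <- dist[src, v] + 1
          queue <- c(queue, w)
        }
      }
    }
  }
  dist
}

d <- bfs_distances(adj)
d["A", "H"]  # 5: the longest undirected shortest path in this graph
```
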

```r
heatmap.2(shortest.paths(graph_structure), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-11-1.png" width="50%" height="50%" style="display: block; margin: auto;" />
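
Relative to the "diameter" (the length of the longest shortest path between any two nodes), we can show which genes are more similar or different based on the graph structure. Note that `diameter()` respects edge direction by default, giving 4 for this graph, whereas `shortest.paths()` ignores it (the longest undirected path, from A to H, has length 5). This is why the diagonal below exceeds one and the most distant pairs reach zero.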


```r
(1+diameter(graph_structure)-shortest.paths(graph_structure))/diameter(graph_structure)
```

```
##      A    C    B    D    E    F    G    I    H
## A 1.25 1.00 0.75 0.75 0.50 0.50 0.25 0.25 0.00
## C 1.00 1.25 1.00 1.00 0.75 0.75 0.50 0.50 0.25
## B 0.75 1.00 1.25 0.75 0.50 0.50 0.25 0.25 0.00
## D 0.75 1.00 0.75 1.25 1.00 1.00 0.75 0.75 0.50
## E 0.50 0.75 0.50 1.00 1.25 0.75 0.50 0.50 0.25
## F 0.50 0.75 0.50 1.00 0.75 1.25 1.00 1.00 0.75
## G 0.25 0.50 0.25 0.75 0.50 1.00 1.25 0.75 0.50
## I 0.25 0.50 0.25 0.75 0.50 1.00 0.75 1.25 1.00
## H 0.00 0.25 0.00 0.50 0.25 0.75 0.50 1.00 1.25
```

```r
heatmap.2((1+diameter(graph_structure)-shortest.paths(graph_structure))/diameter(graph_structure), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-12-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

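These relationships are used to create a distance graph relative to the diameter. A relative, geometrically decreasing distance is computed as follows. In this case, every connected node is weighted in fractions of the diameter: directly connected genes receive 0.2 (one over the diameter of 5), and the weight falls as the shortest path lengthens.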


```r
make_distance_graph(graph_structure, absolute = F)
```

```
##            A          C          B          D          E          F
## A 1.00000000 0.20000000 0.10000000 0.10000000 0.06666667 0.06666667
## C 0.20000000 1.00000000 0.20000000 0.20000000 0.10000000 0.10000000
## B 0.10000000 0.20000000 1.00000000 0.10000000 0.06666667 0.06666667
## D 0.10000000 0.20000000 0.10000000 1.00000000 0.20000000 0.20000000
## E 0.06666667 0.10000000 0.06666667 0.20000000 1.00000000 0.10000000
## F 0.06666667 0.10000000 0.06666667 0.20000000 0.10000000 1.00000000
## G 0.05000000 0.06666667 0.05000000 0.10000000 0.06666667 0.20000000
## I 0.05000000 0.06666667 0.05000000 0.10000000 0.06666667 0.20000000
## H 0.04000000 0.05000000 0.04000000 0.06666667 0.05000000 0.10000000
##            G          I          H
## A 0.05000000 0.05000000 0.04000000
## C 0.06666667 0.06666667 0.05000000
## B 0.05000000 0.05000000 0.04000000
## D 0.10000000 0.10000000 0.06666667
## E 0.06666667 0.06666667 0.05000000
## F 0.20000000 0.20000000 0.10000000
## G 1.00000000 0.10000000 0.06666667
## I 0.10000000 1.00000000 0.20000000
## H 0.06666667 0.20000000 1.00000000
```

```r
heatmap.2(make_distance_graph(graph_structure, absolute = F), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-13-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

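The printed geometric values match a relationship of 1 / (diameter * d) for a shortest path of length d of at least one, with one on the diagonal. The sketch below evaluates that formula for the path lengths 0 to 5 found in this graph; the formula is inferred from the output above and is an assumption, not a copy of graphsim's source.

```r
# Shortest path lengths 0 to 5 and the undirected diameter (5) of the 9-gene graph
d <- 0:5
diam <- 5

# Relationship falling off as 1 / (diameter * d); each gene is 1 with itself
geometric <- ifelse(d == 0, 1, 1 / (diam * d))
round(geometric, 4)
```
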
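An arithmetically decreasing distance is computed as follows. In this case, every pair of connected nodes is weighted by the length of their shortest path relative to the diameter, decreasing by 0.2 (one fifth of the diameter of 5) with each step along the path.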



```r
make_distance_graph(graph_structure, absolute = T)
```

```
##     A   C   B   D   E   F   G   I   H
## A 1.0 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0.0
## C 0.8 1.0 0.8 0.8 0.6 0.6 0.4 0.4 0.2
## B 0.6 0.8 1.0 0.6 0.4 0.4 0.2 0.2 0.0
## D 0.6 0.8 0.6 1.0 0.8 0.8 0.6 0.6 0.4
## E 0.4 0.6 0.4 0.8 1.0 0.6 0.4 0.4 0.2
## F 0.4 0.6 0.4 0.8 0.6 1.0 0.8 0.8 0.6
## G 0.2 0.4 0.2 0.6 0.4 0.8 1.0 0.6 0.4
## I 0.2 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8
## H 0.0 0.2 0.0 0.4 0.2 0.6 0.4 0.8 1.0
```

```r
heatmap.2(make_distance_graph(graph_structure, absolute = T), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-14-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

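Likewise, the arithmetic values above match (diameter - d) / diameter. A short sketch under that assumption (inferred from the output, not taken from graphsim's source), again for path lengths 0 to 5. Unlike the geometric version, this relationship reaches zero for genes separated by the full diameter.

```r
# Shortest path lengths 0 to 5 and the undirected diameter (5) of the 9-gene graph
d <- 0:5
diam <- 5

# Relationship decreasing linearly with path length, reaching 0 at the diameter
arithmetic <- (diam - d) / diam
arithmetic
```
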
### Sigma matrix

The Σ covariance matrix defines the relationships between the simulated gene distributions. Because its diagonal is one (1), the covariance terms are the correlations between each pair of genes. Where possible, these are derived from the distance relationships described above. Where the resulting matrix is not positive definite (as a valid covariance matrix must be), the nearest positive definite symmetric matrix is used instead.

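To see why such a correction can be needed, the sketch below builds a small Σ for a hypothetical "star" of four genes (B connected to A, C and D, each with correlation 0.8) and checks its eigenvalues. It then applies a simple repair: clipping negative eigenvalues and rescaling to a correlation matrix. The helper `make_pd` is hypothetical and is a simpler stand-in for a true nearest positive definite matrix (Higham's algorithm, as implemented in `Matrix::nearPD`); graphsim's own correction may differ.

```r
# Hypothetical "star": gene B linked to A, C and D, each edge with correlation 0.8
genes <- c("A", "B", "C", "D")
sigma_star <- diag(4)
dimnames(sigma_star) <- list(genes, genes)
sigma_star["B", c("A", "C", "D")] <- 0.8
sigma_star[c("A", "C", "D"), "B"] <- 0.8

# A valid covariance matrix needs all eigenvalues > 0; here the smallest is negative
min(eigen(sigma_star, symmetric = TRUE)$values)

# Hypothetical helper: clip eigenvalues to a small positive floor, rebuild the
# matrix, then rescale it back to a correlation matrix (unit diagonal).
make_pd <- function(m, eps = 1e-6) {
  e <- eigen(m, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  fixed <- e$vectors %*% diag(vals) %*% t(e$vectors)
  fixed <- (fixed + t(fixed)) / 2  # remove rounding asymmetry
  fixed <- cov2cor(fixed)          # rescale so the diagonal is exactly 1
  dimnames(fixed) <- dimnames(m)
  fixed
}

sigma_star_pd <- make_pd(sigma_star)
min(eigen(sigma_star_pd, symmetric = TRUE)$values) > 0  # TRUE
round(sigma_star_pd, 3)
```
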
These can be computed directly from an adjacency matrix.


```r
#sigma from adj mat
make_sigma_mat_graph(graph_structure, 0.8)
```

```
##     A   C   B   D   E   F   G   I   H
## A 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## C 0.8 1.0 0.8 0.8 0.0 0.0 0.0 0.0 0.0
## B 0.0 0.8 1.0 0.0 0.0 0.0 0.0 0.0 0.0
## D 0.0 0.8 0.0 1.0 0.8 0.8 0.0 0.0 0.0
## E 0.0 0.0 0.0 0.8 1.0 0.0 0.0 0.0 0.0
## F 0.0 0.0 0.0 0.8 0.0 1.0 0.8 0.8 0.0
## G 0.0 0.0 0.0 0.0 0.0 0.8 1.0 0.0 0.0
## I 0.0 0.0 0.0 0.0 0.0 0.8 0.0 1.0 0.8
## H 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0
```

```r
heatmap.2(make_sigma_mat_graph(graph_structure, 0.8), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-15-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

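The printed matrix above can be reproduced in base R as the identity plus `cor` times the undirected adjacency matrix. This is a sketch inferred from the output, not graphsim's code. Checking the eigenvalues shows this Σ is not positive definite, which is consistent with the warning shown when simulating from the adjacency matrix.

```r
# Undirected adjacency matrix for the 9-gene example
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
genes <- unique(as.vector(t(edges)))
adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Sigma: 1 on the diagonal, cor_edge between directly connected genes, 0 elsewhere
cor_edge <- 0.8
sigma_adj <- diag(nrow(adj)) + cor_edge * adj
dimnames(sigma_adj) <- dimnames(adj)
sigma_adj["A", c("C", "B")]                         # 0.8 for neighbour C, 0 for B
min(eigen(sigma_adj, symmetric = TRUE)$values) < 0  # TRUE: not positive definite
```
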
A common-link matrix can also be used to compute a Σ matrix.


```r
#sigma from comm mat
make_sigma_mat_graph(graph_structure, 0.8, comm = T)
```

```
##     A   C   B   D   E   F   G   I   H
## A 1.0 0.0 0.8 0.8 0.0 0.0 0.0 0.0 0.0
## C 0.0 1.0 0.0 0.0 0.8 0.8 0.0 0.0 0.0
## B 0.8 0.0 1.0 0.8 0.0 0.0 0.0 0.0 0.0
## D 0.8 0.0 0.8 1.0 0.0 0.0 0.8 0.8 0.0
## E 0.0 0.8 0.0 0.0 1.0 0.8 0.0 0.0 0.0
## F 0.0 0.8 0.0 0.0 0.8 1.0 0.0 0.0 0.8
## G 0.0 0.0 0.0 0.8 0.0 0.0 1.0 0.8 0.0
## I 0.0 0.0 0.0 0.8 0.0 0.0 0.8 1.0 0.0
## H 0.0 0.0 0.0 0.0 0.0 0.8 0.0 0.0 1.0
```

```r
heatmap.2(make_sigma_mat_graph(graph_structure, 0.8, comm = T), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-16-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

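Similarly, the common-link Σ above matches placing 0.8 wherever two genes share at least one neighbour. A base-R sketch under that assumption (inferred from the printed output, not graphsim's source):

```r
# Undirected adjacency matrix for the 9-gene example
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
genes <- unique(as.vector(t(edges)))
adj <- matrix(0, length(genes), length(genes), dimnames = list(genes, genes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Genes sharing at least one neighbour get 0.8; ifelse keeps the matrix dimnames
common <- adj %*% adj
sigma_comm <- ifelse(common > 0, 0.8, 0)
diag(sigma_comm) <- 1
sigma_comm["A", c("B", "C")]  # 0.8 (shared neighbour C), 0 (no shared neighbour)
```
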
Deriving Σ from the distance relationships is recommended, and this is supported by the built-in functions. For instance, Σ can be computed from the geometrically decreasing distances.



```r
# sigma from geometric distance matrix
make_sigma_mat_dist_graph(graph_structure, 0.8, absolute = F)
```

```
##           A         C         B         D         E         F
## A 1.0000000 0.8000000 0.4000000 0.4000000 0.2666667 0.2666667
## C 0.8000000 1.0000000 0.8000000 0.8000000 0.4000000 0.4000000
## B 0.4000000 0.8000000 1.0000000 0.4000000 0.2666667 0.2666667
## D 0.4000000 0.8000000 0.4000000 1.0000000 0.8000000 0.8000000
## E 0.2666667 0.4000000 0.2666667 0.8000000 1.0000000 0.4000000
## F 0.2666667 0.4000000 0.2666667 0.8000000 0.4000000 1.0000000
## G 0.2000000 0.2666667 0.2000000 0.4000000 0.2666667 0.8000000
## I 0.2000000 0.2666667 0.2000000 0.4000000 0.2666667 0.8000000
## H 0.1600000 0.2000000 0.1600000 0.2666667 0.2000000 0.4000000
##           G         I         H
## A 0.2000000 0.2000000 0.1600000
## C 0.2666667 0.2666667 0.2000000
## B 0.2000000 0.2000000 0.1600000
## D 0.4000000 0.4000000 0.2666667
## E 0.2666667 0.2666667 0.2000000
## F 0.8000000 0.8000000 0.4000000
## G 1.0000000 0.4000000 0.2666667
## I 0.4000000 1.0000000 0.8000000
## H 0.2666667 0.8000000 1.0000000
```

```r
heatmap.2(make_sigma_mat_dist_graph(graph_structure, 0.8, absolute = F), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-17-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

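The printed Σ values are consistent with rescaling the geometric relationships so that directly connected genes (d = 1) have correlation `cor`, which reduces to cor / d. The sketch below assumes that rule; it is inferred from the output rather than taken from graphsim's source. Under the same rule, the arithmetic relationship for neighbours is already 0.8, which would explain why the arithmetic Σ matches its relationship matrix when `cor = 0.8`.

```r
# Path lengths 0 to 5, the diameter (5) and the target correlation for neighbours
d <- 0:5
diam <- 5
cor_edge <- 0.8

# Geometric relationship, then rescaled so that neighbours (d = 1) get cor_edge
geometric <- ifelse(d == 0, 1, 1 / (diam * d))
sigma_geom <- ifelse(d == 0, 1, geometric * cor_edge / geometric[2])
round(sigma_geom, 4)
```
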
Σ can also be computed from the arithmetically decreasing distances.


```r
# sigma from absolute distance matrix
make_sigma_mat_dist_graph(graph_structure, 0.8, absolute = T)
```

```
##     A   C   B   D   E   F   G   I   H
## A 1.0 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0.0
## C 0.8 1.0 0.8 0.8 0.6 0.6 0.4 0.4 0.2
## B 0.6 0.8 1.0 0.6 0.4 0.4 0.2 0.2 0.0
## D 0.6 0.8 0.6 1.0 0.8 0.8 0.6 0.6 0.4
## E 0.4 0.6 0.4 0.8 1.0 0.6 0.4 0.4 0.2
## F 0.4 0.6 0.4 0.8 0.6 1.0 0.8 0.8 0.6
## G 0.2 0.4 0.2 0.6 0.4 0.8 1.0 0.6 0.4
## I 0.2 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8
## H 0.0 0.2 0.0 0.4 0.2 0.6 0.4 0.8 1.0
```

```r
heatmap.2(make_sigma_mat_dist_graph(graph_structure, 0.8, absolute = T), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-18-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

### Simulated expression and observed correlation

Here we generate the final simulated expression dataset. Note that none of the prior steps are required: these functions are called internally as needed.

For example, the following dataset is generated from a Σ matrix derived from the adjacency matrix. This Σ is not positive definite, so the nearest positive definite approximation is used instead (hence the warning).



```r
#simulate expression data
#adj mat
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0, comm = F) # sigma from adj mat is not positive definite; nearest approximation used
```

```
## Warning in generate_expression(100, graph_structure, cor = 0.8, mean
## = 0, : sigma matrix was not positive definite, nearest approximation
## used.
```

```r
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-19-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

```r
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-19-2.png" width="50%" height="50%" style="display: block; margin: auto;" />

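With only 100 samples, observed correlations scatter around their targets in Σ (and here Σ itself has also been adjusted to be positive definite). The base-R sketch below, independent of graphsim and sampling via the Cholesky factor, compares the largest deviation from a three-gene Σ at 100 and at 10,000 samples; the target matrix is a hypothetical example.

```r
# Hypothetical target correlation matrix for three genes in a chain A - B - C
sigma_chain <- matrix(c(1.0, 0.8, 0.4,
                        0.8, 1.0, 0.8,
                        0.4, 0.8, 1.0), nrow = 3)
chol_factor <- chol(sigma_chain)

# Largest absolute gap between observed and target correlations for n samples
max_cor_error <- function(n) {
  x <- matrix(rnorm(n * 3), nrow = n) %*% chol_factor  # n samples x 3 genes
  max(abs(cor(x) - sigma_chain))
}

set.seed(42)
err_100 <- max_cor_error(100)
err_10000 <- max_cor_error(10000)
c(n_100 = err_100, n_10000 = err_10000)  # the gap typically shrinks as n grows
```
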
Here we compute a simulated dataset based on the common links shared with other nodes.


```r
#comm mat
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0, comm = T) #expression from comm mat
```

```
## Warning in generate_expression(100, graph_structure, cor = 0.8, mean
## = 0, : sigma matrix was not positive definite, nearest approximation
## used.
```

```r
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-20-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

```r
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-20-2.png" width="50%" height="50%" style="display: block; margin: auto;" />

Here we use relative distance (relationships are geometrically weighted to the diameter).


```r
# relative dist
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0, comm = F, dist = T, absolute = F)
```

```
## Warning in generate_expression(100, graph_structure, cor = 0.8, mean
## = 0, : sigma matrix was not positive definite, nearest approximation
## used.
```

```r
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-21-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

```r
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "grey75", "red"), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-21-2.png" width="50%" height="50%" style="display: block; margin: auto;" />

Here we use absolute distance (relationships are arithmetically weighted to the diameter).





```r
#absolute dist
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0, comm = F, dist = T, absolute = T) # expression from absolute (arithmetic) distances
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-23-1.png" width="50%" height="50%" style="display: block; margin: auto;" />

```r
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = bluered(50), colsep = 1:4, rowsep = 1:4)
```

<img src="Plotunnamed-chunk-23-2.png" width="50%" height="50%" style="display: block; margin: auto;" />


## Summary

In summary, we compute the following expression dataset based on these underlying relationships in the graph structure. Here we use geometrically decreasing correlations between more distant nodes in the network.





```r
# activating graph
state <- rep(1, length(E(graph_structure)))
plot_directed(graph_structure, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_structure, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_structure, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0,
                            comm = FALSE, dist =TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_activating_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_activating_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_activating_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_activating_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_activating_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

### Inhibiting relationships

Here we simulate the same graph structure with inhibiting edges by passing the `state` argument. This takes one value for each edge, in the order the edges were defined, with -1 marking an inhibiting edge (see the function documentation).





```r
# inhibiting graph: -1 marks the inhibiting edges (C -> D and H -> I)
state <- c(1, 1, -1, 1, 1, 1, 1, -1)
plot_directed(graph_structure, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Inhibiting pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_structure, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_structure, state, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_structure, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_structure)), rowsep = 1:length(V(graph_structure)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_inhibiting_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_inhibiting_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_inhibiting_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_inhibiting_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_inhibiting_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

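One way to think about how inhibiting edges affect Σ is to give each pair of genes the product of the edge states along the path between them, so that an odd number of inhibiting edges yields a negative correlation. The sketch below computes these signs by breadth-first search for the 9-gene graph and the `state` vector used above. This rule is an assumption for illustration, is only unambiguous for trees (where each path is unique), and graphsim's handling of `state` may differ; the helper `path_sign_matrix` is hypothetical.

```r
# 9-gene example edges and one state per edge (-1 = inhibiting), in edge order
edges <- rbind(c("A", "C"), c("B", "C"), c("C", "D"), c("D", "E"), c("D", "F"),
               c("F", "G"), c("F", "I"), c("H", "I"))
state <- c(1, 1, -1, 1, 1, 1, 1, -1)

# Hypothetical helper: sign for each gene pair = product of edge states on the path
path_sign_matrix <- function(edges, state) {
  genes <- unique(as.vector(t(edges)))
  n <- length(genes)
  # signed undirected adjacency: entry holds the edge state, 0 means no edge
  s_adj <- matrix(0, n, n, dimnames = list(genes, genes))
  for (k in seq_len(nrow(edges))) {
    s_adj[edges[k, 1], edges[k, 2]] <- state[k]
    s_adj[edges[k, 2], edges[k, 1]] <- state[k]
  }
  signs <- matrix(NA_real_, n, n, dimnames = list(genes, genes))
  for (src in seq_len(n)) {
    signs[src, src] <- 1
    queue <- src
    while (length(queue) > 0) {  # breadth-first search from src
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(s_adj[v, ] != 0)) {
        if (is.na(signs[src, w])) {
          signs[src, w] <- signs[src, v] * s_adj[v, w]  # carry the sign along the path
          queue <- c(queue, w)
        }
      }
    }
  }
  signs
}

signs <- path_sign_matrix(edges, state)
signs["A", c("C", "D", "H")]  # two inhibiting edges between A and H cancel out
```
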
# Toy examples

Here we give some toy examples to show how the simulations behave in simple cases. These help to understand how modules within a larger graph translate into correlations in the final simulated datasets.

### Diverging branches





```r
graph_diverging_edges <- rbind(c("A", "B"), c("B", "C"), c("B", "D"))
graph_diverging <- graph.edgelist(graph_diverging_edges, directed = T)

# activating graph
state <- rep(1, length(E(graph_diverging)))
plot_directed(graph_diverging, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_diverging, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_diverging, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_diverging, cor = 0.8, mean = 0,
                            comm = FALSE, dist =TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_diverging_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

#### Inhibiting relationships

Here we simulate the same graph structure with inhibiting edges by passing the `state` argument. This takes one value for each edge, in the order the edges were defined, with -1 marking an inhibiting edge (see the function documentation).





```r
# inhibiting graph: -1 marks the inhibiting edge (B -> D)
state <- c(1, 1, -1)
plot_directed(graph_diverging, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Inhibiting pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_diverging, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_diverging, state, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_diverging, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_diverging)), rowsep = 1:length(V(graph_diverging)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_diverging_inhibiting_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_inhibiting_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_inhibiting_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_inhibiting_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_diverging_inhibiting_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

### Converging branches





```r
graph_converging_edges <- rbind(c("C", "E"), c("D", "E"), c("E", "F"))
graph_converging <- graph.edgelist(graph_converging_edges, directed = T)

# activating graph
state <- rep(1, length(E(graph_converging)))
plot_directed(graph_converging, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_converging, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_converging, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_converging, cor = 0.8, mean = 0,
                            comm = FALSE, dist =TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_converging_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

#### Inhibiting relationships

Here we simulate the same graph structure with inhibiting edges by passing the `"state"` parameter. This takes one value for each edge to identify which edges are inhibitory: `-1` for inhibiting and `1` for activating (see the documentation for other accepted encodings).
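Rather than typing the signs by position, the state vector can be built by naming the inhibitory edges. This is a minimal base-R sketch, assuming that `graph.edgelist` stores edges in the row order of the edge list; the `"from|to"` key format and the `inhibiting` vector are illustrative choices, not part of graphsim.

```r
# Edge list for the converging graph, in the same row order used with graph.edgelist()
edges <- rbind(c("C", "E"), c("D", "E"), c("E", "F"))

# Hypothetical: name the inhibitory edges as "from|to" keys
inhibiting <- c("C|E", "E|F")

# Build one key per edge and assign -1 (inhibiting) or 1 (activating)
edge_keys <- paste(edges[, 1], edges[, 2], sep = "|")
state <- ifelse(edge_keys %in% inhibiting, -1, 1)

# Show the state assigned to each edge
print(data.frame(from = edges[, 1], to = edges[, 2], state = state))
```

This reproduces `c(-1, 1, -1)`, the vector used in the next chunk, while making it explicit which edge each sign belongs to.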





```r
# inhibiting graph: one state per edge (C->E, D->E, E->F)
state <- c(-1, 1, -1)
plot_directed(graph_converging, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_converging, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_converging, state = state, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_converging, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_converging)), rowsep = 1:length(V(graph_converging)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_converging_inhibiting_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_inhibiting_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_inhibiting_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_inhibiting_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_converging_inhibiting_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

### Reconnecting paths





```r
graph_reconnecting_edges <- rbind(c("A", "B"), c("B", "C"), c("B", "D"),c("C", "E"), c("D", "E"), c("E", "F"))
graph_reconnecting <- graph.edgelist(graph_reconnecting_edges, directed = TRUE)

# activating graph
state <- rep(1, length(E(graph_reconnecting)))
plot_directed(graph_reconnecting, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_reconnecting, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_reconnecting, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_reconnecting, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_reconnecting_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

#### Inhibiting relationships

Here we simulate the same graph structure with inhibiting edges by passing the `"state"` parameter. This takes one value for each edge to identify which edges are inhibitory: `-1` for inhibiting and `1` for activating (see the documentation for other accepted encodings).
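Because this graph contains a cycle (B reaches E via either C or D), it is worth checking that the inhibitory edges give the same overall sign along both routes. The sketch below assumes that the sign of an indirect relationship is the product of the edge states along a path; this is an illustrative assumption, not a documented graphsim rule, and `edge_state` and `path_sign` are hypothetical helpers.

```r
# Edge list for the reconnecting graph (one row per edge, in graph.edgelist() order)
edges <- rbind(c("A", "B"), c("B", "C"), c("B", "D"),
               c("C", "E"), c("D", "E"), c("E", "F"))
# One state per edge: B->D and C->E are inhibiting (-1), the rest activating (1)
state <- c(1, 1, -1, -1, 1, 1)
stopifnot(length(state) == nrow(edges))

# Hypothetical helper: look up the state of the edge from one node to the next
edge_state <- function(from, to) {
  idx <- which(edges[, 1] == from & edges[, 2] == to)
  if (length(idx) != 1) stop("no unique edge from ", from, " to ", to)
  state[idx]
}

# Product of edge states along a path given as a vector of node names
path_sign <- function(path) {
  prod(mapply(edge_state, head(path, -1), tail(path, -1)))
}

path_sign(c("B", "C", "E"))  # route via C
path_sign(c("B", "D", "E"))  # route via D
```

Both routes give `-1`, so under this assumption the two branches agree that B and E are negatively related.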





```r
# inhibiting graph: one state per edge, in edge-list order
# (A->B, B->C, B->D, C->E, D->E, E->F); B->D and C->E are inhibiting
state <- c(1, 1, -1, -1, 1, 1)
plot_directed(graph_reconnecting, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2)
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(graph_reconnecting, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(graph_reconnecting, state = state, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, graph_reconnecting, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(graph_reconnecting)), rowsep = 1:length(V(graph_reconnecting)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_graph_reconnecting_inhibiting_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_inhibiting_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_inhibiting_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_inhibiting_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_graph_reconnecting_inhibiting_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

# Empirical examples

Next we demonstrate the simulation procedure on real biological pathways from the Reactome database. These graphs are included in the `data` directory of this package and are provided as examples and for convenience.

### Kinase pathways

In the following pathways, all relationships are treated as activating.
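To illustrate why an all-activating graph yields only positive correlations, here is a minimal base-R sketch that computes undirected shortest-path distances and sets each covariance to `cor^d`. The geometric decay `cor^d` is an assumption made for illustration; graphsim's `make_distance_graph` and `make_sigma_mat_dist_graph` may scale distances differently (see their documentation).

```r
# Undirected shortest-path distances by Floyd-Warshall on an edge list (base R only)
shortest_paths <- function(edges) {
  nodes <- sort(unique(c(edges)))
  d <- matrix(Inf, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  diag(d) <- 0
  # Each edge is treated as undirected with length 1 (character-matrix indexing)
  d[edges] <- 1
  d[edges[, 2:1, drop = FALSE]] <- 1
  # Relax every pair through each intermediate node k
  for (k in nodes) {
    d <- pmin(d, outer(d[, k], d[k, ], `+`))
  }
  d
}

# Hypothetical geometric decay: covariance = cor^distance (1 on the diagonal)
sigma_from_distance <- function(d, cor = 0.8) cor^d

edges <- rbind(c("C", "E"), c("D", "E"), c("E", "F"))
d <- shortest_paths(edges)
sigma <- sigma_from_distance(d, cor = 0.8)
round(sigma, 3)

# chol() succeeds only for a positive-definite matrix
invisible(chol(sigma))
```

Every entry is positive and decays with path length, qualitatively like the all-positive Σ heatmaps of the activating examples.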

#### RAF/MAP kinase cascade

Here we generate simulated data for the RAF/MAP kinase cascade pathway.







```r
RAF_MAP_graph <- identity(RAF_MAP_graph)

# activating graph
state <- rep(1, length(E(RAF_MAP_graph)))
plot_directed(RAF_MAP_graph, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2,
              col.arrow = "#00A9FF", fill.node = "lightblue")
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(RAF_MAP_graph, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(RAF_MAP_graph)), rowsep = 1:length(V(RAF_MAP_graph)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(RAF_MAP_graph, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(RAF_MAP_graph)), rowsep = 1:length(V(RAF_MAP_graph)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, RAF_MAP_graph, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(RAF_MAP_graph)), rowsep = 1:length(V(RAF_MAP_graph)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(RAF_MAP_graph)), rowsep = 1:length(V(RAF_MAP_graph)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_RAF_MAP_graph_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_RAF_MAP_graph_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_RAF_MAP_graph_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_RAF_MAP_graph_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_RAF_MAP_graph_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

#### Phosphoinositide-3-kinase cascade

Here we generate simulated data for the phosphoinositide-3-kinase (Pi3K) cascade pathway.





```r
Pi3K_graph <- identity(Pi3K_graph)

# activating graph
state <- rep(1, length(E(Pi3K_graph)))
plot_directed(Pi3K_graph, state=state, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2,
              col.arrow = "#00A9FF", fill.node = "lightblue")
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(Pi3K_graph, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(Pi3K_graph)), rowsep = 1:length(V(Pi3K_graph)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(Pi3K_graph, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(Pi3K_graph)), rowsep = 1:length(V(Pi3K_graph)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, Pi3K_graph, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = state)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(Pi3K_graph)), rowsep = 1:length(V(Pi3K_graph)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(Pi3K_graph)), rowsep = 1:length(V(Pi3K_graph)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_Pi3K_graph_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_graph_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_graph_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_graph_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_graph_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

### Pathways with repression

#### The Pi3K/AKT pathway

Here we generate simulated data for the phosphoinositide-3-kinase activation of protein kinase B (PKB) cascade, also known as the Pi3K/AKT pathway. Edge states are read from the `state` edge attribute of the imported graph.
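The `simplify` call in the next chunk merges duplicate edges and recodes their states, marking an edge as inhibiting (`2`) if any of its merged duplicates was inhibiting, and activating (`1`) otherwise. The same rule can be sketched in base R on a toy edge table; the gene names and state labels below are hypothetical, not taken from the Reactome data.

```r
# Hypothetical toy edge table with a duplicated edge and mixed state encodings
edges <- data.frame(
  from  = c("PIK3CA", "PIK3CA", "AKT1", "PTEN"),
  to    = c("AKT1",   "AKT1",   "MTOR", "AKT1"),
  state = c("activating", "inhibiting", "1", "-1"),
  stringsAsFactors = FALSE
)

# Recode any inhibitory label (-1, 2, "inhibiting", "inhibition") as 2, all else as 1
to_state_code <- function(x) {
  ifelse(as.character(x) %in% c("-1", "2", "inhibiting", "inhibition"), 2, 1)
}
edges$code <- to_state_code(edges$state)

# Collapse duplicate edges: the merged edge is inhibiting if any duplicate was
key <- paste(edges$from, edges$to, sep = "->")
collapsed <- sapply(split(edges$code, key), max)
collapsed
```

Coding states as `1` and `2` also lets them index a two-colour vector directly, as the `col.arrow` argument in the next chunk does.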








```r
Pi3K_AKT_graph <- identity(Pi3K_AKT_graph)
# merge duplicate edges and recode states: 2 (inhibiting) if any merged edge was inhibiting, otherwise 1 (activating)
Pi3K_AKT_graph <- simplify(Pi3K_AKT_graph, edge.attr.comb = function(x) ifelse(any(x %in% list(-1, 2, "inhibiting", "inhibition")), 2, 1))
Pi3K_AKT_graph <- simplify(Pi3K_AKT_graph, edge.attr.comb = "first")
# edge states (1 = activating, 2 = inhibiting), one per edge
edge_properties <- E(Pi3K_AKT_graph)$state

# graph with activating and inhibiting edges
plot_directed(Pi3K_AKT_graph, state = edge_properties, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2,
              col.arrow = c(alpha("navyblue", 0.25), alpha("red", 0.25))[edge_properties],fill.node = "lightblue")
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(Pi3K_AKT_graph, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(Pi3K_AKT_graph, state = edge_properties, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, Pi3K_AKT_graph, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = edge_properties)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_Pi3K_AKT_graph_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_AKT_graph_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_AKT_graph_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_AKT_graph_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_Pi3K_AKT_graph_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

#### The TGFβ-Smad pathway

Here we generate simulated data for the TGFβ-Smad gene regulatory pathway with known inhibitions. Edge states are read from the `state` edge attribute of the imported graph.
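Graphs with known inhibitions need negative entries in Σ. For a tree-shaped graph, one way to obtain a valid signed matrix is to give each node a sign equal to the product of edge states on its path from a root node, then flip rows and columns with `diag(signs) %*% Sigma %*% diag(signs)`; this congruence transform keeps the matrix positive definite. This is a sketch of the idea on the small converging graph, assuming a `0.8^d` decay, and not graphsim's algorithm; it does not carry over directly to graphs with cycles, where different paths may disagree in sign.

```r
nodes <- c("C", "D", "E", "F")
# Undirected path lengths for the converging graph C->E, D->E, E->F
d <- matrix(c(0, 2, 1, 2,
              2, 0, 1, 2,
              1, 1, 0, 1,
              2, 2, 1, 0), nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))
sigma <- 0.8^d  # assumed geometric decay, all entries positive

# Node signs: product of edge states on the path from root E
# (C->E is -1, D->E is +1, E->F is -1, as in state <- c(-1, 1, -1))
signs <- c(C = -1, D = 1, E = 1, F = -1)
sign_diag <- diag(signs)

# Flip the sign of each row and column belonging to a negatively signed node
sigma_signed <- sign_diag %*% sigma %*% sign_diag
dimnames(sigma_signed) <- dimnames(sigma)
round(sigma_signed, 2)

invisible(chol(sigma_signed))  # still positive definite
```

Directly inhibited pairs (C and E, E and F) become negative, while C and F, linked through two inhibitions, stay positive.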



```r
TGFBeta_Smad_graph <- identity(TGFBeta_Smad_graph)
edge_properties <- E(TGFBeta_Smad_graph)$state
plot_directed(TGFBeta_Smad_graph, state = edge_properties, col.arrow = c(alpha("navyblue", 0.25), alpha("red", 0.25))[edge_properties], fill.node = c("lightblue"))
```

<img src="Plotunnamed-chunk-37-1.png" width="80%" style="display: block; margin: auto;" />




```r
TGFBeta_Smad_graph <- identity(TGFBeta_Smad_graph)
edge_properties <- E(TGFBeta_Smad_graph)$state

# graph with activating and inhibiting edges
plot_directed(TGFBeta_Smad_graph, state = edge_properties, layout = layout.kamada.kawai,
              cex.node=2, cex.arrow=4, arrow_clip = 0.2,
              col.arrow = c(alpha("navyblue", 0.25), alpha("red", 0.25))[edge_properties],fill.node = "lightblue")
mtext(text = "(a) Activating pathway structure", side=1, line=3.5, at=0.075, adj=0.5, cex=1.75)
box()
#plot relationship matrix
heatmap.2(make_distance_graph(TGFBeta_Smad_graph, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "white", "red"),
          colsep = 1:length(V(TGFBeta_Smad_graph)), rowsep = 1:length(V(TGFBeta_Smad_graph)))
mtext(text = "(b) Relationship matrix", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot sigma matrix
heatmap.2(make_sigma_mat_dist_graph(TGFBeta_Smad_graph, state = edge_properties, cor = 0.8, absolute = FALSE),
          scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(TGFBeta_Smad_graph)), rowsep = 1:length(V(TGFBeta_Smad_graph)))
mtext(text = expression(paste("(c) ", Sigma, " matrix")), side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#simulated data
expr <- generate_expression(100, TGFBeta_Smad_graph, cor = 0.8, mean = 0,
                            comm = FALSE, dist = TRUE, absolute = FALSE, state = edge_properties)
#plot simulated correlations
heatmap.2(cor(t(expr)), scale = "none", trace = "none", col = colorpanel(50, "blue", "white", "red"),
          colsep = 1:length(V(TGFBeta_Smad_graph)), rowsep = 1:length(V(TGFBeta_Smad_graph)))
mtext(text = "(d) Simulated correlation", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
#plot simulated expression data
heatmap.2(expr, scale = "none", trace = "none", col = bluered(50),
          colsep = 1:length(V(TGFBeta_Smad_graph)), rowsep = 1:length(V(TGFBeta_Smad_graph)), labCol = "")
mtext(text = "samples", side=1, line=1.5, at=0.2, adj=0.5, cex=1.5)
mtext(text = "genes", side=4, line=1, at=-0.4, adj=0.5, cex=1.5)
mtext(text = "(e) Simulated expression data (log scale)", side=1, line=3.5, at=0, adj=0.5, cex=1.75)
```

<img src="Plotsimulation_TGFBeta_Smad_graph_hide-1.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_TGFBeta_Smad_graph_hide-2.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_TGFBeta_Smad_graph_hide-3.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_TGFBeta_Smad_graph_hide-4.png" width="50%" height="50%" style="display: block; margin: auto;" /><img src="Plotsimulation_TGFBeta_Smad_graph_hide-5.png" width="50%" height="50%" style="display: block; margin: auto;" />

# Summary

Here we have demonstrated that simulated datasets can be generated based on graph structures. These can be either constructed networks for modelling purposes or empirical networks, such as those from curated biological databases.
Various parameters are available and described in the documentation. You can alter these parameters in the examples given here to see their impact on the simulated data. We encourage you to try different parameters and examine the results carefully, and to consider which assumptions are appropriate for your investigation. A model or simulation is never correct: it is a tool to test your assumptions and find weaknesses in your technique. Consider which conditions your method could struggle with, and model these. Pathway structure in particular should be considered in biological datasets, as correlations within a pathway can lead to false positive results and confounding.

The intended application for this package is modelling RNA-Seq gene expression data. However, other applications are encouraged, provided that they require multivariate normal simulations based on relationships in graph structures.
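To show the multivariate normal sampling step in isolation, the sketch below draws samples for a given Σ using only base R. If `R <- chol(Sigma)` is the upper-triangular factor with `t(R) %*% R` equal to Σ, then the rows of `Z %*% R`, with `Z` standard normal, have covariance Σ. graphsim may draw its samples differently; `simulate_mvn` is a hypothetical helper, and the chain-shaped Σ is only an example. Genes are returned as rows, matching the layout of `expr` in the examples above.

```r
set.seed(1)

# Example covariance for 4 genes in a chain: 0.8^|i - j|
p <- 4
sigma <- 0.8^abs(outer(1:p, 1:p, "-"))

# Hypothetical helper: rows of Z %*% R are N(mean, sigma) when t(R) %*% R = sigma
simulate_mvn <- function(n, sigma, mean = 0) {
  R <- chol(sigma)                              # upper-triangular Cholesky factor
  Z <- matrix(rnorm(n * ncol(sigma)), nrow = n) # n x p standard normal draws
  X <- Z %*% R + mean                           # n x p, one row per sample
  t(X)                                          # p x n: genes as rows, samples as columns
}

expr_sketch <- simulate_mvn(5000, sigma)
round(cor(t(expr_sketch)), 2)
```

With 5000 samples the empirical correlations should lie close to Σ; with the 100 samples used in the examples above, noticeably more sampling noise is expected.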

## Citation

If you use this package, please cite it where appropriate to recognise the efforts of the developers.


```r
citation("graphsim")
```

```
## 
## To cite package 'graphsim' in publications use:
## 
##   S. Thomas Kelly and Michael A. Black (2019). graphsim:
##   Simulate Expression data from iGraph networks. R package
##   version 1.0.0.
##   https://github.com/TomKellyGenetics/graphsim
##   doi:10.5281/zenodo.1313986
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {{graphsim}: Simulate Expression data from iGraph networks},
##     author = {S. Thomas Kelly and Michael A. Black},
##     year = {2019},
##     note = {R package version 1.0.0},
##     url = {https://github.com/TomKellyGenetics/graphsim},
##     doi = {10.5281/zenodo.1313986},
##   }
## 
## Please also acknowledge the manuscript describing use of
## this package once it is published.
## 
## Kelly, S.T. and Black, M.A. (2020) graphsim: An R package
## for simulating gene expression data from graph structures of
## biological pathways.
## 
## @Article{, title = {{graphsim}: An {R} package for
## simulating gene expression data from graph structures of
## biological pathways}, journal = {}, author = {S. Thomas
## Kelly and Michael A. Black}, year = {2020}, volume = {},
## number = {}, pages = {}, month = {}, note = {Submitted for
## peer-review}, url =
## {https://github.com/TomKellyGenetics/graphsim}, doi = {}, }
```

## Reporting issues

Please see the GitHub repository for reporting problems and requesting changes. Details on how to contribute can be found in the DESCRIPTION and README.
