CoSMoS R | Complete Stochastic Modelling Solution

Simon Michael Papalexiou, Francesco Serinaldi, Filip Strnad, Yannis Markonis, Kevin Shook

2021-05-29


CoSMoS was conceived back in 2009 (see note in section 5) and was officially released on CRAN in April 2019. Since then CoSMoS has become one of the leading and most widely downloaded R packages for stochastic simulation of non-Gaussian time series. Designed with the end user in mind, CoSMoS makes univariate, multivariate, or random field simulations easy. You can reliably generate time series or fields of rainfall, streamflow, wind speed, relative humidity, or any other environmental variable in seconds. Just choose the probability distribution and correlation properties of the time series you want to generate, and it will do the rest.


~Unified Theory of Stochastic Modelling | Papalexiou (2018)

“Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. … Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent”" Gaussian process."

~Precise Temporal Disaggregation | Papalexiou et al. (2018)

“Hydroclimatic variables such as precipitation and temperature are often measured or simulated by climate models at coarser spatiotemporal scales than those needed for operational purposes. … Here we introduce a novel disaggregation method, named Disaggregation Preserving Marginals and Correlations (DiPMaC), that is able to disaggregate a coarse-scale time series to any finer scale, while reproducing the probability distribution and the linear correlation structure of the fine-scale process. DiPMaC is also generalized for arbitrary nonstationary scenarios to reproduce time varying marginals.”

~Random Fields Simplified | Papalexiou and Serinaldi (2020)

“Nature manifests itself in space and time. The spatiotemporal complexity of processes such as precipitation, temperature, and wind, does not allow purely deterministic modeling. Spatiotemporal random fields have a long history in modeling such processes, and yet a single unified framework offering the flexibility to simulate processes that may differ profoundly does not exist. Here we introduce a blueprint to efficiently simulate spatiotemporal random fields that preserve any marginal distribution, any valid spatiotemporal correlation structure, and intermittency.”

~Advancing Space-Time Simulation | Papalexiou, Serinaldi and Porcu (2021)

“…we advance random field simulation by introducing the concepts of general velocity fields and general anisotropy transformations. To illustrate the potential of CoSMoS, we simulate random fields with complex patterns and motion mimicking rainfall storms moving across an area, spiraling fields resembling weather cyclones, fields converging to (or diverging from) a point, and colliding air masses.”


1 Step by step guide to timeseries simulation

NOTE: For R code chunks running longer than ~10s, we report the CPU time for a Windows 10 Pro x64 laptop with Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 4-core, 32GB RAM.

In most cases we can accurately simulate a process by reproducing its marginal distribution and its autocorrelation structure. The first is typically represented by a parametric probability distribution; the second either by a parametric correlation structure, or by specific correlation coefficient values. CoSMoS is the first software that enables the generation of time series while preserving any marginal distribution, correlation structure, and intermittency. Intermittency can be defined by the probability of zero values, and is essential in the simulation of precipitation at fine scales (e.g., daily, hourly) but also for any other intermittent process, such as flows of ephemeral streams, wind speed, etc.

Accurate generation of time series requires knowledge of the statistical properties of the process under study. Processes such as precipitation, streamflow, temperature, wind, or evaporation may have very different properties; that is, they can be described by different probability distributions, correlation structures, and may or may not be intermittent.

There are four main steps required to generate a time series: 1. Choosing a probability distribution 2. Choosing a correlation structure, 3. Adding intermittency, and 4. Validating the results.

The four steps are explained below.

1.1 Choosing a Probability Distribution

The choice of a probability distribution to describe the variable you wish to simulate is crucial. The probability distribution expresses how frequently specific magnitudes of the variable occur in the data set. Air temperature is typically described by bell-shaped distributions like the Normal. Relative humidity needs distributions defined in the interval (0,1) such as the Beta or the Kumaraswamy. Streamflow typically has heavy tails and power-type distributions might be needed to reproduce the behavior of the extremes. Wind speed typically can be modelled by positively skewed distributions such as the Weibull. Precipitation at fine scales has to be modelled by mixed-type (or else zero inflated) distributions, that is, having a probability value to describe the occurrence of zeros, and a continuous distribution, such the Generalized Gamma or the Burr type XII, to describe the frequency of nonzero values.

CoSMoS supports all the standard distribution functions of R and from other packages (tens of distributions) but also comes with some built-in distributions such as Generalized Gamma, the Pareto type II, and the Burr type III and XII distributions. More details on the CoSMoS distributions and their parameters can be found in Section 4.

Distributions may have differing numbers and types of parameters (e.g. location, scale, and shape parameters). For example, the Generalized Gamma distribution has one scale and two shape parameters.

CoSMoS has built-in functions to fit distributions to data and assess their goodness-of-fit based on graphs; but you can also use any other package for distribution fitting.

1.2 Choosing a Correlation Structure

The correlation structure expresses how much the random variables depend upon each other. Processes at fine temporal scales are typically more correlated than those at coarse scales. CoSMoS offers two options to introduce correlations: (1) by defining specific lag autocorrelations as a list starting from lag 0 up to a desired limit, or (2) by using a pre-defined parametric autocorrelation structure.

For example, you can generate a time series with lag-1 autocorrelation \(\rho_1 = 0.8\) and Generalized Gamma marginal distribution by:

## (i) specifying the sample size
no <- 1000
## (ii) defining the type of marginal distribution and its parameters
marginaldist <- "ggamma"
param <- list(scale = 1, shape1 = .8, shape2 = .8)
## (iii) defining the desired autocorrelation
acf.my <- c(1, 0.8)
## (iv) simulating
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf.my)
## and (v) visually checking the generated time series
quickTSPlot(ggamma_sim[[1]]) 

You can specify also as many autocorrelation values as you wish. In this example, autocorrelation values are specified up to lag 4:

acf <- c(1, 0.6, 0.5, 0.4, 0.3) #up to lag-4
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf)
quickTSPlot(ggamma_sim[[1]])

A better approach is to use an autocorrelation structure expressed by a parametric function. CoSMoS has built in autocorrelation structures than can be used by the acs() function. You can use the following ACS structures:

For most practical applications a two-parameter ACS is sufficient; we suggest either the Weibull or the Pareto II.

The acs() function produces autocorrelation values based on any of the four structures and any desired lag. Thus, instead of setting each autocorrelation coefficient explicitly (which might be problematic from a technical viewpoint), we can generate series preserving any one of the four ACSs. For example, you can easily define an ACS and visualize its values up to any lag. Here, all four ACSs are defined and visualized up to lag 10. You can see how changing the ACS function changes how the correlation coefficients decay to zero.

## specify lag
lags <- 0:10

## get the ACS
f <- acs(id = "fgn", t = lags, H = .75)
b <- acs(id = "burrXII", t = lags, scale = 1, shape1 = .6, shape2 = .4)
w <- acs(id = "weibull", t = lags, scale = 2, shape = 0.8)
p <- acs(id = "paretoII", t = lags, scale = 3, shape = 0.3)

## visualize the ACS
dta <- data.table(lags, f, b, w, p)
m.dta <- melt(data = dta, id.vars = "lags")

ggplot(data = m.dta, mapping = aes(x = lags, y = value, group = variable, colour = variable)) + 
  geom_point(size = 2.5) + geom_line(lwd = 1) +
  scale_color_manual(values = c("steelblue4", "red4", "green4", "darkorange"), 
  labels = c("FGN", "Burr XII", "Weibull", "Pareto II"), name = "") +
  labs(x = bquote(lag ~ tau), y = "ACS") + scale_x_continuous(breaks = lags) + theme_light()

We can re-generate the previously generated series which used explicity defined correlations up to lag 4, now with a two parameter Pareto II ACF up to lag 30, which improves the modelling parsimony. More details about the parametric autocorrelation structures can be found in section 3.2 in Papalexiou (2018).

acf <- acs(id = "paretoII", t = 0:30, scale = 1, shape = .75)
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf)
dta <- data.frame(time = 1:no, value = ggamma_sim[[1]])

quickTSPlot(dta$value)

Apart from the four autocorrelation functions provided by acs(), we can also create our own. Note that it is important to ensure that the function is positive definite, otherwise the autoregressive model cannot be fitted. This example shows the generation of a time series with the same Generalized Gamma marginal distribution as above, but with a user-defined exponential ACS function up to lag 30. Note that although the time series density is very similar to that in the previous plot, the texture of the time series is very different, due to the changed ACS function.

my_acf <- exp(seq(0, -2, -0.1))
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = my_acf)
quickTSPlot(ggamma_sim[[1]])

1.3 Adding Intermittency (if required)

CoSMoS easily simulates intermittent processes such as precipitation. The only extra step needed is to define the probability of zero events. For example, say you wish to generate 5 mutually independent timeseries having the previously defined Generalized Gamma distribution with the Pareto II ACS and probability zero p0 = 0.9. The generated data thus will have 90% of zero values (i.e. dry days).

prob_zero <- .9
## the argument `TSn = 5` enables the simulation of 5 timeseries.
ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf, 
                         p0 = prob_zero, TSn = 5)
plot(x = ggamma_sim, main = "") + theme_light()

1.4 Validating the Results

You can readily check some basic statistics of the generated time series using the checkTS() function, which return the first three moments, probability zero and the first three autocorrelation coefficients.

checkTS(ggamma_sim)
##             mean   sd skew   p0 acf_t1 acf_t2 acf_t3
## expected    0.11 0.57 8.57 0.90   0.47   0.29   0.21
## simulation1 0.08 0.40 7.32 0.90   0.42   0.23   0.26
## simulation2 0.07 0.45 8.44 0.93   0.51   0.45   0.14
## simulation3 0.10 0.40 6.32 0.88   0.34   0.14   0.12
## simulation4 0.15 0.69 7.68 0.88   0.50   0.34   0.26
## simulation5 0.10 0.45 7.45 0.89   0.23   0.04   0.05
## attr(,"class")
## [1] "checkTS" "matrix" 
## attr(,"margdist")
## [1] "ggamma"
## attr(,"margarg")
## attr(,"margarg")$scale
## [1] 1
## 
## attr(,"margarg")$shape1
## [1] 0.8
## 
## attr(,"margarg")$shape2
## [1] 0.8
## 
## attr(,"p0")
## [1] 0.9

2 Multivariate and Random Fields Simulation

The above methods can readily be extended to multidimensional cases, thus enabling the simulation of spatiotemporally correlated random vectors (i.e. correlated timeseries at multiple locations) and random fields (i.e. gridded data), as discussed in detail by Papalexiou and Serinaldi (2020). This requires the definition of suitable spatiotemporal correlation structures (see Porcu et al. (2020) for a thorough review of this topic).

2.1 Spatiotemporal Correlation Structures

CoSMoS allows the definition of spatiotemporal correlation functions using the function stcs(), which the spatiotemporal counterpart of the purely temporal acs(). Two classes of spatiotemporal correlation functions are provided:

The stcs() function can produce values of the linear spatiotemporal correlation for any desired time lag and spatial distance using these two correlation function classes, which comprise a variety of structures covering most cases of practical interest. This example shows the Clayton-Weibull spatiotemporal correlation structure:

## specify grid of spatial and temporal lags
d <- 51
st <- expand.grid(0:(d - 1), 0:(d - 1))

## get the STCS
wc <- stcfclayton(t = st[, 1], s = st[, 2], scfid = "weibull", tcfid = "weibull", copulaarg = 2,
                  scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 5.1,shape = 0.8))

## visualize the STCS
wc.m <- matrix(data = wc, nrow = d)
j <- tail(which(wc.m[1, ] > 0.15), 1)
i <- tail(which(wc.m[, 1] > 0.15), 1)
wc.m <- wc.m[1:i, 1:j]

persp3D(z = wc.m, x = 1: nrow(wc.m), y = 1:ncol(wc.m),
expand = 1, main = "", scale = TRUE, facets = TRUE,
xlab="Time lag", ylab = "Distance", zlab = "STCF", colkey = list(side = 4, length = 0.5),
phi = 20, theta = 120, resfac = 5,  col= gg2.col(100))

2.2 Multivariate Time Series Simulation

Similar to the one-dimensional case (i.e. using the function generateTS()), CoSMoS provides the functions generateMTS() and generateMTSFast() to simulate multiple timeseries with specified (identical) marginals and spatiotemporal correlation structures (STCs). Gaussian Vector AutoRegressive (VAR) models are used to reproduce the parent-Gaussian STFC (see Papalexiou (2018) and Papalexiou and Serinaldi (2020)). The VAR parameters see e.g. (Biller and Nelson 2003) corresponding to the desired spatiotemporal correlation function of the target process, are produced by the function fitVAR().

In this example, a set of five random locations is defined that could represent precipitation in five different places. A VAR is then fitted using

From the five locations, and the VAR, a set of five timeseries is generated. When the series are plotted, the spatio-temporal correlations among the series can be seen.

## set a sequence of hypothetical coordinates 
d <- 5
coord <- cbind(runif(d)*30, runif(d)*30)

## compute VAR model parameters  
fit <- fitVAR(spacepoints = coord, 
              p = 4, 
              margdist = "burrXII",
              margarg = list(scale = 3, shape1 = .9, shape2 = .2), 
              p0 = 0.8, 
              stcsid = "clayton",
              stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2,
                   scfarg = list(scale = 25, shape = 0.7), 
                   tcfarg = list(scale = 3.1, shape = 0.8) ) )

## generate correlated timeseries  
sim <- generateMTS(n = 500, STmodel = fit)

## visualize simulated timeseries
dta <- melt(data = data.table(time = 1:nrow(sim), sim[,1:d]), id.vars = "time")

ggplot(data = dta, mapping = aes(x = time, y = value)) + geom_line() +
       facet_grid(facets = variable ~ ., scales = "free_y") + theme_light()

The function generateMTSFast() generates multiple time series with approximately separable STCS (Serinaldi and Kilsby 2018), using an algorithm which is computationally less expensive than that in generateMTS(). This allows the simulation of a larger number of cross-correlated and serially dependent timeseries, at the cost of using separable STCSs and accepting some lack of accuracy in the exact reproduction of some terms of the required STCS (Serinaldi and Kilsby 2018).

This example uses generateMTSFast() with similar parameters to the previous example using generateMTS().

## set a sequence of hypothetical coordinates
d <- 5
coord <- cbind(runif(d)*30, runif(d)*30)

## fit and generate correlated timeseries
sim <- generateMTSFast(n = 500, 
                       spacepoints = coord, 
                       p0 = 0.7, 
                       margdist ="burrXII",
                       margarg = list(scale = 3, shape1 = .9, shape2 = .2),  
                       stcsarg = list(scfid = "weibull", tcfid = "weibull",
                       scfarg = list(scale = 25, shape = 0.7),
                       tcfarg = list(scale = 3.1, shape = 0.8)) )

## visualize simulated timeseries
dta <- melt(data = data.table(time = 1:nrow(sim), sim[,1:d]), id.vars = "time")

ggplot(data = dta, mapping = aes(x = time, y = value)) + geom_line() +
       facet_grid(facets = variable ~ ., scales = "free_y") + theme_light()

2.3 Random Fields Simulation

2.3.1 Isotropic Random Fields

CoSMoS simulates spatially and temporally correlated isotropic random fields with the functions generateRF() and generateRFFast(), which are analogs of generateMTS() and generateMTSFast(), with the same syntax. The only difference is the specification of the spatial points, which is an integer denoting the side length of a square grid . As was mentioned in the previous section, the algorithm in generateRFFast() is computationally less expensive than that of generateRF(), enabling the simulation of random fields over a greater number of grid points (see Papalexiou (2018), Papalexiou and Serinaldi (2020), and Serinaldi and Kilsby (2018) for more details).

The example below shows the use of both generateRF and generateRFFast. The generation of random fields using generateRF took approximately 16 times as long as using generateRFFast.

## compute VAR model parameters
## CPU time: ~15s 
fit <- fitVAR(spacepoints = 20, p = 3, margdist ="burrXII",
              margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton",
              stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2,
              scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 1.1, shape = 0.8) ) )

## generate isotropic random fields with nonseparable correlation structure 
sim1 <- generateRF(n = 1000, STmodel = fit)

## fast simulation of isotropic random fields with separable correlation structure
sim2 <- generateRFFast(n = 1000, spacepoints = 20, p0 = 0.7, margdist ="paretoII",
                       margarg = list(scale = 1, shape = .3),
                       stcsarg = list(scfid = "weibull", tcfid = "weibull",
                       scfarg = list(scale = 20, shape = 0.7), 
                       tcfarg = list(scale = 1.1, shape = 0.8)))

2.3.2 Validating the Simulated Random Fields

The properties of the generated random fields can be checked with checkRF(). If the parameter method = "stat" (the default), the function returns the first three moments, and the probability zero at 20 grid points. If the parameter method = "statplot", checkRF() returns a multi-panel plot comparing the theoretical and empirical spatial correlation functions at 0, 1, and 2 time lags, the contour plot of the target STCS, and the empirical and target CDFs. If the parameter method = "fields", the function returns level plots of the selected number of simulated random fields (nfields) to visualize the temporal persistence of spatial patterns. If the parameter method = "movie", the fields are written to an animated GIF called “movieRF.gif” in the current working directory.

This example returns the stats, stat plots and fields for the first simulation, which used generateRF.

## check random fields
## CPU time: ~20s 
checkRF(RF = sim1, nfields = 9*9, method = "stat")
##                    mean   sd  skew   p0
## expected           0.71 2.69  9.89 0.80
## sample location 1  0.81 2.77  6.50 0.80
## sample location 2  0.88 2.99  6.05 0.79
## sample location 3  0.90 3.64  9.47 0.79
## sample location 4  0.88 3.53 10.27 0.79
## sample location 5  0.82 3.23 10.08 0.80
## sample location 6  0.80 3.02  8.37 0.79
## sample location 7  0.85 3.26  8.08 0.79
## sample location 8  0.90 4.40 16.86 0.80
## sample location 9  0.85 3.34  9.99 0.79
## sample location 10 0.85 3.43  9.70 0.79
## sample location 11 0.82 3.11  7.22 0.80
## sample location 12 0.84 3.32  7.31 0.78
## sample location 13 0.83 3.33  7.98 0.79
## sample location 14 0.86 3.43  7.75 0.80
## sample location 15 0.89 3.72  8.61 0.80
## sample location 16 0.85 3.64  9.52 0.80
## sample location 17 0.83 3.38  8.48 0.79
## sample location 18 0.86 3.71 10.57 0.80
## sample location 19 0.79 3.68 11.20 0.80
## sample location 20 0.84 3.95 11.40 0.80
checkRF(RF = sim1, nfields = 9*9, method = "statplot")

checkRF(RF = sim1, nfields = 9*9, method = "field")

2.3.3 Anisotropic Random Fields

The functions generateRF() and generateRFFast() allows for the simulation of spatially and temporally correlated anisotropic random fields by specifying the arguments anisotropyid and anisotropyarg when calculating the model parameters via fitVAR(). Three types of anisotropic effects are supported, namely: affine (rotation and stretching), and swirl-like, and “wavy” deformation (see the documentation of the function anisotropyT, Papalexiou, Serinaldi and Porcu (2021), and references therein for more details).

The example below reports the simulation of swirl-like random fields mimicking for instance the shape of atmospheric cyclones. Firstly, one can visualize the the transformed of the coordinate system

## specify a grid of coordinates
m = 30
aux <- seq(0, m - 1, length = m)
coord <- expand.grid(aux, aux)

## transform the coordinate system
at <- anisotropyTswirl(spacepoints = coord, x0 = floor(m / 2), y0 = floor(m / 2), 
                       b = 10, alpha = 1.8 * pi)

## visualize transformed coordinate system
  aux = data.frame(lon = at[ ,1], lat = at[ ,2], id1 = rep(1:m, each = m), id2 = rep(1:m, m))
  ggplot(aux, aes(x = lon, y = lat)) + 
    geom_path(aes(group = id1)) + geom_path(aes(group = id2)) + geom_point(col = 2) + theme_light()

Then, one can generate a set of swirl-like random fields with the desired spatiotemporal correlation structure and visualize them

## compute VAR model parameters
fit <- fitVAR(spacepoints = m, p = 1, margdist = 'burrXII', 
              margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.1, stcsid = "clayton", 
              stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2,
                        scfarg = list(scale = 2, shape = 0.7), tcfarg = list(scale = 20.1, shape = 0.8)),
              anisotropyid = "swirl", 
              anisotropyarg = list(x0 = floor(m / 2), y0 = floor(m / 2), b = 10, alpha = 1.8 * pi) )

## generate
set.seed(9)
sim3 <- generateRF(n=25, STmodel = fit)

## check
checkRF(RF = sim3, nfields = 5*5, method = "field")

2.3.4 Lagrangian Random Fields

CoSMoS implements advances in space-time simulation presented in Papalexiou, Serinaldi and Porcu (2021) that enable the simulation of Lagrangian random fields, i.e. random fields characterized by mass transport (advection). The motion is described by velocity vectors with specified orthogonal components u and v at each grid point. Motion vectors can be specified by setting the arguments advectionid and advectionarg of fitVAR function. Several types of advection fields are supported, but users can create their own following the structure of the existing functions (see the documentation of the function advectionF for more details).

The example below refers to the generation of random fields with affine anisotropy and uniform advection, mimicking rainfall storms moving north-easterly.

## compute VAR model parameters
fit <- fitVAR(spacepoints = 30, p = 1, margdist = 'burrXII', 
              margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton", 
              stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2, 
                      scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 30.1, shape = 0.8)),
              anisotropyid = "affine", 
              anisotropyarg = list(phi1 = 2., phi2 = 0.5, phi12 = 0, theta = -pi/3),
              advectionid = "uniform", advectionarg = list(u = 2.5, v = 1.5) )

## generate
set.seed(10) # 1 # 5
sim3 <- generateRF(n=81, STmodel = fit)

## check
checkRF(RF = sim3, nfields = 9*9, method = "field")


3 Stochastic Simulation of Real-World Processes

In practice, it is often desired to simulate a natural process based on the properties of an observed time series. This section shows how this can easily be done using CoSMoS functions.

3.1 Hourly Precipitation

Precipitation can be easily simulated in CoSMoS from observed data. This is an example of a time series of observed hourly precipitation (units are hundredths of an inch).

data("precip")
quickTSPlot(precip$value)

To generate a synthetic time series with similar characteristics to the observed precipitation, you have to determine (1) the marginal distribution, (2) the autocorrelation structure, and (3) the probability zero for each individual month. This can be done by trying to fit various distributions and autocorrelation structures with analyzeTS() and then checking the goodness-of-fit with reportTS(), which can return the plots of the distribution (method = "dist"), ACS (method = "acs") or basic statistics (method = "stat").

## CPU time: ~75s 
precip_ggamma <- analyzeTS(TS = precip, season = "month", dist = "ggamma",
                           acsID = "weibull", lag.max = 12)

reportTS(aTS = precip_ggamma, method = "dist") + theme_light()

reportTS(aTS = precip_ggamma, method = "acs") + theme_light()

reportTS(aTS = precip_ggamma, method = "stat")
##            dist  scale shape1 shape2  error   acsID scale shape  mean    sd min
## month_1  ggamma  0.673  1.099  0.513 0.0112 weibull 4.562 0.736  4.82  6.63   1
## month_2  ggamma  0.895  1.048  0.578 0.0012 weibull 2.101 0.742  3.80  4.18   1
## month_3  ggamma  1.418  1.189  0.621 0.0035 weibull 3.116 0.692  5.56  5.87   1
## month_4  ggamma  0.369  1.131  0.443 0.0025 weibull 3.083 0.809  5.20  6.77   1
## month_5  ggamma  0.006  1.666  0.286 0.0070 weibull 1.733 0.775  6.16 10.10   1
## month_6  ggamma 15.737  0.484  1.033 0.0026 weibull 1.472 0.617  7.74  9.73   1
## month_7  ggamma  0.470  0.966  0.369 0.0138 weibull 0.939 0.469 13.28 23.47   1
## month_8  ggamma  0.103  1.274  0.336 0.0042 weibull 1.287 1.018 10.60 16.78   1
## month_9  ggamma  5.365  0.731  0.670 0.0067 weibull 0.581 0.505  8.42 11.40   1
## month_10 ggamma  2.119  0.817  0.565 0.0052 weibull 1.622 0.639  6.28  8.56   1
## month_11 ggamma  0.000  3.123  0.194 0.0030 weibull 2.347 0.950  5.32  7.02   1
## month_12 ggamma  2.973  0.932  0.750 0.0041 weibull 2.046 0.835  5.13  5.72   1
##          q.5% q.25% q.50% q.75% q.95% max skew.cs l.var l.skew l.kurt   p0
## month_1     1     1     3   6.0 16.00 100    6.44  1.88   0.50   0.26 0.91
## month_2     1     1     2   5.0 12.00  30    2.79  2.05   0.48   0.24 0.92
## month_3     1     2     4   8.0 16.00  53    2.96  2.05   0.39   0.19 0.90
## month_4     1     1     3   6.0 19.00  52    3.13  1.82   0.52   0.29 0.92
## month_5     1     1     3   7.0 20.35 141    6.51  1.70   0.55   0.33 0.92
## month_6     1     1     4  10.0 29.60  68    2.34  1.72   0.47   0.21 0.96
## month_7     1     2     5  14.0 48.40 150    3.52  1.49   0.60   0.37 0.95
## month_8     1     2     4  11.0 41.50 150    4.00  1.60   0.55   0.32 0.95
## month_9     1     2     4  11.0 28.35  98    3.50  1.72   0.48   0.24 0.94
## month_10    1     1     3   7.0 22.40  54    3.08  1.74   0.52   0.29 0.94
## month_11    1     2     3   6.0 18.00  67    3.79  1.85   0.51   0.30 0.93
## month_12    1     1     3   6.5 16.00  56    3.38  2.00   0.43   0.21 0.91
##          acs.l.2 acs.l.3 acs.l.4 acs.l.5
## month_1     0.73    0.57    0.49    0.41
## month_2     0.55    0.37    0.30    0.21
## month_3     0.64    0.48    0.36    0.29
## month_4     0.68    0.49    0.38    0.29
## month_5     0.53    0.32    0.21    0.13
## month_6     0.46    0.29    0.19    0.16
## month_7     0.36    0.27    0.14    0.12
## month_8     0.46    0.21    0.10    0.04
## month_9     0.28    0.14    0.09    0.06
## month_10    0.48    0.30    0.27    0.15
## month_11    0.64    0.44    0.26    0.18
## month_12    0.58    0.36    0.24    0.20

In this example, the Generalized Gamma distribution and the Weibull autocorrelation structure describe well the observed time series. Other combinations might not give such good results. For example, the Pareto II distribution and the Fractional Gaussian Noise ACS do not give good monthly fits:

precip_pareto <- analyzeTS(TS = precip, season = "month", dist = "paretoII", acsID = "fgn", lag.max = 12)

reportTS(aTS = precip_pareto, method = "dist")+ theme_light()

reportTS(aTS = precip_pareto, method = "acs") + theme_light()

Once you choose the marginal distribution and the autocorrelation structure you can produce as many and as long synthetic series you wish. The generated series will reproduce the distribution, correlation structure and probability zero (intermittency), using the simulateTS() function. This example uses the Generalized Gamma distribution and the Weibull autocorrelation structure, which were fitted above.

sim_precip <- simulateTS(aTS = precip_ggamma, from = as.POSIXct(x = "1978-12-01 00:00:00"),
                         to = as.POSIXct(x = "2008-12-01 00:00:00"))
dta <- precip
dta[, id := "observed"]
sim_precip[, id := "simulated"]
dta <- rbind(dta, sim_precip)

ggplot(data = dta) + geom_line(mapping = aes(x = date, y = value)) + 
  facet_wrap(facets = ~id, ncol = 1) + theme_light()

3.2 Daily streamflow

We can also apply the same methods to daily streamflows. For this example we have used the streamflow records of the Nassawango Creek near Snow Hill (Maryland, US; USGS Id 01485500) from the US Geological Survey (USGS) repository (https://waterdata.usgs.gov/nwis). In this case, the fitted distribution was log-normal, and the ACS was Pareto II.

## CPU time: ~240s
data("disch")

str <- analyzeTS(TS = disch, dist = "lnorm", norm = "N2", acsID = "paretoII", 
                 lag.max = 20, constrain = TRUE, season = "month")

reportTS(aTS = str) + theme_light()

reportTS(aTS = str, method = "stat")
##           dist meanlog sdlog  error    acsID scale shape mean   sd  min q.5%
## month_1  lnorm  -0.891 1.252 0.0617 paretoII 2.924 1.133 1.01 1.23 0.06 0.15
## month_2  lnorm  -0.676 1.074 0.0280 paretoII 3.060 2.009 0.99 1.15 0.07 0.19
## month_3  lnorm  -0.392 0.902 0.0054 paretoII 3.041 0.940 1.01 1.15 0.07 0.20
## month_4  lnorm  -0.289 0.808 0.0130 paretoII 3.343 0.963 1.01 1.08 0.09 0.20
## month_5  lnorm  -0.676 1.131 0.0451 paretoII 1.623 1.415 0.99 1.60 0.08 0.16
## month_6  lnorm  -1.610 1.691 0.1128 paretoII 2.915 0.806 1.03 2.38 0.03 0.10
## month_7  lnorm  -0.970 1.455 0.1397 paretoII 3.507 0.373 0.94 2.45 0.03 0.07
## month_8  lnorm  -0.886 1.466 0.1195 paretoII 5.202 0.257 1.02 2.75 0.02 0.04
## month_9  lnorm  -0.711 1.303 0.2421 paretoII 2.186 0.866 1.00 2.36 0.03 0.06
## month_10 lnorm  -0.734 1.321 0.0691 paretoII 2.024 0.824 1.02 2.23 0.03 0.08
## month_11 lnorm  -0.749 1.232 0.0059 paretoII 2.890 0.641 0.99 1.78 0.02 0.08
## month_12 lnorm  -0.483 0.997 0.0098 paretoII 2.108 1.662 1.00 1.25 0.04 0.10
##          q.25% q.50% q.75% q.95%   max skew.cs l.var l.skew l.kurt p0 acs.l.2
## month_1   0.35  0.69  1.19  3.16 25.30    6.59  2.07   0.43   0.29  0    0.86
## month_2   0.40  0.65  1.16  2.84 17.04    5.44  2.18   0.45   0.29  0    0.87
## month_3   0.39  0.63  1.15  3.02 13.08    3.80  2.10   0.47   0.30  0    0.83
## month_4   0.42  0.65  1.17  3.01 10.47    3.51  2.19   0.45   0.29  0    0.87
## month_5   0.34  0.54  0.99  3.02 20.87    5.94  1.86   0.57   0.42  0    0.80
## month_6   0.22  0.48  1.02  3.39 51.28   11.15  1.61   0.60   0.43  0    0.86
## month_7   0.12  0.24  0.71  3.70 45.19    7.46  1.37   0.71   0.52  0    0.87
## month_8   0.08  0.20  0.70  4.90 51.05    7.59  1.30   0.72   0.52  0    0.84
## month_9   0.13  0.30  0.85  3.93 34.93    6.35  1.41   0.67   0.48  0    0.82
## month_10  0.15  0.33  0.86  4.45 36.61    6.42  1.44   0.66   0.46  0    0.78
## month_11  0.20  0.44  1.05  3.48 26.73    6.13  1.60   0.57   0.37  0    0.87
## month_12  0.31  0.63  1.23  3.11 16.34    4.29  1.94   0.45   0.29  0    0.85
##          acs.l.3 acs.l.4 acs.l.5
## month_1     0.63    0.48    0.39
## month_2     0.67    0.55    0.50
## month_3     0.58    0.46    0.44
## month_4     0.64    0.47    0.35
## month_5     0.49    0.31    0.23
## month_6     0.61    0.41    0.32
## month_7     0.64    0.47    0.36
## month_8     0.59    0.44    0.42
## month_9     0.54    0.34    0.23
## month_10    0.50    0.32    0.25
## month_11    0.63    0.44    0.32
## month_12    0.61    0.44    0.34

Given the fitted values, streamflows can be simulated as

sim_str <- simulateTS(aTS = str)

dta <- disch
dta[, id := "observed"]
sim_str[, id := "simulated"]
dta <- rbind(dta, sim_str)

ggplot(data = dta) + geom_line(mapping = aes(x = date, y = value)) + 
  facet_wrap(facets = ~id, ncol = 1) + theme_light()

4 Recommended and Supported Distributions in CoSMoS

4.2 Supported Distribution in CoSMoS

CoSMoS supports most distributions available in R with defined quantile, distribution, and probability density functions. Additionally, we provide four flexible distributions that we highly recommend.

Burr type III

distribution argument/name - burrIII
parameters - list(scale, shape1, shape2)

\[f_{\text{BrIII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}-1} \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma_1 \gamma_2-1}}{\beta } \\ F_{\text{BrIII}}(x)= \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma _1 \gamma _2} \\ Q_{\text{BrIII}}(u)=\beta \left(\gamma _1 \left(u^{-\frac{1}{\gamma _1 \gamma _2}}-1\right)\right)^{-\gamma _2} \\ m_{\text{BrIII}}(q)=\frac{\beta ^q \gamma _1^{\gamma _2 (-q)} \Gamma \left(\left(q+\gamma _1\right) \gamma _2\right) \Gamma \left(1-q \gamma _2\right)}{\Gamma \left(\gamma _1 \gamma _2\right)}\]

Burr type XII

distribution argument/name - burrXII
parameters - list(scale, shape1, shape2)

\[f_{\text{BrXII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{\gamma _1-1} \left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}-1}}{\beta } \\ F_{\text{BrXII}}(x)=1-\left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}} \\ Q_{\text{BrXII}}(u)=\beta \left(-\frac{1-(1-u)^{-\gamma _1 \gamma _2}}{\gamma _2}\right)^{\frac{1}{\gamma _1}} \\ m_{\text{BrXII}}(q)=\frac{\beta ^q \gamma _2^{-\frac{q}{\gamma _1}-1} B\left(\frac{q+\gamma _1}{\gamma _1},\frac{1-q \gamma _2}{\gamma _1 \gamma _2}\right)}{\gamma _1}\]

Generalized Gamma

distribution argument/name - ggamma
parameters - list(scale, shape1, shape2)

\[f_{\mathcal{G}\mathcal{G}}(x)=\frac{\gamma _2 e^{-\left(\frac{x}{\beta }\right)^{\gamma _2}} \left(\frac{x}{\beta }\right)^{\gamma _1-1}}{\beta \Gamma \left(\frac{\gamma _1}{\gamma _2}\right)} \\ F_{\mathcal{G}\mathcal{G}}(x)=Q\left(\frac{\gamma _1}{\gamma _2},0,\left(\frac{x}{\beta }\right)^{\gamma _2}\right) \\ Q_{\mathcal{G}\mathcal{G}}(u)=\beta Q^{-1}\left(\frac{\gamma _1}{\gamma _2},0,u\right)^{\frac{1}{\gamma _2}} \\ m_{\mathcal{G}\mathcal{G}}(q)=\frac{\beta ^q \Gamma \left(\frac{q}{\gamma _2}+\frac{\gamma _1}{\gamma _2}\right)}{\Gamma \left(\frac{\gamma _1}{\gamma _2}\right)}\]

Pareto II (also known as Lomax)

distribution argument/name - paretoII
parameters - list(scale, shape)

\[f_{\text{PII}}(x)=\frac{\left(\frac{\gamma x}{\beta }+1\right)^{-\frac{1}{\gamma }-1}}{\beta } \\ F_{\text{PII}}(x)=1-\left(\frac{\gamma x}{\beta }+1\right)^{-1/\gamma } \\ Q_{\text{PII}}(u)=\frac{\beta \left((1-u)^{-\gamma }-1\right)}{\gamma } \\ m_{\text{PII}}(q)=\frac{\Gamma (q+1) \left(\frac{\beta }{\gamma }\right)^q \Gamma \left(\frac{1}{\gamma }-q\right)}{\Gamma \left(\frac{1}{\gamma }\right)}\]

5 Historical Note

The genesis of CoSMoS dates from 2009 when Simon Michael Papalexiou developed the core of its methods while at National Technical University of Athens (NTUA), Greece, and applied them initially for multivariate rainfall modeling for the needs of an internal project at NTUA. Simon soon invited a few “colleagues” to build a start-up company, named Oktana, to develop stochastic simulation software. The details were described in a legal document (Papalexiou, 2010) as a first step to create proprietary software, although Oktana was never created.

In 2015, Simon with Yannis Markonis reconsidered the idea of a start-up company participating in the Climate-KIC competition. In 2017 Simon moved to California and Yannis to Prague, and they started discussing the release of CoSMoS as open-source software. The methods were published in arXiv on July 21, 2017 (Papalexiou, 2017) and few months later in Advances in Water Resources (Papalexiou, 2018), followed by a publication on disaggregation (DiPMaC) in WRR (Papalexiou et al., 2018). In the meanwhile, Filip Strnad joined the team in 2018 and coded the first version of CoSMoS in R, as the original code was developed by Simon in Mathematica.

Francesco Serinaldi joined the team in 2019 initially to co-author a paper on random fields with Simon which was published in WRR in January 2020 (Papalexiou and Serinaldi, 2020). Francesco developed the R code of CoSMoS version 2, collaborating with Filip and Kevin, and extending its functionalities to include the methods of the previously mentioned paper, that is, enabling the simulation of random fields and multivariate time series. The last addition to our team is Kevin Shook who has become the package maintainer and offers advice on R.

The latest version of CoSMoS implements major advances in space time modeling introduced in Papalexiou, Serinaldi and Porcu (2021) that allow realistic simulations of hydro-environmental fluxes, including rainfall storms, fields resembling weather cyclones, fields converging to (or diverging from) a point, colliding air masses, and more.

We are serene to see this ten-year Odyssey finally back on track to Ithaca. Leaving the burdens of the past behind, empowered by a great team and standing for dignity and academic integrity, we will keep striving to offer free, innovative, and reliable software for stochastic modelling. As a closing remark, we thank our imitators, after all, as Oscar Wilde said: “Imitation is the sincerest form of flattery that mediocrity can pay to greatness”.

6 References

  1. Papalexiou, S.M. (2018). Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency. Advances in Water Resources, 115, 234-252, link
  2. Papalexiou, S.M., Markonis, Y., Lombardo, F., AghaKouchak, A., Foufoula-Georgiou, E. (2018). Precise Temporal Disaggregation Preserving Marginals and Correlations (DiPMaC) for Stationary and Nonstationary Processes. Water Resources Research, 54(10), 7435-7458, link
  3. Papalexiou, S.M., Serinaldi, F. (2020). Random Fields Simplified: Preserving Marginal Distributions, Correlations, and Intermittency, With Applications From Rainfall to Humidity. Water Resources Research, 56(2), e2019WR026331, link
  4. Papalexiou, S.M., Serinaldi, F., Porcu, E. (2021). Advancing Space-Time Simulation of Random Fields: From Storms to Cyclones and Beyond. Water Resources Research, 57, e2020WR029466, link
  5. Serinaldi, F., Kilsby, C.G. (2018). Unsurprising Surprises: The Frequency of Record-breaking and Overthreshold Hydrological Extremes Under Spatial and Temporal Dependence. Water Resources Research, 54(9), 6460-6487, link