Metabolic Flux Dynamics in Upper Glycolysis with dynafluxr

2025-06-27

library(dynafluxr)
#> Loading required package: bspline
#> Loading required package: nlsic
#> Loading required package: nnls
#> Loading required package: dotty
#pkgload::load_all("~/projs/dynafluxr/dev/dynafluxr")

Introduction

This vignette shows how dynafluxr R package can be used to unravel metabolic flux dynamics from metabolite concentration time courses and a stoichiometric model. No regulation model (like Michaelis-Menten) is required for this task. Our method is heavily relying on B-splines as implemented in bspline package. We show how to prepare measurement data, edit stoichiometric reactions, use various package options and interpret/exploit the results. Even if graphical user interface (GUI) cans be called and used with function gui(), this vignette is focused on command line interface (CLI) which is more suitable for reproducible research and automation. The CLI can be used from R or from a shell environment.

The data used for this vignette were obtained by NMR measurements on MetaToul-FluxoMet platform, Toulouse Biotechnology Institute (TBI), Toulouse, France. Credits for making these data available go to Svetlana Dubiley, Pauline Rouane, Cyril Charlier, Guy Lippens and Pierre Millard. Pierre Millard has also shared stoichiometric model and supervised the whole project. All cited persons are (or were at the moment of the project) from TBI. The case presented in this vignette is also presented in an article entitled “Estimating flux dynamics from metabolite concentration time courses with dynafluxr” by Serguei Sokol, Svetlana Dubiley, Pauline Rouane, Cyril Charlier, Guy Lippens, and Pierre Millard.

Data preparations

Data must be stored in a .tsv (Tabulation Separated Values) file (.tsv extension is not mandatory, it can be .txt, .csv, whatever, but the field separator must be the tabulation character), one column per chemical specie. The first row describes column names. Time points, at which measurements are done, must be stored in a column whose name starts with ‘Time’. No specie name can start with ‘Time’. Decimal separator must be a point . character. A head of data file used as an example is looking like

fmeas=system.file("dataglyco/data.tsv", package="dynafluxr")
cat(head(readLines(fmeas)), sep="\n")
#> Time	GLC	G6P	F6P	FBP
#> 10	5.539973094	0.511379021	0.019053081	0.227663976
#> 10.66666667	5.463888781	0.552186749	0.015203465	0.256477013
#> 11.33333333	5.45243488	0.620074144	0.020073931	0.221628305
#> 12	5.386348531	0.665020538	0.026775166	0.261749199
#> 12.66666667	5.409419851	0.711166169	0.020894807	0.202719737

The raw data may be difficult to read because of tab misalignment and row wrapping. Here are parsed data:

mf=read.delim(fmeas, comment.char="#")
print(head(mf))
#>       Time      GLC       G6P        F6P       FBP
#> 1 10.00000 5.539973 0.5113790 0.01905308 0.2276640
#> 2 10.66667 5.463889 0.5521867 0.01520346 0.2564770
#> 3 11.33333 5.452435 0.6200741 0.02007393 0.2216283
#> 4 12.00000 5.386349 0.6650205 0.02677517 0.2617492
#> 5 12.66667 5.409420 0.7111662 0.02089481 0.2027197
#> 6 13.33333 5.393365 0.7319101 0.02205172 0.1638058

Stoichiometric model preparation

The reactions are written as in following example:

fsto=system.file("dataglyco/network.txt", package="dynafluxr")
cat(readLines(fsto), sep="\n")
#> HXK	GLC + ATP -> G6P + ADP
#> PGI	G6P -> F6P
#> PFK	F6P + ATP -> FBP + ADP

The first field separated by tabulation is a reaction name then reaction itself where species are separated by “ + ” sign and possibly preceded by a stoichiometric coefficient with “ * ” symbol, e.g. 2*NAD. The symbol “*” is manadtory as a metabolite name can start with a number, e.g. 3PG is a molecule of 3-phosphoglycerate and not 3 * PG. The reaction can also be a degradation or washing out one, e.g. A -> which means that A is degraded to nothing or transported outside of the considered system. The reaction can also be a synthesis one, e.g. -> A which means that A is synthesized from nothing or brought from external environment.

We start by gathering example files in a fresh temporary created working directory (it will no longer exist after ending the R session):

ddir=tempfile(pattern="glyco")
dir.create(ddir)
file.copy(c(fmeas, fsto), ddir)
#> [1] TRUE TRUE

Data exploring

Let fit available data with B-splines of order 4 and with 5 internal knots (default in dynafluxr), then plot them to have an idea how data looks like:

fmeas=file.path(ddir, "data.tsv")
fsto=file.path(ddir, "network.txt")
mf=read.delim(fmeas, comment.char="#")
nm=colnames(mf)[-1L]
msp=fitsmbsp(mf$Time, mf[, nm], n=4, nki=5)
matplot(mf$Time, msp(mf$Time), t="l", ylab="Concentration [mM]", xlab="Time [min]", lwd=2)
coltr=apply(col2rgb(1:6)/255, 2, function(v) do.call(rgb, c(as.list(v), alpha=0.25)))
matpoints(mf$Time, mf[, nm], pch="o", cex=0.75, col=coltr)
legend("topright", legend=nm, lty=1:5, col=1:6, cex=0.75, lwd=2)

plot of chunk unnamed-chunk-6

Data treatment can make that measured concentration are getting negative.

As concentrations cannot get negative by definition, dynafluxr constraints B-splines to non negative values. Another non desirable effect can be that consumed metabolite is sometimes increasing which has no physical meaning. In our case, Glucose is only consumed so in dynafluxr, it can be signaled in options --decresing while --increasing is reserved for accumulated products like FBP. All options can be abbreviated till unambiguous prefix, e.g. --decr for --decreasing.

We can see also in the previous plot that FBP has non zero values at the beginning of the experiment. To skip these nonphysical data, we can use an option --skip which indicates the number of first time points to skip. In our case, we will skip 24 first time points.

First estimate of fluxes with dynafluxr

We try dynafluxr::cli() (Command Line Interface) function with minimal options, just indicating measurement data, stoichiometric model and, as seen before, that GLC/FBP must be monotonously decreasing/increasing respectively and that 24 first time points must be skipped. The cli() function is the main entry point to dynafluxr package and it can be used from R or from a shell environment. It is a wrapper around dynafluxr::fdyn() function which does the real work of flux estimation.

res=cli(c("-m", fmeas, "-s", fsto, "--decr", "GLC", "--incr", "FBP", "--skip", "24"))

The same command could be run from a shell (not R) environment if variables $fmeas and $fsto are defined. In this case it would look like:

$ Rscript -e "dynafluxr::cli()" -m $fmeas -s $fsto --decr GLC --incr FBP --skip 24

The cli() function accepts a vector of command line arguments. The first argument is -m or --meas which indicates the measurement data file, then -s or --stoich for stoichiometric model file, then options for decreasing and increasing species, skipping time points and so on. The result of the function is a list with several elements. The full list of available options can be viewed with dynafluxr::cli("-h") command. The result of cli() function (in case of no error in run-time), is a series of data and pdf files written in a directory glyco/data which is the base name of measurement data file:

list.files("glyco/data/")
#> character(0)

The file Readme.md describes the content of the result directory. These files can be explored by system tools like pdf-viewers and spreadsheet software but for needs of this vignette we will reproduce some results by R means. As we won’t need result files in this demo page, we’ll cancel their writing with empty option -o "" which normally indicates result directory/archive name if it is different from the default one:

res=cli(c("-m", fmeas, "-s", fsto, "--decr", "GLC", "--incr", "FBP", "--skip", "24", "-o", ""))

Let now glance on fit quality, i.e. results of \(\chi^2\)-test:

print(res$chi2tab)
#>             rss      var_ref   df      chi2          pval
#> GLC   0.2140855 3.282197e-04  990  652.2628  1.000000e+00
#> G6P   0.3940253 2.982900e-04  990 1320.9474  7.145778e-12
#> F6P   0.1586435 3.712298e-05  990 4273.4582  0.000000e+00
#> FBP   2.7157317 2.610422e-03  990 1040.3420  1.297346e-01
#> Total 3.4824861 3.274055e-03 3960 7287.0105 9.723348e-201

We can see that \(p\)-values are almost 0 for G6P and FBP which usually means a poor fit quality. Let see what is going on by plotting integrated species with measurements:

nm=colnames(res$mf)[-1]
matplot(res$tpp, res$isp(res$tpp, nm), t="l", xlab="Time [min]", ylab="Concentration [mM]", lwd=2)
matpoints(res$tp, res$mf[,nm], pch="o", cex=0.5, col=coltr)
legend("topright", legend=nm, lty=1:5, col=1:6, cex=0.75, lwd=2)

plot of chunk unnamed-chunk-11

For a human eye is does not look bad but for a computer, it is not good enough. The problem is that the uses underestimated reference variance visible in the column RefVar of the \(\chi^2\)-test table.

Getting fit better

The reference variance is used to estimate the noise in measurements and it is used to weight the residuals in least squares. If it is underestimated, the residuals are too large and the fit is considered as poor. The reference variance can be estimated automatically like it was done till now. But It can also be set to a some more realistic value manually. In our case, we can set it to from 0.05 mM$^2$ to 0.1 mM$^2$ which are reasonable values for NMR measurements. This can be done with --sderr option: --sderr=GLC=0.05,G6P=0.05,FBP=0.1,F6P=0.05

From experiment setup, we know that FBP and F6P are not measured in absolute concentration as there is a doubt about proton attribution in the measured signal. Let apply scaling factors to FBP and F6P:

res=cli(c("-m", fmeas, "-s", fsto, "--decr", "GLC", "--incr", "FBP", "--skip", "24", "--sf=FBP,F6P", "-o", "", "--sderr=GLC=0.05,G6P=0.05,FBP=0.1,F6P=0.05"))
print(res$chi2tab)
#>             rss     var_ref   df      chi2 pval
#> GLC   0.2339998 0.002500000  990  93.59992    1
#> G6P   0.3708691 0.002500000  990 148.34765    1
#> F6P   0.1402564 0.002821918  990  49.70250    1
#> FBP   2.6265546 0.009745498  990 269.51466    1
#> Total 3.3716799 0.017567416 3960 561.16473    1

Now, all \(p\)-values are well above 0.05 usual threshold.

Exploring results

We are ready to see the main result: estimated reaction rates.

matplot(res$tpp, res$vsp(res$tpp), t="l", ylab="Rate [1/min]", xlab="Time [min]", lwd=2)
legend("topright", colnames(bsppar(res$vsp)$qw), lty=1:5, col=1:6, cex=0.75, lwd=2)

plot of chunk unnamed-chunk-13

Having estimated rates, we can now explore details of metabolic fluxes. Not only we have the total flux \(\frac{dM_i}{dt}\) but also all its components coming from involved reactions \(S_{ij}v_j\). For G6P, it gives:

nm="G6P"
jnz=names(which(res$stofull[nm,] != 0)) # non-zero coeffs, i.e. reactions involved in G6P mass balance
fl=t(res$stofull[nm,jnz]*t(res$vsp(res$tpp, jnz))) # each rate v_j is multiplied by S_ij
fl=cbind(res$fsp(res$tpp, nm), fl) # add total flux
matplot(res$tpp, fl, t="l", ylab="Flux [mM/min]", xlab="Time [min]", main="G6P flux components", lwd=2)
abline(h=0)
legend("topright", legend=c("Total", jnz), lty=1:5, col=1:6, lwd=2)

plot of chunk unnamed-chunk-14

Examples of plotting B-spline curves with smooth error intervals are given in a notebook available at

Conclusion

In this vignette, we have shown a step-by-step procedure for reaction rate estimation from metabolite concentration time courses and a stoichiometric model. We have seen how package parameters can be chosen, how fit quality can be evaluated and how data can be presented and exploited. Based on this vignette, user should be ready to start working with his own data set and stoichiometric model.