Abstract
In a time of data abundance, automatic methods increasingly support manual modeling. In this context, the Sparse Identification of Non-linear Dynamics (SINDy) provides a solid foundation for identifying non-linear dynamical systems in the form of differential equations. In biochemistry, reaction networks imply coupled differential equations. It has recently been demonstrated how this intrinsic coupling can be achieved within the SINDy framework, which allows the learned equations to be interpreted directly as reaction systems with mass-action kinetics. However, this extension inherits SINDy's requirement to enumerate all candidate reactions in a library. This results in ill-posed optimization problems and long model descriptions, and it limits the approach's utility for identifying models with many species. Here, we extend the recent advances in bringing SINDy to the biochemical domain by sub-sampling reaction libraries as part of an evolutionary optimization scheme. This enables the generation of parsimonious models and the inclusion of model-level constraints, and it allows large numbers of candidate reactions to be considered. We evaluate the approach on two smaller case studies and on the recovery of a large Wnt signaling model.
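The coupling and the library sub-sampling described above can be illustrated with a small sketch. It assumes reactions of at most second order over two species, randomly sampled states with exact derivatives (no trajectory simulation, no noise), SciPy's `nnls` (Lawson-Hanson) for the non-negative rate-constant fit, and a generic genetic algorithm (truncation selection, uniform crossover, bit-flip mutation) whose fitness is the residual norm plus a hypothetical parsimony weight `lam`. It does not reproduce the authors' evolib operators or constraints; all helper names are illustrative.

```python
import itertools

import numpy as np
from scipy.optimize import nnls


def complexes(n_species, max_order=2):
    """All stoichiometry vectors (complexes) with at most max_order molecules."""
    out = []
    for order in range(max_order + 1):
        for combo in itertools.combinations_with_replacement(range(n_species), order):
            v = np.zeros(n_species, dtype=int)
            for s in combo:
                v[s] += 1
            out.append(v)
    return out


def reaction_library(n_species, max_order=2):
    """Every reaction reactants -> products between two distinct complexes."""
    cs = complexes(n_species, max_order)
    return [(r, p) for r in cs for p in cs if not np.array_equal(r, p)]


def design_matrix(X, reactions):
    """Column j = net change (p_j - r_j) times the mass-action monomial prod(x**r_j).

    A single coefficient (the rate constant k_j) per reaction couples every species
    the reaction touches. Rows run over (time point, species) pairs.
    """
    cols = [np.outer(np.prod(X ** r, axis=1), p - r).ravel() for r, p in reactions]
    return np.column_stack(cols)


def fit(X, dX, library, mask):
    """Non-negative least squares for the rate constants of the selected reactions."""
    b = dX.ravel()
    if not mask.any():
        return np.zeros(0), np.linalg.norm(b)
    chosen = [rxn for rxn, keep in zip(library, mask) if keep]
    return nnls(design_matrix(X, chosen), b)  # (rate constants, residual norm)


def evolve(X, dX, library, pop_size=30, generations=40, lam=1e-3, seed=0):
    """Toy genetic algorithm over library subsets; fitness = residual + lam * size."""
    rng = np.random.default_rng(seed)
    m = len(library)

    def score(mask):
        return fit(X, dX, library, mask)[1] + lam * mask.sum()

    pop = rng.random((pop_size, m)) < 0.1  # sparse random sub-libraries
    history = []
    for _ in range(generations):
        scores = np.array([score(ind) for ind in pop])
        order = np.argsort(scores, kind="stable")
        history.append(scores[order[0]])
        elite = pop[order[: pop_size // 2]]  # truncation selection, elitist
        idx = rng.integers(len(elite), size=(pop_size - len(elite), 2))
        a, b = elite[idx[:, 0]], elite[idx[:, 1]]
        children = np.where(rng.random(a.shape) < 0.5, a, b)  # uniform crossover
        children ^= rng.random(children.shape) < 1.0 / m  # bit-flip mutation
        pop = np.vstack([elite, children])
    scores = np.array([score(ind) for ind in pop])
    best = int(np.argmin(scores))
    history.append(scores[best])
    return pop[best], history


def rxn(reactants, products, n=2):
    """Build (r, p) stoichiometry vectors from lists of species indices."""
    r, p = np.zeros(n, dtype=int), np.zeros(n, dtype=int)
    for s in reactants:
        r[s] += 1
    for s in products:
        p[s] += 1
    return r, p


# Hypothetical ground truth over species A (index 0) and B (index 1):
# 0 -> A, A + B -> 2B, B -> 0
true_reactions = [rxn([], [0]), rxn([0, 1], [1, 1]), rxn([1], [])]
k_true = np.array([0.5, 1.0, 0.5])
X = np.random.default_rng(1).uniform(0.1, 2.0, size=(50, 2))  # sampled states
dX = (design_matrix(X, true_reactions) @ k_true).reshape(X.shape)  # exact derivatives

library = reaction_library(2)  # 30 candidate reactions
best, history = evolve(X, dX, library)
for (r, p), on in zip(library, best):
    if on:
        print(r, "->", p)
```

Even this tiny library is ambiguous: 0 -> 2A with half the rate constant produces exactly the same derivatives as 0 -> A, so several sub-libraries fit equally well. This is one face of the ill-posedness that arises when all candidate reactions are enumerated, and the parsimony term only breaks ties by model size.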
Acknowledgments
JNK and AU acknowledge funding from the DFG Project GrEASE (grant number 320435134). KB acknowledges funding from the ARC Centre of Excellence for Plant Success in Nature and Agriculture (CE 200100015). The authors thank Fiete Haack for many helpful discussions, particularly about his model of the Wnt pathway, and for his feedback on the evaluation.
Appendices
A Complete List of Experiment (Hyper-)parameters
(See Table 1).
B Learned Models’ Trajectories for the Wnt Pathway
Results from simulating the models learned with evolib for the Wnt pathway. The + symbols mark measurement points, and the lines show the simulated trajectories of the 19 species. Note that, for clarity, species labels are omitted and only every second measurement point is shown. Integrating the (large) models produced by c-SINDy failed due to numerical problems.
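Plots like these require integrating each learned mass-action model numerically. The sketch below assumes SciPy's `solve_ivp` with the LSODA method, which switches automatically between stiff and non-stiff solvers; this is an illustrative choice, not necessarily the authors' setup, and `mass_action_rhs` and `simulate` are hypothetical helpers. Checking `sol.success` is one way to surface integration failures such as those reported for the large c-SINDy models.

```python
import numpy as np
from scipy.integrate import solve_ivp


def mass_action_rhs(reactants, products, k):
    """Return f(t, x) for a mass-action network.

    reactants, products: integer arrays of shape (m, n) holding the stoichiometry
    of m reactions over n species; k: rate constants of shape (m,).
    """
    change = (products - reactants).T  # (n, m) net stoichiometric change

    def f(t, x):
        rates = k * np.prod(x ** reactants, axis=1)  # (m,) mass-action rates
        return change @ rates

    return f


def simulate(reactants, products, k, x0, t_eval):
    """Integrate with LSODA and raise if the solver reports a failure."""
    sol = solve_ivp(mass_action_rhs(reactants, products, k),
                    (t_eval[0], t_eval[-1]), x0, method="LSODA",
                    t_eval=t_eval, rtol=1e-8, atol=1e-10)
    if not sol.success:
        # e.g. stiff or diverging dynamics in very large learned models
        raise RuntimeError(f"integration failed: {sol.message}")
    return sol


# Toy example: A -> B with k = 1, starting from A = 1, B = 0
reactants = np.array([[1, 0]])
products = np.array([[0, 1]])
t = np.linspace(0.0, 1.0, 11)
sol = simulate(reactants, products, np.array([1.0]), np.array([1.0, 0.0]), t)
print(sol.y[:, -1])
```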
C Learned Models for the Wnt Pathway
The models inferred in the Wnt case study (unconstrained) compared to the ground truth model. For the extension, the reactions above the line are fixed. Only reactions with a rate above \(10^{-6}\) are shown, and, if applicable, the number of excluded reactions is shown in the lower left. Bolded reactions indicate an overlap with the ground truth reactions. Note in particular how the Axin-induced degradation of \(\beta \)-catenin in (a) and the synthesis of Ros in (b) were recovered; both are central components of the ground truth model of the Wnt pathway.
The models inferred in the Wnt case study (constrained) compared to the ground truth model. For the extension, the reactions above the line are fixed. Only reactions with a rate above \(10^{-6}\) are shown, and, if applicable, the number of excluded reactions is shown in the lower left. Bolded reactions indicate an overlap with the ground truth reactions. Note in particular how the shuttling of \(\beta \)-catenin in and out of the nucleus in (a) and the production of TCF in (b) were recovered.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kreikemeyer, J.N., Burrage, K., Uhrmacher, A.M. (2024). Discovering Biochemical Reaction Models by Evolving Libraries. In: Gori, R., Milazzo, P., Tribastone, M. (eds) Computational Methods in Systems Biology. CMSB 2024. Lecture Notes in Computer Science, vol 14971. Springer, Cham. https://doi.org/10.1007/978-3-031-71671-3_10
DOI: https://doi.org/10.1007/978-3-031-71671-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71670-6
Online ISBN: 978-3-031-71671-3
eBook Packages: Computer Science, Computer Science (R0)