Abstract
With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the “Expectation” step and the “Maximization” step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.
A Appendix
A.1 Comparing Points of Equilibria
We will now prove Theorem 1. For the sake of convenience, we first recall the statement.
Theorem 1. Every fixed point of the Baum-Welch algorithm for an HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\) is a fixed point for the corresponding Chemical Baum-Welch Algorithm with permissible rates k.
Proof
Consider a point \(\varPhi = (\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{\ge 0}^{L\times H}\), \(\xi '\in \mathbb {R}_{\ge 0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{\ge 0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{\ge 0}^{H\times V}\). If \(\varPhi \) is a fixed point of the Baum-Welch algorithm then it must satisfy:
-
\(\alpha '_{1h} = \pi _h\psi '_{h v_1}\) for all \(h\in H\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\alpha }_{1h}\big |_\varPhi =k^\alpha _{1h}\left( {\alpha '_{1h^*} }{ \pi _h\psi '_{h v_1}}-{\alpha '_{1h} }{ \pi _{h^*}\psi '_{h^* v_1}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(\dot{\alpha }_{1h^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{1h}\big |_\varPhi =0\).
-
\(\alpha '_{l h}=\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}\) for all \(h\in H\) and \(l=2,\ldots ,L\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\alpha }_{lh}\big |_\varPhi =k^\alpha _{lh}\left( {\alpha '_{l h^*}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}-{\alpha '_{l h}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=2,\ldots ,L\) and \(\dot{\alpha }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{lh}\big |_\varPhi =0\).
-
\(\beta '_{l h}=\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\beta }_{lh}\big |_\varPhi =k^\beta _{lh}\left( {\beta '_{l h^*}}{\sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}}-{\beta '_{l h}}{\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=1,2,\dots ,L-1\) and \(\dot{\beta }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\beta }_{lh}\big |_\varPhi =0\).
-
\(\gamma '_l(h)=\frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H} \alpha '_{l g}\beta '_{l g}}\) for all \(h\in H\) and \(l=1,\ldots ,L\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\gamma }_{lh}\big |_\varPhi =k^\gamma _{lh}\left( {\gamma '_{lh^*}}{\alpha '_{l h}\beta '_{l h}}-{\gamma '_{lh}}{\alpha '_{l h^*}\beta '_{l h^*}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=1,\ldots ,L\) and \(\dot{\gamma }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\gamma }_{lh}\big |_\varPhi =0\).
-
\(\xi '_l(g,h)=\frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{f\in H} \alpha '_{l f}\beta '_{l f}}\text { for all } g,h\in H\text { and }l=1,\ldots ,L-1 \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\xi }_{lgh}\big |_\varPhi =k^\xi _{lgh}\left( {\xi '_{lh^*h^*}}{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}-{\xi '_{lgh}}{\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}\right) =0 \end{aligned}$$\(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\) and \(\dot{\xi }_{lh^*h^*}\big |_\varPhi =-\sum _{(g,h)\not =(h^*,h^*)}\dot{\xi }_{lgh}\big |_\varPhi =0\).
-
\(\theta '_{gh}=\frac{\sum _{l=1}^{L-1}\xi '_l(g,h)}{\sum _{l=1}^{L-1}\sum _{f\in H}\xi '_l(g,f)}\text { for all }g,h\in H \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\theta }_{gh}\big |_\varPhi =k^\theta _{gh}\left( {\theta '_{gh^*}}{\sum _{l=1}^{L-1}\xi '_{lgh}}-{\theta '_{gh}}{\sum _{l=1}^{L-1}\xi '_l(g,h^*)}\right) =0 \end{aligned}$$\(\text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\) and \(\dot{\theta }_{gh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\theta }_{gh}\big |_\varPhi =0 \).
-
\(\psi '_{hw}=\frac{\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_l(h)}\text { for all }h\in H\text { and }w\in V \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\psi }_{hw}\big |_\varPhi =k^\psi _{hw}\left( {\psi '_{hv^*}}{\sum _{l=1}^{L}\gamma '_l(h)\delta _{w,v_l}}-{\psi '_{hw}}{\sum _{l=1}^{L}\gamma '_l(h)\delta _{v^*,v_l}}\right) =0 \end{aligned}$$\(\text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\) and \(\dot{\psi }_{hv^*}\big |_\varPhi =-\sum _{w\not =v^*}\dot{\psi }_{hw}\big |_\varPhi =0\).
So \(\varPhi \) is a fixed point of the chemical Baum-Welch Algorithm. \(\square \)
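To illustrate Theorem 1 concretely, the following sketch (a hypothetical two-state, two-symbol HMM with a random length-6 observation sequence; NumPy assumed; all names are illustrative, not from the paper) iterates the classical Baum-Welch updates and then checks that the resulting point annihilates the chemical flux terms displayed above, e.g. \(\theta '_{gh^*}\sum _l\xi '_{lgh}-\theta '_{gh}\sum _l\xi '_l(g,h^*)\):

```python
import numpy as np

rng = np.random.default_rng(0)
H, V, L = 2, 2, 6                          # hidden states, symbols, sequence length
v = rng.integers(0, V, size=L)             # a hypothetical observed sequence
pi = np.array([0.6, 0.4])                  # fixed initial distribution
theta = rng.dirichlet(np.ones(H), size=H)  # transition probabilities theta_{gh}
psi = rng.dirichlet(np.ones(V), size=H)    # emission probabilities psi_{hw}

def baum_welch_step(theta, psi):
    # Forward/backward recursions (unnormalised; fine for short L)
    alpha = np.zeros((L, H)); beta = np.zeros((L, H))
    alpha[0] = pi * psi[:, v[0]]
    for l in range(1, L):
        alpha[l] = (alpha[l - 1] @ theta) * psi[:, v[l]]
    beta[L - 1] = 1.0
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.array([alpha[l][:, None] * theta * (psi[:, v[l + 1]] * beta[l + 1])[None, :]
                   for l in range(L - 1)])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    theta_new = xi.sum(axis=0)
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    psi_new = np.array([[gamma[v == w, h].sum() for w in range(V)] for h in range(H)])
    psi_new /= psi_new.sum(axis=1, keepdims=True)
    return theta_new, psi_new, gamma, xi

for _ in range(200):                       # iterate towards a Baum-Welch fixed point
    theta, psi, gamma, xi = baum_welch_step(theta, psi)

h_star, v_star = 0, 0
Xi = xi.sum(axis=0)
G = np.array([[gamma[v == w, h].sum() for w in range(V)] for h in range(H)])
# theta-species flux: theta_{g h*} sum_l xi_{lgh} - theta_{gh} sum_l xi_{l g h*}
flux_theta = theta[:, [h_star]] * Xi - theta * Xi[:, [h_star]]
# psi-species flux: psi_{h v*} sum_l gamma_{lh} d_{w,v_l} - psi_{hw} sum_l gamma_{lh} d_{v*,v_l}
flux_psi = psi[:, [v_star]] * G - psi * G[:, [v_star]]
assert np.allclose(flux_theta, 0, atol=1e-10)
assert np.allclose(flux_psi, 0, atol=1e-10)
```

The fluxes vanish to machine precision because the M-step outputs are exactly the normalized expected counts, which is precisely the detailed-balance-like condition the reaction scheme encodes.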
We will now prove Theorem 2. For the sake of convenience, we first recall the statement.
Theorem 2. Every positive fixed point of the Chemical Baum-Welch Algorithm on a Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k is a fixed point of the Baum-Welch algorithm for the HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\).
Proof
Consider a positive point \(\varPhi =(\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{>0}^{L\times H}\), \(\xi '\in \mathbb {R}_{>0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{>0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{>0}^{H\times V}\). If \(\varPhi \) is a fixed point for the Chemical Baum-Welch Algorithm then we must have:
-
\(\dot{\alpha }_{1h}\big |_\varPhi =0\) for all \(h\in H\). This implies \({\alpha '_{1h} }\times { \pi _{h^*}\psi '_{h^* v_1}}={\alpha '_{1h^*} }\times { \pi _h\psi '_{h v_1}}\quad \text { for all }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\alpha '_{1h} }=\left( { \pi _h\psi '_{h v_1}}\right) \frac{\sum _{f\in H}\alpha '_{1f} }{\sum _{f\in H} \pi _{f}\psi '_{f v_1}}\quad \text { for all }h\in H \end{aligned}$$
-
\(\dot{\alpha }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=2,\ldots ,L\). This implies \({\alpha '_{l h}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}={\alpha '_{l h^*}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=2,\ldots ,L\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\alpha '_{l h}}=\left( {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\right) \frac{\sum _{f\in H}\alpha '_{l f}}{\sum _{f,g\in H} \alpha '_{l-1,g}\theta '_{gf}\psi '_{f v_l}}\quad \text { for all }h\in H\text { and }l=2,\ldots ,L \end{aligned}$$
-
\(\dot{\beta }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\beta '_{l h}}\times \sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}={\beta '_{l h^*}}\times \sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\beta '_{l h}}=\left( {\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) \frac{\sum _{f\in H}\beta '_{l f}}{\sum _{f,g\in H} \theta '_{fg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\quad \text { for all }h\in H\text { and }l=1,\ldots ,L-1 \end{aligned}$$
-
\(\dot{\gamma }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L\). This implies \({\gamma '_l(h)}\times {\alpha '_{l h^*}\beta '_{l h^*}}={\gamma '_l(h^*)}\times {\alpha '_{l h}\beta '_{l h}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,\ldots ,L\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\gamma '_{lh}}=\left( \frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H}\alpha '_{l g}\beta '_{l g}}\right) {\sum _{g\in H}\gamma '_{lg}}\quad \text { for all }h\in H\text { and }l=1,\ldots ,L \end{aligned}$$
-
\(\dot{\xi }_{lgh}\big |_\varPhi =0\) for all \(g,h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\xi '_l(g,h)}\times {\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}={\xi '_l(h^*,h^*)}\times {\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}\) \(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\xi '_{lgh}}=\left( \frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{e,f\in H}\alpha '_{l e}\theta '_{ef}\psi '_{ fv_{l+1}}\beta '_{l+1,f}}\right) {\sum _{e,f\in H}\xi '_{lef}} \quad \text { for all } (g,h)\in H\times H\text { and } l=1,\ldots ,L-1 \end{aligned}$$
-
\(\dot{\theta }_{gh}\big |_\varPhi =0\) for all \(g,h\in H\). This implies \({\theta '_{gh}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h^*)}= {\theta '_{gh^*}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h)}\quad \text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\theta '_{gh}}=\left( \frac{\sum _{l=1}^{L-1}\xi '_{lgh}}{\sum _{f\in H}\sum _{l=1}^{L-1}\xi '_{lgf}}\right) {\sum _{f\in H}\theta '_{gf}}\quad \text { for all }g\in H\text { and }h\in H \end{aligned}$$
-
\(\dot{\psi }_{hw}\big |_\varPhi =0\) for all \(h\in H\) and \(w\in V\). This implies \({\psi '_{hw}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{v^*,v_l}} = {\psi '_{hv^*}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}\quad \text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\). Since \(\varPhi \) is positive and \(\sum _{v\in V}\delta _{v,v_l}=1\) (recall \(E_{lv}=\delta _{v,v_l}\)), this implies
$$\begin{aligned} {\psi '_{hw}}=\left( \frac{\sum _{l=1}^L\gamma '_{lh}\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_{lh}}\right) {\sum _{v\in V}\psi '_{hv}}\quad \text { for all }h\in H\text { and }w\in V \end{aligned}$$
By the relaxation afforded by Remark 1, the point \(\varPhi \) qualifies as a fixed point of the Baum-Welch algorithm. \(\square \)
A.2 Rate of Convergence Analysis
In this section we will prove Theorems 3 and 4, but first we will state and prove two useful lemmas.
Lemma 1
Let A be an arbitrary \(n\times n\) matrix. Let W be an \(r\times n\) matrix consisting of r linearly independent left kernel vectors of A, so that \(WA=0_{r,n}\), where \(0_{i,j}\) denotes an \(i\times j\) matrix with all entries zero. Further suppose W is in row-reduced form, that is,
$$\begin{aligned} W=\begin{pmatrix} W'&I_r \end{pmatrix}, \end{aligned}$$
where \(I_j\) denotes the \(j\times j\) identity matrix and \(W'\) is an \( r\times (n-r)\) matrix. Let A be given in block form as
$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12}\\ A_{21} &{} A_{22}\end{pmatrix}, \end{aligned}$$
where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix.
Then the \(n-r\) eigenvalues (with multiplicity) of the matrix \(A_{11} -A_{12}W'\) are the same as the eigenvalues of A except for r zero eigenvalues.
Proof
Consider the \(n\times n\) invertible matrix P given by
$$\begin{aligned} P=\begin{pmatrix} I_{n-r} &{} 0_{n-r,r}\\ W' &{} I_r \end{pmatrix},\qquad P^{-1}=\begin{pmatrix} I_{n-r} &{} 0_{n-r,r}\\ -W' &{} I_r \end{pmatrix}, \end{aligned}$$
with determinant \(\det (P)=\det (P^{-1})=1\). Since \(WA=0\) gives \(A_{21}=-W'A_{11}\) and \(A_{22}=-W'A_{12}\), we have
$$\begin{aligned} PAP^{-1}=\begin{pmatrix} A_{11}-A_{12}W' &{} A_{12}\\ 0_{r,n-r} &{} 0_{r,r} \end{pmatrix}. \end{aligned}$$
This implies that the characteristic polynomial of A fulfils
$$\begin{aligned} \det (\lambda I_n-A)=\det (\lambda I_n-PAP^{-1})=\lambda ^{r}\det \left( \lambda I_{n-r}-(A_{11}-A_{12}W')\right) , \end{aligned}$$
and the statement follows. \(\square \)
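Lemma 1 is easy to test numerically. The sketch below (NumPy assumed; random matrices stand in for an actual rate matrix, so all values are illustrative) constructs A so that \(W=(W'\ I_r)\) lies in its left kernel and compares characteristic polynomials:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2
A11 = rng.standard_normal((n - r, n - r))
A12 = rng.standard_normal((n - r, r))
Wp = rng.standard_normal((r, n - r))       # the matrix W' of the lemma
# Force W = [W'  I_r] into the left kernel: WA = 0 fixes A21 and A22.
A21 = -Wp @ A11
A22 = -Wp @ A12
A = np.block([[A11, A12], [A21, A22]])
W = np.hstack([Wp, np.eye(r)])
assert np.allclose(W @ A, 0)

# char(A) should equal lambda^r * char(A11 - A12 W')
char_A = np.poly(A)
char_C = np.convolve(np.poly(A11 - A12 @ Wp), np.r_[1.0, np.zeros(r)])
assert np.allclose(char_A, char_C, atol=1e-8)
```

Comparing characteristic polynomials rather than sorted eigenvalue lists avoids any ambiguity in ordering complex conjugate pairs.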
Now we revisit the observation that every reaction in the Baum-Welch reaction network is a monomolecular transformation catalyzed by a set of species. For the purposes of our analysis, each reaction can be abstracted as a monomolecular reaction with time-varying rate constants. This prompts us to consider the following monomolecular reaction system with n species \(X_1,\ldots ,X_n\) and m reactions
$$\begin{aligned} X_{r_j}\xrightarrow {k_j(t)} X_{p_j},\qquad j=1,\ldots ,m, \end{aligned}$$
where \(r_j\not =p_j\), and \(r_j,p_j\in \{1,\ldots ,n\}\), and \(k_j(t)\), \(j=1,\ldots ,m\), are mass-action reaction rate constants, possibly depending on time. We assume \(k_j(t)>0\) for \(t\ge 0\) and let \(k(t)=(k_1(t),\ldots ,k_{m}(t))\) be the vector of reaction rate constants. Furthermore, we assume there is at most one reaction j such that \((r_j,p_j)=(r,p)\in \{1,\ldots ,n\}^2\) and that the reaction network is strongly connected. The latter means there is a reaction path from any species \(X_i\) to any other species \(X_{i'}\). (In reaction network terms, the network is weakly reversible and has deficiency zero.)
The mass-action kinetics of this reaction system is given by the ODE system
$$\begin{aligned} \dot{x}_i=\sum _{j:\, p_j=i} k_j(t)x_{r_j}-\sum _{j:\, r_j=i} k_j(t)x_i,\qquad i=1,\ldots ,n. \end{aligned}$$
Define the \(n\times n\) matrix \(A(t)=(a_{ii'}(t))_{i,i'=1,\ldots ,n}\) by
$$\begin{aligned} a_{ii'}(t)={\left\{ \begin{array}{ll} k_j(t) &{} \text {if } i\not =i' \text { and } (r_j,p_j)=(i',i) \text { for some reaction } j,\\ -\sum _{j:\, r_j=i} k_j(t) &{} \text {if } i=i',\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(1)
Then the ODE system may be written as
$$\begin{aligned} \dot{x}=A(t)x. \end{aligned}$$
Note that the column sums of A(t) are zero, implying that \(\sum _{i=1}^n x_i\) is conserved.
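As a concrete instance, the sketch below (a hypothetical three-species cycle \(X_1\rightarrow X_2\rightarrow X_3\rightarrow X_1\) with constant rates chosen for illustration; NumPy assumed) assembles A as in Eq. 1, verifies the zero column sums, and integrates \(\dot{x}=Ax\) by forward Euler:

```python
import numpy as np

# Hypothetical 3-species cycle X1 -> X2 -> X3 -> X1 with constant rates
n = 3
reactions = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 0.5)]   # (r_j, p_j, k_j)

# Assemble A as in Eq. 1: inflow k_j at (p_j, r_j), outflow -k_j at (r_j, r_j)
A = np.zeros((n, n))
for r_j, p_j, k in reactions:
    A[p_j, r_j] += k
    A[r_j, r_j] -= k
assert np.allclose(A.sum(axis=0), 0)      # zero column sums: sum_i x_i conserved

# Forward-Euler integration of x' = A x
x = np.array([1.0, 0.0, 0.0])
dt = 1e-3
for _ in range(50_000):                   # integrate to t = 50
    x = x + dt * (A @ x)
assert np.isclose(x.sum(), 1.0)           # total concentration is conserved

# Equilibrium: equal flux around the cycle, k_j * a_{r_j} constant,
# so a is proportional to (1/2.0, 1/1.0, 1/0.5) = (0.5, 1, 2)
a = np.array([0.5, 1.0, 2.0]) / 3.5
assert np.allclose(x, a, atol=1e-8)
```

The strictly positive equilibrium and the conserved total are exactly the strongly connected, deficiency-zero behaviour invoked in the text.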
Lemma 2
Assume k(t), for \(t\ge 0\), converges exponentially fast towards \(k=(k_1,\ldots ,k_{m})\in \mathbb {R}^{m}_{>0}\) as \(t\rightarrow \infty \), that is, there exist \(\gamma _1>0\) and \(K_1>0\) such that
$$\begin{aligned} \,||\,k(t)-k\,||\,\le K_1e^{-\gamma _1 t}\quad \text {for all } t\ge 0. \end{aligned}$$
Let A(t) be the matrix defined in Eq. 1, and let A be the matrix obtained by inserting k for k(t) in A(t), that is, \(A=\lim _{t\rightarrow \infty } A(t)\).
Then the solution of the ODE system \(\dot{x}=A(t)x\) starting at \(x(0)\in \mathbb {R}^n_{\ge 0}\) converges exponentially fast towards the equilibrium \(a\in \mathbb {R}^n_{>0}\) of the ODE system \(\dot{x}=Ax\) starting at \(x(0)\in \mathbb {R}^n_{\ge 0}\), that is, there exist \(\gamma >0\) and \(K>0\) such that
$$\begin{aligned} \,||\,x(t)-a\,||\,\le Ke^{-\gamma t}\quad \text {for all } t\ge 0. \end{aligned}$$
Proof
We will first rephrase the ODE system so that standard theory is applicable. Let the rank of A be \(n-r\). Let W be as defined in Lemma 1, that is, an \(r\times n\) matrix consisting of r linearly independent left kernel vectors of A, so that \(WA=0\). Since the rank of A is \(n-r\), the rows of W form a basis for the left kernel of A. As in Lemma 1, further suppose W is in row-reduced form, that is,
$$\begin{aligned} W=\begin{pmatrix} W'&I_r \end{pmatrix}. \end{aligned}$$
Then
$$\begin{aligned} \dot{x}=A(t)x \end{aligned}$$
is a linear dynamical system with r conservation laws (one for each row of W). Let \(Wx(0)=T\in \mathbb {R}^r\) be the vector of conserved amounts. Let \(\hat{x}=(x_1,\ldots ,x_{n-r})\) and \(\tilde{x}=(x_{n-r+1},\ldots ,x_{n})\). We will consider the (equivalent) dynamical system in which r variables are eliminated, expressed through the conservation laws
$$\begin{aligned} \tilde{x}=T-W'\hat{x}. \end{aligned}$$
As in Lemma 1, let A be given in block form as
$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12}\\ A_{21} &{} A_{22}\end{pmatrix}, \end{aligned}$$
where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix. This yields
$$\begin{aligned} \dot{\hat{x}}=C\hat{x}+DT, \end{aligned}$$
with \(C=A_{11}-A_{12}W'\) and \(D=A_{12}\). We call this the reduced ODE system. Note that this reduced system has only \(n-r\) variables and that the conservation laws are built directly into it. This implies that the differential equation changes if T is changed. The role of this construction is that we can work only with the non-zero eigenvalues of A.
As we also have \(WA(t)=0\) for all \(t\ge 0\), the ODE \(\dot{x}=A(t)x\) can similarly be reduced to
$$\begin{aligned} \dot{\hat{x}}=C(t)\hat{x}+D(t)T \end{aligned}$$
(5)
with \(C(t)=A_{11}(t)-A_{12}(t)W'\) and \(D(t)=A_{12}(t)\), where, analogous to \(A_{11}\), we define \(A_{11}(t)\) to be the top-left \((n-r)\times (n-r)\) sub-matrix of A(t), and analogous to \(A_{12}\), we define \(A_{12}(t)\) to be the top-right \( (n-r)\times r\) sub-matrix of A(t).
Now if a is the equilibrium of the ODE system \(\dot{x}=Ax\) starting at x(0), then \(\hat{a}=(a_1,\ldots ,a_{n-r})\) is an equilibrium of the reduced ODE system \(\dot{\hat{x}}= C\hat{x}+DT\) starting at \(\hat{x}(0)=(x_1(0),\ldots ,x_{n-r}(0))\). Suppose we are able to prove that the solution of the reduced ODE \(\dot{\hat{x}}=C(t)\hat{x}+D(t)T\) starting at \(\hat{x}(0)\) converges exponentially fast towards \(\hat{a}\); then, because of the conservation laws \(\tilde{x}=T-W' \hat{x}\), the solution of \(\dot{x} = A(t)x\) starting at x(0) also converges exponentially fast towards a. So henceforth we will work only with the reduced ODE systems. For notational convenience, we will drop the hats from \(\hat{x}\) and \(\hat{a}\) and simply refer to them as x and a, respectively.
By subtracting and adding terms to the reduced ODE system (in Eq. 5), we have
$$\begin{aligned} \dot{x}=Cx+DT+(C(t)-C)x+(D(t)-D)T. \end{aligned}$$
As a is an equilibrium of the ODE system \(\dot{x}=C x+DT\), we have \(C a+DT=0\).
Define \(y=x-a\). Then
$$\begin{aligned} \dot{y}=Cy+E(t), \end{aligned}$$
where it is used that \(Ca+DT=0\), and \(E(t)=(C(t)-C)x(t) + (D(t)-D)T\).
The solution to the above ODE system is known to be
$$\begin{aligned} y(t)=e^{Ct}y(0)+\int _0^t e^{C(t-s)}E(s)\,ds. \end{aligned}$$
(6)
We have, using (6) and the triangle inequality,
$$\begin{aligned} \,||\,y(t)\,||\,\le \,||\,e^{Ct}\,||\,\,||\,y(0)\,||\,+\int _0^t \,||\,e^{C(t-s)}\,||\,\,||\,E(s)\,||\,\,ds. \end{aligned}$$
Now A as defined (see Eq. 1 with k inserted for k(t)) is a Laplacian matrix of a strongly connected graph, so all the eigenvalues of A are either zero or have negative real part. Using \(C=A_{11}-A_{12}W'\) and Lemma 1, it follows that all eigenvalues of C have negative real part. Hence it follows that
$$\begin{aligned} \,||\,e^{Ct}\,||\,\le K_2e^{-\gamma _2 t}\quad \text {for all } t\ge 0, \end{aligned}$$
where \(0<\gamma _2<-\mathfrak {R}(\lambda _1)\) and \(K_2>0\). Here \(\lambda _1\) is the eigenvalue of C with the largest real part.
The matrices C(t) and D(t) are linear in k(t). As k(t) converges exponentially fast towards k, the matrices C(t) and D(t) converge exponentially fast towards C and D, respectively. Hence it follows that
$$\begin{aligned} \,||\,E(t)\,||\,\le \,||\,C(t)-C\,||\,\,||\,x(t)\,||\,+\,||\,D(t)-D\,||\,\,||\,T\,||\,\le 4K_5e^{-\gamma _5 t}, \end{aligned}$$
where
-
\(\,||\,C(t)-C\,||\,\,||\,x(t)\,||\,\le K_3 e^{-\gamma _3 t}\) for some \(K_3,\gamma _3>0\), as C(t) converges exponentially fast towards C and x(t) is bounded (in the original ODE \(\sum _{i=1}^n x_i\) is conserved), and
-
\(\,||\,D(t)-D\,||\,\,||\,T\,||\,\le K_4 e^{-\gamma _4 t}\) for some \(K_4,\gamma _4>0\), as D(t) converges exponentially fast towards D, and
-
\(K_5=\frac{1}{2}\max (K_3, K_4)\) and \(\gamma _5=\min (\gamma _3,\gamma _4)\).
Collecting all terms, we have for all \(t\ge 0\),
$$\begin{aligned} \,||\,y(t)\,||\,\le K_2e^{-\gamma _2 t}\,||\,y(0)\,||\,+4K_2K_5\int _0^t e^{-\gamma _2(t-s)}e^{-\gamma _5 s}\,ds \le K_0(1+4t)e^{-\gamma _0 t}\le Ke^{-\gamma t}, \end{aligned}$$
by choosing \(K_0=\max \left( K_2K_5, \,||\,y(0)\,||\,K_2\right) \) and \(\gamma _0=\min (\gamma _1,\gamma _2)\); here \(\gamma _3\) and \(\gamma _4\), and hence \(\gamma _5\), may be taken equal to \(\gamma _1\). In the last line \(\gamma \) is chosen such that \(0<\gamma <\gamma _0\) and K is sufficiently large. Since \(y(t)=x(t)-a\) we have
$$\begin{aligned} \,||\,x(t)-a\,||\,\le Ke^{-\gamma t}\quad \text {for all } t\ge 0, \end{aligned}$$
as required. \(\square \)
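The conclusion of Lemma 2 can be illustrated numerically. In the sketch below (NumPy assumed; a hypothetical three-species cycle \(X_1\rightarrow X_2\rightarrow X_3\rightarrow X_1\) with illustrative rates), the system is driven with time-varying rates \(k_j(t)=k_j(1+e^{-t})\), which converge exponentially to \(k_j\), yet the solution still converges to the equilibrium of the limit system \(\dot{x}=Ax\):

```python
import numpy as np

n = 3
base = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 0.5)]    # (r_j, p_j, k_j): limit rates

def rate_matrix(t):
    # A(t) as in Eq. 1 with k_j(t) = k_j * (1 + e^{-t})  (a hypothetical choice)
    A = np.zeros((n, n))
    for r_j, p_j, k in base:
        kt = k * (1.0 + np.exp(-t))
        A[p_j, r_j] += kt
        A[r_j, r_j] -= kt
    return A

# Equilibrium a of the limit system: equal flux around the cycle
a = np.array([0.5, 1.0, 2.0]) / 3.5

x = np.array([1.0, 0.0, 0.0])
dt, t = 1e-3, 0.0
errs = []
for step in range(40_000):                 # forward Euler up to t = 40
    x = x + dt * (rate_matrix(t) @ x)
    t += dt
    if (step + 1) % 10_000 == 0:
        errs.append(np.linalg.norm(x - a))

# ||x(t) - a|| decays towards zero despite the time-varying rates
assert errs[-1] < errs[0] and errs[-1] < 1e-8
```

The exponentially decaying perturbation of the rates plays the role of E(t) in the proof; it slows but does not prevent exponential convergence.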
We will now prove Theorem 3. For the sake of convenience, we first recall the statement.
Theorem 3. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\theta \) and \(\psi \) species are held fixed at a positive point then the Forward, Backward and Expectation step reaction systems on \(\alpha ,\beta ,\gamma \) and \(\xi \) species converge to equilibrium exponentially fast.
Proof
It follows by repeated use of Lemma 2. For \(l=1\) the forward reaction network can be interpreted as the following monomolecular reactions:
$$\begin{aligned} \alpha _{1h^*}\xrightarrow {\pi _{h}\psi _{hw}E_{1w}} \alpha _{1h},\qquad \alpha _{1h}\xrightarrow {\pi _{h^*}\psi _{h^*w}E_{1w}} \alpha _{1h^*} \end{aligned}$$
for all \(h\in H\setminus \{h^*\}\) and \(w\in V\), as they are dynamically equivalent. Here the effective rate constants (\(\pi _{h^*}\psi _{h^*w}E_{1w}\) or \(\pi _{h}\psi _{hw}E_{1w}\)) are independent of time and so the conditions of Lemma 2 are fulfilled. Thus this portion of the reaction network converges exponentially fast.
The rest of the forward reaction network can similarly be interpreted as the following monomolecular reactions:
$$\begin{aligned} \alpha _{l h^*}\xrightarrow {\alpha _{l-1,g}\theta _{gh}\psi _{hw}E_{lw}} \alpha _{lh},\qquad \alpha _{lh}\xrightarrow {\alpha _{l-1,g}\theta _{gh^*}\psi _{h^*w}E_{lw}} \alpha _{l h^*} \end{aligned}$$
for all \(g\in H\), \(h\in H\setminus \{h^*\}, l=2,\ldots , L\) and \(w\in V\). For layers \(l=2,\ldots ,L\) of the forward reaction network we observe that the effective rate constants (\(\alpha _{l-1,g}\theta _{gh^*}\psi _{h^*w}E_{lw}\) or \(\alpha _{l-1,g}\theta _{gh}\psi _{hw}E_{lw}\)) for layer l depend on time only through \(\alpha _{l-1,g}\). If we suppose that the concentration of \(\alpha _{l-1,g}\) converges exponentially fast, then we can use Lemma 2 to conclude that the concentration of \(\alpha _{lh}\) also converges exponentially fast. Thus using Lemma 2 inductively layer by layer we conclude that forward reaction network converges exponentially fast. The backward reaction network converges exponentially fast, similarly.
For the expectation reaction network the result likewise follows by induction. But here, notice that if we interpret the expectation network similarly as monomolecular reactions, the effective rate constants depend on time through products such as \(\alpha _{lh}\beta _{lh}\) or \(\alpha _{lh}\beta _{l+1,h}\). So to apply Lemma 2 we need the following: if \(\alpha _l(t)\) and \(\beta _l(t)\) converge exponentially fast towards \(a_l\) and \(b_l\), then the product \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\), as
$$\begin{aligned} |\alpha _l(t)\beta _l(t)-a_l b_l|\le |\alpha _l(t)|\,|\beta _l(t)-b_l|+|b_l|\,|\alpha _l(t)-a_l|\le Ke^{-\gamma t}, \end{aligned}$$
where K is some suitably large constant (note \(\alpha _l(t)\) is bounded) and \(\gamma =\min (\gamma _a,\gamma _b)\), with \(\gamma _a\) and \(\gamma _b\), respectively, the exponential convergence rates of \(\alpha _l(t)\) and \(\beta _l(t)\). Thus \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\) at rate \(\gamma \). A similar argument applies to products of the form \(\alpha _l(t)\beta _{l+1}(t)\). And thus the expectation reaction network also converges exponentially fast. \(\square \)
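For intuition, the sketch below (hypothetical two-state HMM parameters chosen for illustration, rate constant \(k^\alpha \) set to 1; NumPy assumed) integrates the layer \(l=1\) flux \(\dot{\alpha }_{1h}=\pi _h\psi _{hv_1}\alpha _{1h^*}-\pi _{h^*}\psi _{h^*v_1}\alpha _{1h}\) and checks that \(\alpha _1\) equilibrates to the normalized forward probabilities:

```python
import numpy as np

# Hypothetical 2-state HMM: fixed pi and psi, first observed symbol v1
pi = np.array([0.6, 0.4])
psi = np.array([[0.7, 0.3],
                [0.2, 0.8]])
v1 = 0
c = pi * psi[:, v1]            # effective rate constants pi_h * psi_{h v1}

# alpha_{1,0} <-> alpha_{1,1} with h* = 0; total concentration is conserved
x = np.array([0.5, 0.5])
dt = 1e-3
for _ in range(50_000):        # forward Euler up to t = 50
    flux = c[1] * x[0] - c[0] * x[1]     # net flow into alpha_{1,1}
    x = x + dt * np.array([-flux, flux])

# equilibrium is the normalised forward vector: alpha_{1h} proportional to c_h
target = c / c.sum()
assert np.allclose(x, target, atol=1e-8)
```

The futile-cycle pair relaxes at rate \(c_0+c_1\), so the equilibrium ratio \(\alpha _{11}/\alpha _{10}=c_1/c_0\) is reached exponentially fast, as Theorem 3 asserts.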
We will now prove Theorem 4. For the sake of convenience, we first recall the statement.
Theorem 4. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\alpha ,\beta ,\gamma \) and \(\xi \) species are held fixed at a positive point then the Maximization step reaction system on \(\theta \) and \(\psi \) species converges to equilibrium exponentially fast.
Proof
Exponential convergence of the maximization network follows by a similar layer-by-layer inductive use of Lemma 2. \(\square \)
© 2019 Springer Nature Switzerland AG
Singh, A., Wiuf, C., Behera, A., Gopalkrishnan, M. (2019). A Reaction Network Scheme Which Implements Inference and Learning for Hidden Markov Models. In: Thachuk, C., Liu, Y. (eds) DNA Computing and Molecular Programming. DNA 2019. Lecture Notes in Computer Science(), vol 11648. Springer, Cham. https://doi.org/10.1007/978-3-030-26807-7_4