
A Reaction Network Scheme Which Implements Inference and Learning for Hidden Markov Models

  • Conference paper
  • First Online:
DNA Computing and Molecular Programming (DNA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11648))


Abstract

With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the “Expectation” step and the “Maximization” step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.
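For reference, the classical algorithm that the chemical scheme is compared against can be stated compactly. The following Python sketch implements one standard Baum-Welch iteration with \(\pi \) held fixed, matching the parameters the scheme learns; it is a plain reference implementation of the classical algorithm on an invented example, not of the reaction network itself:

```python
import numpy as np

def baum_welch_step(theta, psi, pi, v):
    """One classical Baum-Welch (EM) iteration for a discrete HMM.

    theta: (H, H) transition matrix, psi: (H, V) emission matrix,
    pi: (H,) initial distribution (held fixed), v: observed symbol
    indices of length L. Names mirror the paper's notation.
    """
    L, H = len(v), len(pi)
    # Forward pass: alpha[l, h] is the joint probability of the observed
    # prefix up to position l and hidden state h at position l.
    alpha = np.zeros((L, H))
    alpha[0] = pi * psi[:, v[0]]
    for l in range(1, L):
        alpha[l] = (alpha[l - 1] @ theta) * psi[:, v[l]]
    # Backward pass: beta[l, h] is the probability of the observed suffix
    # after position l given hidden state h at position l.
    beta = np.ones((L, H))
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])
    # E-step: state posteriors gamma and transition posteriors xi.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((L - 1, H, H))
    for l in range(L - 1):
        xi[l] = alpha[l][:, None] * theta * (psi[:, v[l + 1]] * beta[l + 1])
        xi[l] /= xi[l].sum()
    # M-step: re-estimate theta and psi.
    theta_new = xi.sum(axis=0)
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    psi_new = np.zeros_like(psi)
    for w in range(psi.shape[1]):
        psi_new[:, w] = gamma[np.array(v) == w].sum(axis=0)
    psi_new /= gamma.sum(axis=0)[:, None]
    return theta_new, psi_new
```

Iterating `baum_welch_step` until the parameters stop changing yields the fixed points discussed in the appendix.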



Author information

Correspondence to Abhinav Singh.

A Appendix

A.1 Comparing Points of Equilibria

We will now prove Theorem 1. For the sake of convenience, we first recall the statement.

Theorem 1. Every fixed point of the Baum-Welch algorithm for an HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\) is a fixed point for the corresponding Chemical Baum-Welch Algorithm with permissible rates k.

Proof

Consider a point \(\varPhi = (\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{\ge 0}^{L\times H}\), \(\xi '\in \mathbb {R}_{\ge 0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{\ge 0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{\ge 0}^{H\times V}\). If \(\varPhi \) is a fixed point of the Baum-Welch Algorithm then it must satisfy:

  • \(\alpha '_{1h} = \pi _h\psi '_{h v_1}\) for all \(h\in H\). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\alpha }_{1h}\big |_\varPhi =k^\alpha _{1h}\left( {\alpha '_{1h^*} }{ \pi _h\psi '_{h v_1}}-{\alpha '_{1h} }{ \pi _{h^*}\psi '_{h^* v_1}}\right) =0 \end{aligned}$$

    for all \(h\in H\setminus \{h^*\}\) and \(\dot{\alpha }_{1h^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{1h}\big |_\varPhi =0\).

  • \(\alpha '_{l h}=\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}\) for all \(h\in H\) and \(l=2,\ldots ,L\). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\alpha }_{lh}\big |_\varPhi =k^\alpha _{lh}\left( {\alpha '_{l h^*}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}-{\alpha '_{l h}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}\right) =0 \end{aligned}$$

    for all \(h\in H\setminus \{h^*\}\) and \(l=2,\ldots ,L\) and \(\dot{\alpha }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{lh}\big |_\varPhi =0\).

  • \(\beta '_{l h}=\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\beta }_{lh}\big |_\varPhi =k^\beta _{lh}\left( {\beta '_{l h^*}}{\sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}}-{\beta '_{l h}}{\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) =0 \end{aligned}$$

    for all \(h\in H\setminus \{h^*\}\) and \(l=1,2,\dots ,L-1\) and \(\dot{\beta }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\beta }_{lh}\big |_\varPhi =0\).

  • \(\gamma '_l(h)=\frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H} \alpha '_{l g}\beta '_{l g}}\) for all \(h\in H\) and \(l=1,2,\dots ,L\). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\gamma }_{lh}\big |_\varPhi =k^\gamma _{lh}\left( {\gamma '_{lh^*}}{\alpha '_{l h}\beta '_{l h}}-{\gamma '_{lh}}{\alpha '_{l h^*}\beta '_{l h^*}}\right) =0 \end{aligned}$$

    for all \(h\in H\setminus \{h^*\}\) and \(l=1,2,\dots ,L\) and \(\dot{\gamma }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\gamma }_{lh}\big |_\varPhi =0\).

  • \(\xi '_l(g,h)=\frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{f\in H} \alpha '_{l f}\beta '_{l f}}\text { for all } g,h\in H\text { and }l=1,\ldots ,L-1 \). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\xi }_{lgh}\big |_\varPhi =k^\xi _{lgh}\left( {\xi '_{lh^*h^*}}{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}-{\xi '_{lgh}}{\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}\right) =0 \end{aligned}$$

    \(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\) and \(\dot{\xi }_{lh^*h^*}\big |_\varPhi =-\sum _{(g,h)\not =(h^*,h^*)}\dot{\xi }_{lgh}\big |_\varPhi =0\).

  • \(\theta '_{gh}=\frac{\sum _{l=1}^{L-1}\xi '_l(g,h)}{\sum _{l=1}^{L-1}\sum _{f\in H}\xi '_l(g,f)}\text { for all }g,h\in H \). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\theta }_{gh}\big |_\varPhi =k^\theta _{gh}\left( {\theta '_{gh^*}}{\sum _{l=1}^{L-1}\xi '_{lgh}}-{\theta '_{gh}}{\sum _{l=1}^{L-1}\xi '_l(g,h^*)}\right) =0 \end{aligned}$$

    \(\text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\) and \(\dot{\theta }_{gh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\theta }_{gh}\big |_\varPhi =0 \).

  • \(\psi '_{hw}=\frac{\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_l(h)}\text { for all }h\in H\text { and }w\in V \). Then for the chemical Baum-Welch Algorithm we have

    $$\begin{aligned} \dot{\psi }_{hw}\big |_\varPhi =k^\psi _{hw}\left( {\psi '_{hv^*}}{\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}-{\psi '_{hw}}{\sum _{l=1}^L\gamma '_l(h)\delta _{v^*,v_l}}\right) =0 \end{aligned}$$

    \(\text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\) and \(\dot{\psi }_{hv^*}\big |_\varPhi =-\sum _{w\not =v^*}\dot{\psi }_{hw}\big |_\varPhi =0\).

So \(\varPhi \) is a fixed point of the chemical Baum-Welch Algorithm.   \(\square \)
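As an illustrative numerical check (not part of the paper), one can verify on a small invented HMM that the forward-variable rate expressions above vanish identically at a Baum-Welch fixed point:

```python
import numpy as np

# Invented 2-state, 2-symbol HMM with h* = state 0, rate constants k = 1,
# and 0-based indices; only meant to illustrate the computation above.
pi = np.array([0.6, 0.4])
theta = np.array([[0.7, 0.3], [0.4, 0.6]])   # theta[g, h]
psi = np.array([[0.8, 0.2], [0.3, 0.7]])     # psi[h, w]
v = [0, 1]                                    # observed sequence, L = 2
hstar = 0

# Baum-Welch fixed-point values of the alpha species (forward recursion).
alpha1 = pi * psi[:, v[0]]
alpha2 = (alpha1 @ theta) * psi[:, v[1]]

# The rate expressions from the proof vanish at this point.
for h in range(1, 2):
    rate1 = alpha1[hstar] * pi[h] * psi[h, v[0]] \
        - alpha1[h] * pi[hstar] * psi[hstar, v[0]]
    rate2 = alpha2[hstar] * (alpha1 @ theta[:, h]) * psi[h, v[1]] \
        - alpha2[h] * (alpha1 @ theta[:, hstar]) * psi[hstar, v[1]]
    assert abs(rate1) < 1e-12 and abs(rate2) < 1e-12
```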

We will now prove Theorem 2. For the sake of convenience, we first recall the statement.

Theorem 2. Every positive fixed point of the Chemical Baum-Welch Algorithm on a Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k is a fixed point of the Baum-Welch algorithm for the HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\).

Proof

Consider a positive point \(\varPhi =(\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{>0}^{L\times H}\), \(\xi '\in \mathbb {R}_{>0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{>0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{>0}^{H\times V}\). If \(\varPhi \) is a fixed point for the Chemical Baum-Welch Algorithm then we must have:

  • \(\dot{\alpha }_{1h}\big |_\varPhi =0\) for all \(h\in H\). This implies \({\alpha '_{1h} }\times { \pi _{h^*}\psi '_{h^* v_1}}={\alpha '_{1h^*} }\times { \pi _h\psi '_{h v_1}}\quad \text { for all }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\alpha '_{1h} }=\left( { \pi _h\psi '_{h v_1}}\right) \frac{\sum _{f\in H}\alpha '_{1f} }{\sum _{f\in H} \pi _{f}\psi '_{f v_1}}\quad \text { for all }h\in H \end{aligned}$$
  • \(\dot{\alpha }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=2,\ldots ,L\). This implies \({\alpha '_{l h}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}={\alpha '_{l h^*}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=2,\ldots ,L\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\alpha '_{l h}}=\left( {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\right) \frac{\sum _{f\in H}\alpha '_{l f}}{\sum _{f,g\in H} \alpha '_{l-1,g}\theta '_{gf}\psi '_{f v_l}}\quad \text { for all }h\in H\text { and }l=2,\ldots ,L \end{aligned}$$
  • \(\dot{\beta }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\beta '_{l h}}\times \sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}={\beta '_{l h^*}}\times \sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\beta '_{l h}}=\left( {\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) \frac{\sum _{f\in H}\beta '_{l f}}{\sum _{f,g\in H} \theta '_{fg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\quad \text { for all }h\in H\text { and }l=1,\ldots ,L-1 \end{aligned}$$
  • \(\dot{\gamma }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L\). This implies \({\gamma '_l(h)}\times {\alpha '_{l h^*}\beta '_{l h^*}}={\gamma '_l(h^*)}\times {\alpha '_{l h}\beta '_{l h}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,2,\dots ,L\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\gamma '_{lh}}=\left( \frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H}\alpha '_{l g}\beta '_{l g}}\right) {\sum _{g\in H}\gamma '_{lg}}\quad \text { for all }h\in H\text { and }l=1,2,\dots ,L \end{aligned}$$
  • \(\dot{\xi }_{lgh}\big |_\varPhi =0\) for all \(g,h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\xi '_l(g,h)}\times {\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}={\xi '_l(h^*,h^*)}\times {\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}\) \(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\xi '_{lgh}}=\left( \frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{e,f\in H}\alpha '_{l e}\theta '_{ef}\psi '_{ fv_{l+1}}\beta '_{l+1,f}}\right) {\sum _{e,f\in H}\xi '_{lef}} \quad \text { for all } (g,h)\in H\times H\text { and } l=1,\ldots ,L-1 \end{aligned}$$
  • \(\dot{\theta }_{gh}\big |_\varPhi =0\) for all \(g,h\in H\). This implies \({\theta '_{gh}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h^*)}= {\theta '_{gh^*}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h)}\quad \text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies

    $$\begin{aligned} {\theta '_{gh}}=\left( \frac{\sum _{l=1}^{L-1}\xi '_{lgh}}{\sum _{f\in H}\sum _{l=1}^{L-1}\xi '_{lgf}}\right) {\sum _{f\in H}\theta '_{gf}}\quad \text { for all }g\in H\text { and }h\in H \end{aligned}$$
  • \(\dot{\psi }_{hw}\big |_\varPhi =0\) for all \(h\in H\) and \(w\in V\). This implies \({\psi '_{hw}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{v^*,v_l}} = {\psi '_{hv^*}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}\quad \text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\). Since \(\varPhi \) is positive and \(E_{lv}=\delta _{v,v_l}\) with \(\sum _{v\in V}\delta _{v,v_l}=1\), this implies

    $$\begin{aligned} {\psi '_{hw}}=\left( \frac{\sum _{l=1}^L\gamma '_{lh}\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_{lh}}\right) {\sum _{v\in V}\psi '_{hv}}\quad \text { for all }h\in H\text { and }w\in V \end{aligned}$$

Because of the relaxation afforded by Remark 1, the point \(\varPhi \) qualifies as a fixed point of the Baum-Welch algorithm.   \(\square \)
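The \(\theta \) step of this argument can be checked numerically: any positive \(\theta '\) satisfying the fixed-point relation is a per-row rescaling of the Baum-Welch ratio. The following sketch, with invented positive \(\xi '\) values, only illustrates the proof's conclusion:

```python
import numpy as np

# Illustration of the theta bullet above on invented positive data:
# any positive theta' obeying theta'_{gh} * S_{gh*} = theta'_{gh*} * S_{gh},
# where S_{gh} = sum_l xi'_{lgh}, has the form theta'_{gh} = c_g * S_{gh}.
rng = np.random.default_rng(3)
H, L = 3, 5
xi = rng.uniform(0.1, 1.0, (L - 1, H, H))   # invented positive xi'_{lgh}
S = xi.sum(axis=0)

c = rng.uniform(0.5, 2.0, H)                # arbitrary positive row scalings
theta = c[:, None] * S

# Conclusion of the proof: theta' equals the Baum-Welch ratio times the
# conserved row total sum_f theta'_{gf}.
bw_ratio = S / S.sum(axis=1, keepdims=True)
assert np.allclose(theta, bw_ratio * theta.sum(axis=1, keepdims=True))
```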

A.2 Rate of Convergence Analysis

In this section we will prove Theorems 3 and 4, but first we will state and prove two useful lemmas.

Lemma 1

Let A be an arbitrary \(n\times n\) matrix. Let W be an \(r\times n\) matrix comprising r linearly independent left kernel vectors of A so that \(WA=0_{r,n}\), where \(0_{i,j}\) denotes an \(i\times j\) matrix with all entries zero. Further suppose W is in row reduced form, that is,

$$\begin{aligned} W=\begin{pmatrix} W'&I_r\end{pmatrix} \end{aligned}$$

where \(I_j\) denotes the \(j\times j\) identity matrix and \(W'\) is an \( r\times (n-r)\) matrix. Let A be given as

$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{pmatrix}, \end{aligned}$$

where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix.

Then the \(n-r\) eigenvalues (with multiplicity) of the matrix \(A_{11} -A_{12}W'\) are the same as the eigenvalues of A except for r zero eigenvalues.

Proof

Consider the \(n\times n\) invertible matrix P given by

$$P=\begin{pmatrix} I_{n-r} &{}0_{n-r,r} \\ W' &{} I_r \end{pmatrix}, \quad P^{-1}=\begin{pmatrix} I_{n-r} &{}0_{n-r,r} \\ -W' &{} I_r\end{pmatrix},$$

with determinant \(\det (P)=\det (P^{-1})=1\). We have

$$\begin{aligned} PAP^{-1}&= \begin{pmatrix} I_{n-r} &{}0_{n-r,r} \\ W' &{} I_r \end{pmatrix}\begin{pmatrix} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{pmatrix}\begin{pmatrix} I_{n-r} &{}0_{n-r,r} \\ -W' &{} I_r\end{pmatrix}\\&=\begin{pmatrix} A_{11} &{} A_{12} \\ 0_{r,n-r} &{} 0_{r,r} \end{pmatrix}\begin{pmatrix} I_{n-r} &{}0_{n-r,r} \\ -W' &{} I_r\end{pmatrix} \\&=\begin{pmatrix} A_{11} -A_{12}W' &{} A_{12} \\ 0_{r,n-r} &{} 0_{r,r} \end{pmatrix} \end{aligned}$$

This implies that the characteristic polynomial of A fulfils

$$\begin{aligned} \det (A-\lambda I_n)&=\det (P)\det (A-\lambda I_n)\det (P^{-1}) = \det (PAP^{-1}-\lambda I_n )\\&=\det \begin{pmatrix} A_{11} -A_{12}W' - \lambda I_{n-r} &{} A_{12} \\ 0_{r,n-r} &{} 0_{r,r} -\lambda I_r\end{pmatrix} \\&= (-1)^r\lambda ^r \det (A_{11} -A_{12}W' -\lambda I_{n-r}), \end{aligned}$$

and the statement follows.    \(\square \)
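A quick numerical illustration of the lemma (not from the paper): for a random matrix with zero column sums, the left kernel is spanned by the all-ones row, so \(W=(W'\ I_1)\) with \(W'\) a row of ones, and the eigenvalues of \(A_{11}-A_{12}W'\) match the non-zero eigenvalues of A:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 1

# A with zero column sums: the all-ones row spans its left kernel, so
# W = (W'  I_1) with W' a 1 x (n-1) row of ones, as in the lemma.
A = rng.uniform(0.1, 1.0, (n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=0))
Wp = np.ones((r, n - r))
assert np.allclose(np.hstack([Wp, np.eye(r)]) @ A, 0.0)   # W A = 0

A11, A12 = A[:n - r, :n - r], A[:n - r, n - r:]
reduced = A11 - A12 @ Wp

ev_A = np.linalg.eigvals(A)
ev_red = np.linalg.eigvals(reduced)
# Drop the zero eigenvalue of A; the remaining n - r match the reduced matrix.
ev_A = np.delete(ev_A, np.argmin(np.abs(ev_A)))
assert np.allclose(np.sort_complex(ev_A), np.sort_complex(ev_red), atol=1e-8)
```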

Now we revisit the observation that every reaction in the Baum-Welch reaction network is a monomolecular transformation catalyzed by a set of species. For the purposes of our analysis, each reaction can be abstracted as a monomolecular reaction with time varying rate constants. This prompts us to consider the following monomolecular reaction system with n species \(X_1,\ldots ,X_n\) and m reactions

$$\begin{aligned} X_{r_j}\xrightarrow {k_j(t)} X_{p_j},\qquad j=1,\ldots ,m, \end{aligned}$$

where \(r_j\not =p_j\), and \(r_j,p_j\in \{1,\ldots ,n\}\), and \(k_j(t)\), \(j=1,\ldots ,m\), are mass-action reaction rate constants, possibly depending on time. We assume \(k_j(t)>0\) for \(t\ge 0\) and let \(k(t)=(k_1(t),\ldots ,k_{m}(t))\) be the vector of reaction rate constants. Furthermore, we assume there is at most one reaction j such that \((r_j,p_j)=(r,p)\in \{1,\ldots ,n\}^2\) and that the reaction network is strongly connected. The latter means there is a directed reaction path from any species \(X_i\) to any other species \(X_{i'}\). (In reaction network terms it means the network is weakly reversible and deficiency zero.)

The mass action kinetics of this reaction system is given by the ODE system

$$\begin{aligned} \dot{x}_i&=-x_i\!\!\sum _{j=1:r_j=i}^{m}k_j(t) +\sum _{j=1:p_j=i}^{m}k_j(t) x_{r_j}, \qquad i=1,\ldots ,n. \end{aligned}$$

Define the \(n\times n\) matrix \(A(t)=(a_{ii'}(t))_{i,i'=1,\ldots ,n}\) by

$$\begin{aligned} \begin{aligned} a_{ii}(t)&=-\sum _{j=1:r_j=i}^{m}k_j(t), \\ a_{ii'}(t)&=k_j(t),\quad \text {if there is }j\in \{1,\ldots ,m\}\,\,\text {such that}\,\, (r_j,p_j)=(i',i), \\ a_{ii'}(t)&=0\quad \text {otherwise}. \end{aligned} \end{aligned}$$
(1)

Then the ODE system may be written as

$$\begin{aligned} \dot{x}= A(t) x. \end{aligned}$$
(2)

Note that the column sums of A(t) are zero, implying that \(\sum _{i=1}^n x_i\) is conserved.
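The construction of A(t) from a reaction list, and the resulting conservation of total mass, can be sketched as follows for a small invented network with constant rates:

```python
import numpy as np

# A hypothetical 3-species network with reactions X_{r_j} -> X_{p_j}
# (0-based indices here) and constant rate constants k_j.
n = 3
reactions = [(0, 1, 2.0), (1, 0, 1.0), (1, 2, 0.5), (2, 0, 1.5)]

A = np.zeros((n, n))
for rj, pj, kj in reactions:
    A[pj, rj] += kj   # a_{i i'} = k_j when (r_j, p_j) = (i', i)
    A[rj, rj] -= kj   # a_{ii} = -(sum of rate constants out of X_i)

# Column sums of A vanish, so sum_i x_i is conserved along x' = A x.
assert np.allclose(A.sum(axis=0), 0.0)

x = np.array([1.0, 0.0, 0.0])
dt = 1e-3
for _ in range(5000):          # forward-Euler integration up to t = 5
    x = x + dt * (A @ x)
assert np.isclose(x.sum(), 1.0)
```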

Lemma 2

Assume that k(t), \(t\ge 0\), converges exponentially fast towards \(k=(k_1,\ldots ,k_{m})\in \mathbb {R}^{m}_{>0}\) as \(t\rightarrow \infty \), that is, there exist \(\gamma _1>0\) and \(K_1>0\) such that

$$\begin{aligned} \,||\, k(t)-k\,||\,\le K_1 e^{-\gamma _1 t}\quad \text {for}\quad t\ge 0. \end{aligned}$$

Let A(t) be the matrix defined in Eq. 1, and let A be the matrix obtained by inserting k for k(t) in A(t), that is, \(A=\lim _{t\rightarrow \infty } A(t)\).

Then the solution of the ODE system \(\dot{x}=A(t)x\) starting at \(x(0)\in \mathbb {R}^n_{\ge 0}\) converges exponentially fast towards the equilibrium \(a\in \mathbb {R}^n_{>0}\) of the ODE system \(\dot{x}=Ax\) with the same initial condition x(0), that is, there exist \(\gamma >0\) and \(K>0\) such that

$$\begin{aligned} \,||\, x(t)-a\,||\,\le K e^{-\gamma t}\quad \text {for}\quad t\ge 0. \end{aligned}$$

Proof

We will first rephrase the ODE system such that standard theory is applicable. Let the rank of A be \(n-r\), and let W be as defined in Lemma 1, that is, an \(r\times n\) matrix comprising r linearly independent left kernel vectors of A so that \(WA=0\). Since the rank of A is \(n-r\), the rows of W form a basis for the left kernel of A. As in Lemma 1, suppose further that W is in row reduced form, that is,

$$\begin{aligned} W=\begin{pmatrix} W'&I_r\end{pmatrix}. \end{aligned}$$

Then

$$\begin{aligned} \dot{x}=A x \end{aligned}$$
(3)

is a linear dynamical system with r conservation laws (one for each row of W). Let \(Wx(0)=T\in \mathbb {R}^r\) be the vector of conserved amounts. Let \(\hat{x}=(x_1,\ldots ,x_{n-r})\) and \(\tilde{x}=(x_{n-r+1},\ldots ,x_{n})\). We will consider the (equivalent) dynamical system in which r variables are eliminated, expressed through the conservation laws

$$\begin{aligned} T=Wx=\begin{pmatrix} W'&I_r \end{pmatrix}x, \quad \text {or} \quad \tilde{x}=T-W' \hat{x}. \end{aligned}$$

As in Lemma 1, let A be given as

$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12} \\ A_{21} &{} A_{22} \end{pmatrix}, \end{aligned}$$

where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix. This yields

$$\begin{aligned} \begin{aligned} \dot{\hat{x}}&=\begin{pmatrix} A_{11}&A_{12} \end{pmatrix}\begin{pmatrix} \hat{x} \\ \tilde{x} \end{pmatrix} =\begin{pmatrix} A_{11}&A_{12} \end{pmatrix}\begin{pmatrix} \hat{x} \\ T-W' \hat{x} \end{pmatrix} \\&= (A_{11}-A_{12}W')\hat{x} +A_{12}T \\&= C\hat{x}+DT, \end{aligned} \end{aligned}$$
(4)

with \(C=A_{11}-A_{12}W'\) and \(D=A_{12}\). We call this the reduced ODE system. Note that this reduced system has only \(n-r\) variables and that the conservation laws are built directly into it. This implies that the differential equation changes if T is changed. The purpose of this construction is that we can work only with the non-zero eigenvalues of A.

As we also have \(WA(t)=0\) for all \(t\ge 0\), the ODE \(\dot{x}=A(t)x\) can also be similarly reduced to

$$\begin{aligned} \dot{\hat{x}}=C(t)\hat{x}+D(t)T, \end{aligned}$$
(5)

with \(C(t)=A_{11}(t)-A_{12}(t)W'\) and \(D(t)=A_{12}(t)\), where analogous to \(A_{11}\), we define \(A_{11}(t)\) to be the top-left \((n-r)\times (n-r)\) sub-matrix of A(t) and analogous to \(A_{12}\), we define \(A_{12}(t)\) to be the top-right \( (n-r)\times r\) sub-matrix of A(t).

Now if a is the equilibrium of the ODE system \(\dot{x}=Ax\) starting at x(0), then \(\hat{a}=(a_1,\ldots ,a_{n-r})\) is an equilibrium of the reduced ODE system \(\dot{\hat{x}}= C\hat{x}+DT\) starting at \(\hat{x}(0)=(x_1(0),\ldots ,x_{n-r}(0))\). If we can prove that the solution of the reduced ODE \(\dot{\hat{x}}=C(t)\hat{x}+D(t)T\) starting at \(\hat{x}(0)\) converges exponentially fast towards \(\hat{a}\), then because of the conservation laws \(\tilde{x}=T-W' \hat{x}\), the solution of \(\dot{x} = A(t)x\) starting at x(0) also converges exponentially fast towards a. So henceforth, we will work only with the reduced ODE systems. For notational convenience, we will drop the hats from \(\hat{x}\) and \(\hat{a}\) and simply refer to them as x and a respectively.

By subtracting and adding terms to the reduced ODE system (in Eq. 5), we have

$$\begin{aligned} \dot{x}&=C(t)x +D(t)T \\&= C x +DT+(C(t)-C)x + (D(t)-D)T. \end{aligned}$$

As a is an equilibrium of the ODE system \(\dot{x}=C x+DT\), we have \(C a+DT=0\).

Define \(y=x-a\). Then

$$\begin{aligned} \dot{y}&= C x +DT+(C(t)-C)x + (D(t)-D)T \\&=C y +C a +DT+(C(t)-C)x + (D(t)-D)T \\&=C y +(C(t)-C)x(t) + (D(t)-D)T \\&= C y +E(t) \end{aligned}$$

where it is used that \(Ca+DT=0\), and \(E(t)=(C(t)-C)x(t) + (D(t)-D)T\).

The solution to the above ODE system is known to be

$$\begin{aligned} y(t)= e^{C t}y(0)+\int _0^t e^{C(t-s)}E(s)\,ds. \end{aligned}$$
(6)

We have, using (6) and the triangle inequality,

$$\begin{aligned} \,||\,y(t)\,||\,\le \,||\,e^{C t}y(0)\,||\,+\int _0^t \,||\,e^{C(t-s)}\,||\,\,||\,E(s)\,||\, ds. \end{aligned}$$

Now A as defined (see Eq. 1 with k inserted for k(t)) is a Laplacian matrix of a strongly connected graph, so all eigenvalues of A are either zero or have negative real part. Using \(C=A_{11}-A_{12}W'\) and Lemma 1, it follows that all eigenvalues of C have negative real part. Hence it follows that

$$\begin{aligned} \,||\,e^{C t}\,||\,\le K_2 e^{-\gamma _2 t}, \end{aligned}$$

where \(0<\gamma _2<-\mathfrak {R}(\lambda _1)\) and \(K_2>0\). Here \(\lambda _1\) is the eigenvalue of C with the largest real part.

The matrices C(t) and D(t) are linear in k(t). And as k(t) converges exponentially fast towards k, it follows that the matrices C(t) and D(t) converge exponentially fast towards C and D respectively. Hence it follows that

$$\begin{aligned} \,||\,E(t)\,||\,&= \,||\,(C(t)-C)x(t) + (D(t)-D)T\,||\, \\&\le \,||\,(C(t)-C)\,||\,\,||\,x(t)\,||\,+ \,||\,(D(t)-D)\,||\,\,||\,T\,||\, \\&\le K_3 e^{-\gamma _3 t}+ K_4 e^{-\gamma _4 t}\\&\le K_5 e^{-\gamma _5 t} \end{aligned}$$

where

  • \(\,||\,(C(t)-C)\,||\,\,||\,x(t)\,||\,\le K_3 e^{-\gamma _3 t}\) for some \(K_3,\gamma _3>0\) as C(t) converges exponentially fast towards C and x(t) is bounded (as in the original ODE \(\sum _{i=1}^n x_i\) is conserved), and

  • \(\,||\,(D(t)-D)\,||\,\,||\,T\,||\,\le K_4 e^{-\gamma _4 t}\) for some \(K_4,\gamma _4>0\) as D(t) converges exponentially fast towards D, and

  • \(K_5=K_3+K_4\) and \(\gamma _5=\min (\gamma _3,\gamma _4)\).

Collecting all terms we have for all \(t\ge 0\),

$$\begin{aligned} \,||\,y(t)\,||\,&\le \,||\,y(0)\,||\,K_2e^{-\gamma _2 t}+\int _0^t K_2 e^{-\gamma _2 (t-s)}\times K_5 e^{-\gamma _5 s}ds \\&\le K_0e^{-\gamma _0 t}+K_0\int _0^t e^{-\gamma _0 t}ds\\&= K_0e^{-\gamma _0 t}(1+t)\\&\le K e^{-\gamma t} \end{aligned}$$

by choosing \(K_0=\max \left( K_2K_5, \,||\,y(0)\,||\,K_2\right) \) and \(\gamma _0=\min (\gamma _2,\gamma _5)\). In the last line \(\gamma \) is chosen such that \(0<\gamma <\gamma _0\) and K is sufficiently large; this is possible because \((1+t)e^{-\gamma _0 t}\le K' e^{-\gamma t}\) for some \(K'>0\) whenever \(\gamma <\gamma _0\). Since \(y(t)=x(t)-a\) we have,

$$\begin{aligned} \,||\,x(t)-a\,||\,\le K e^{-\gamma t}, \end{aligned}$$

as required.   \(\square \)

We will now prove Theorem 3. For the sake of convenience, we first recall the statement.

Theorem 3. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\theta \) and \(\psi \) species are held fixed at a positive point then the Forward, Backward and Expectation step reaction systems on \(\alpha ,\beta ,\gamma \) and \(\xi \) species converge to equilibrium exponentially fast.

Proof

The theorem follows by repeated use of Lemma 2. For \(l=1\) the forward reaction network can be interpreted as the following molecular reactions:

for all \(h\in H\setminus \{h^*\}\) and \(w\in V\), as they are dynamically equivalent. Here the effective rate constants (\(\pi _{h^*}\psi _{h^*w}E_{1w}\) or \(\pi _{h}\psi _{hw}E_{1w}\)) are independent of time and so the conditions of Lemma 2 are fulfilled. Thus this portion of the reaction network converges exponentially fast.

The rest of the forward reaction network can similarly be interpreted as the following molecular reactions:

for all \(g\in H\), \(h\in H\setminus \{h^*\}\), \(l=2,\ldots , L\) and \(w\in V\). For layers \(l=2,\ldots ,L\) of the forward reaction network, we observe that the effective rate constants (\(\alpha _{l-1,g}\theta _{gh^*}\psi _{h^*w}E_{lw}\) or \(\alpha _{l-1,g}\theta _{gh}\psi _{hw}E_{lw}\)) for layer l depend on time only through \(\alpha _{l-1,g}\). If the concentration of \(\alpha _{l-1,g}\) converges exponentially fast, then Lemma 2 shows that the concentration of \(\alpha _{lh}\) also converges exponentially fast. Applying Lemma 2 inductively layer by layer, we conclude that the forward reaction network converges exponentially fast. The backward reaction network converges exponentially fast by a similar argument.
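The layer-by-layer argument rests on the exponential convergence of each unimolecular interconversion network. A minimal numeric sketch (with made-up rates, not the paper's HMM instance) of one such network \(X_1\rightleftharpoons X_2\rightleftharpoons X_3\), showing the distance to equilibrium shrinking exponentially:

```python
import numpy as np

# Hypothetical unimolecular interconversion network X1 <-> X2 <-> X3 of
# the kind each forward layer reduces to; rates are made up. The ODE is
# x' = A x with A a Laplacian-type rate matrix (columns sum to zero).
A = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -1.5,  2.0],
              [ 0.0,  1.0, -2.0]])

# Equilibrium within the conservation class sum(x) = 1, found by hand.
a = np.array([0.25, 0.5, 0.25])
assert np.allclose(A @ a, 0.0)

# Forward-Euler integration; record distance to equilibrium at t = 0, 2,
# 4, 6, 8 (every 2000 steps with dt = 1e-3).
x, dt = np.array([1.0, 0.0, 0.0]), 1e-3
dist = []
for step in range(8001):
    if step % 2000 == 0:
        dist.append(np.linalg.norm(x - a))
    x = x + dt * (A @ x)

# Distance should shrink roughly geometrically: exponential convergence.
assert dist[4] < dist[2] < dist[0]
assert dist[4] < 1e-3
```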

For the expectation reaction network, the result likewise follows by induction. Note, however, that if we interpret the expectation network as molecular reactions in the same way, the effective rate constants depend on time through products such as \(\alpha _{lh}\beta _{lh}\) or \(\alpha _{lh}\beta _{l+1,h}\). So to apply Lemma 2 we need the following fact: if \(\alpha _l(t)\) and \(\beta _l(t)\) converge exponentially fast towards \(a_l\) and \(b_l\), then the product \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\), as

$$\begin{aligned} \,||\,\alpha _l(t)\beta _l(t)-a_l b_l\,||\,&=\,||\,(\alpha _l(t)-a_l)(\beta _l(t)-b_l)+a_l( \beta _l(t)-b_l)+b_l( \alpha _l(t)-a_l)\,||\, \\&\le \,||\,\alpha _l(t)-a_l\,||\,\,||\,\beta _l(t)-b_l\,||\,+K \,||\,\beta _l(t)-b_l\,||\,+K \,||\,\alpha _l(t)-a_l\,||\,, \end{aligned}$$

where K is some suitably large constant. In particular, \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\) at rate \(\gamma =\min (\gamma _a,\gamma _b)\), where \(\gamma _a\) and \(\gamma _b\) are the exponential convergence rates of \(\alpha _l(t)\) and \(\beta _l(t)\), respectively. A similar argument applies to products of the form \(\alpha _l(t)\beta _{l+1}(t)\). Thus the expectation reaction network also converges exponentially fast.   \(\square \)
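A quick numeric check of this product bound, with made-up scalar signals and rates (hypothetical values, for illustration only):

```python
import numpy as np

# Made-up scalar signals converging at rates ga and gb to limits a and b;
# the product should converge to a*b at rate min(ga, gb).
a, b, ga, gb = 2.0, 3.0, 1.0, 0.4
t = np.linspace(0.0, 20.0, 201)
alpha = a + 0.5 * np.exp(-ga * t)
beta = b - 0.7 * np.exp(-gb * t)

err = np.abs(alpha * beta - a * b)
g = min(ga, gb)
K = 5.0  # any sufficiently large constant works
assert np.all(err <= K * np.exp(-g * t))
assert err[-1] < 1e-3
```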

We will now prove Theorem 4. For the sake of convenience, we first recall the statement.

Theorem 4. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\alpha ,\beta ,\gamma \) and \(\xi \) species are held fixed at a positive point then the Maximization step reaction system on \(\theta \) and \(\psi \) converges to equilibrium exponentially fast.

Proof

Exponential convergence of the Maximization step network follows by a similar layer-by-layer inductive use of Lemma 2.   \(\square \)


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this paper

Singh, A., Wiuf, C., Behera, A., Gopalkrishnan, M. (2019). A Reaction Network Scheme Which Implements Inference and Learning for Hidden Markov Models. In: Thachuk, C., Liu, Y. (eds) DNA Computing and Molecular Programming. DNA 2019. Lecture Notes in Computer Science(), vol 11648. Springer, Cham. https://doi.org/10.1007/978-3-030-26807-7_4


  • Print ISBN: 978-3-030-26806-0

  • Online ISBN: 978-3-030-26807-7
