Abstract
With a view towards molecular communication systems and molecular multi-agent systems, we propose the Chemical Baum-Welch Algorithm, a novel reaction network scheme that learns parameters for Hidden Markov Models (HMMs). Each reaction in our scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every fixed point of the Baum-Welch algorithm for HMMs is a fixed point of our reaction network scheme, and every positive fixed point of our scheme is a fixed point of the Baum-Welch algorithm. We prove that the “Expectation” step and the “Maximization” step of our reaction network separately converge exponentially fast. We simulate mass-action kinetics for our network on an example sequence, and show that it learns the same parameters for the HMM as the Baum-Welch algorithm.
A Appendix
A.1 Comparing Points of Equilibria
We will now prove Theorem 1. For the sake of convenience, we first recall the statement.
Theorem 1. Every fixed point of the Baum-Welch algorithm for an HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\) is a fixed point for the corresponding Chemical Baum-Welch Algorithm with permissible rates k.
Proof
Consider a point \(\varPhi = (\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{\ge 0}^{L\times H}\), \(\xi '\in \mathbb {R}_{\ge 0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{\ge 0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{\ge 0}^{H\times V}\). If \(\varPhi \) is a fixed point of the Baum-Welch algorithm then it must satisfy:
-
\(\alpha '_{1h} = \pi _h\psi '_{h v_1}\) for all \(h\in H\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\alpha }_{1h}\big |_\varPhi =k^\alpha _{1h}\left( {\alpha '_{1h^*} }{ \pi _h\psi '_{h v_1}}-{\alpha '_{1h} }{ \pi _{h^*}\psi '_{h^* v_1}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(\dot{\alpha }_{1h^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{1h}\big |_\varPhi =0\).
-
\(\alpha '_{l h}=\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}\) for all \(h\in H\) and \(l=2,\ldots ,L\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\alpha }_{lh}\big |_\varPhi =k^\alpha _{lh}\left( {\alpha '_{l h^*}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}-{\alpha '_{l h}}{\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=2,\ldots ,L\) and \(\dot{\alpha }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\alpha }_{lh}\big |_\varPhi =0\).
-
\(\beta '_{l h}=\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\beta }_{lh}\big |_\varPhi =k^\beta _{lh}\left( {\beta '_{l h^*}}{\sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}}-{\beta '_{l h}}{\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=1,2,\dots ,L-1\) and \(\dot{\beta }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\beta }_{lh}\big |_\varPhi =0\).
-
\(\gamma '_l(h)=\frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H} \alpha '_{l g}\beta '_{l g}}\) for all \(h\in H\) and \(l=1,\ldots ,L\). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\gamma }_{lh}\big |_\varPhi =k^\gamma _{lh}\left( {\gamma '_{lh^*}}{\alpha '_{l h}\beta '_{l h}}-{\gamma '_{lh}}{\alpha '_{l h^*}\beta '_{l h^*}}\right) =0 \end{aligned}$$for all \(h\in H\setminus \{h^*\}\) and \(l=1,\ldots ,L\) and \(\dot{\gamma }_{lh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\gamma }_{lh}\big |_\varPhi =0\).
-
\(\xi '_l(g,h)=\frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{f\in H} \alpha '_{l f}\beta '_{l f}}\text { for all } g,h\in H\text { and }l=1,\ldots ,L-1 \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\xi }_{lgh}\big |_\varPhi =k^\xi _{lgh}\left( {\xi '_{lh^*h^*}}{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}-{\xi '_{lgh}}{\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}\right) =0 \end{aligned}$$\(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\) and \(\dot{\xi }_{lh^*h^*}\big |_\varPhi =-\sum _{(g,h)\not =(h^*,h^*)}\dot{\xi }_{lgh}\big |_\varPhi =0\).
-
\(\theta '_{gh}=\frac{\sum _{l=1}^{L-1}\xi '_l(g,h)}{\sum _{l=1}^{L-1}\sum _{f\in H}\xi '_l(g,f)}\text { for all }g,h\in H \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\theta }_{gh}\big |_\varPhi =k^\theta _{gh}\left( {\theta '_{gh^*}}{\sum _{l=1}^{L-1}\xi '_{lgh}}-{\theta '_{gh}}{\sum _{l=1}^{L-1}\xi '_l(g,h^*)}\right) =0 \end{aligned}$$\(\text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\) and \(\dot{\theta }_{gh^*}\big |_\varPhi =-\sum _{h\not =h^*}\dot{\theta }_{gh}\big |_\varPhi =0 \).
-
\(\psi '_{hw}=\frac{\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_l(h)}\text { for all }h\in H\text { and }w\in V \). Then for the chemical Baum-Welch Algorithm we have
$$\begin{aligned} \dot{\psi }_{hw}\big |_\varPhi =k^\psi _{hw}\left( {\psi '_{hv^*}}{\sum _{l=1}^{L}\gamma '_l(h)\delta _{w,v_l}}-{\psi '_{hw}}{\sum _{l=1}^{L}\gamma '_l(h)\delta _{v^*,v_l}}\right) =0 \end{aligned}$$\(\text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\) and \(\dot{\psi }_{hv^*}\big |_\varPhi =-\sum _{w\not =v^*}\dot{\psi }_{hw}\big |_\varPhi =0\).
So \(\varPhi \) is a fixed point of the chemical Baum-Welch Algorithm. \(\square \)
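To illustrate Theorem 1 concretely, the following sketch (a hypothetical two-state, two-symbol HMM with a random length-6 observation sequence; NumPy assumed; all names are illustrative, not from the paper) iterates the classical Baum-Welch updates and then checks that the resulting point annihilates the chemical flux terms displayed above, e.g. \(\theta '_{gh^*}\sum _l\xi '_{lgh}-\theta '_{gh}\sum _l\xi '_l(g,h^*)\):

```python
import numpy as np

rng = np.random.default_rng(0)
H, V, L = 2, 2, 6                          # hidden states, symbols, sequence length
v = rng.integers(0, V, size=L)             # a hypothetical observed sequence
pi = np.array([0.6, 0.4])                  # fixed initial distribution
theta = rng.dirichlet(np.ones(H), size=H)  # transition probabilities theta_{gh}
psi = rng.dirichlet(np.ones(V), size=H)    # emission probabilities psi_{hw}

def baum_welch_step(theta, psi):
    # Forward/backward recursions (unnormalised; fine for short L)
    alpha = np.zeros((L, H)); beta = np.zeros((L, H))
    alpha[0] = pi * psi[:, v[0]]
    for l in range(1, L):
        alpha[l] = (alpha[l - 1] @ theta) * psi[:, v[l]]
    beta[L - 1] = 1.0
    for l in range(L - 2, -1, -1):
        beta[l] = theta @ (psi[:, v[l + 1]] * beta[l + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.array([alpha[l][:, None] * theta * (psi[:, v[l + 1]] * beta[l + 1])[None, :]
                   for l in range(L - 1)])
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    theta_new = xi.sum(axis=0)
    theta_new /= theta_new.sum(axis=1, keepdims=True)
    psi_new = np.array([[gamma[v == w, h].sum() for w in range(V)] for h in range(H)])
    psi_new /= psi_new.sum(axis=1, keepdims=True)
    return theta_new, psi_new, gamma, xi

for _ in range(200):                       # iterate towards a Baum-Welch fixed point
    theta, psi, gamma, xi = baum_welch_step(theta, psi)

h_star, v_star = 0, 0
Xi = xi.sum(axis=0)
G = np.array([[gamma[v == w, h].sum() for w in range(V)] for h in range(H)])
# theta-species flux: theta_{g h*} sum_l xi_{lgh} - theta_{gh} sum_l xi_{l g h*}
flux_theta = theta[:, [h_star]] * Xi - theta * Xi[:, [h_star]]
# psi-species flux: psi_{h v*} sum_l gamma_{lh} d_{w,v_l} - psi_{hw} sum_l gamma_{lh} d_{v*,v_l}
flux_psi = psi[:, [v_star]] * G - psi * G[:, [v_star]]
assert np.allclose(flux_theta, 0, atol=1e-10)
assert np.allclose(flux_psi, 0, atol=1e-10)
```

The fluxes vanish to machine precision because the M-step outputs are exactly the normalized expected counts, which is precisely the detailed-balance-like condition the reaction scheme encodes.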
We will now prove Theorem 2. For the sake of convenience, we first recall the statement.
Theorem 2. Every positive fixed point of the Chemical Baum-Welch Algorithm on a Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k is a fixed point of the Baum-Welch algorithm for the HMM \(\mathcal {M}=(H,V,\theta ,\psi ,\pi )\).
Proof
Consider a positive point \(\varPhi =(\alpha ',\beta ',\gamma ',\xi ',\theta ',\psi ')\) with \(\alpha ',\beta ',\gamma '\in \mathbb {R}_{>0}^{L\times H}\), \(\xi '\in \mathbb {R}_{>0}^{(L-1)\times H\times H}\), \(\theta '\in \mathbb {R}_{>0}^{H\times H}\) and \(\psi '\in \mathbb {R}_{>0}^{H\times V}\). If \(\varPhi \) is a fixed point for the Chemical Baum-Welch Algorithm then we must have:
-
\(\dot{\alpha }_{1h}\big |_\varPhi =0\) for all \(h\in H\). This implies \({\alpha '_{1h} }\times { \pi _{h^*}\psi '_{h^* v_1}}={\alpha '_{1h^*} }\times { \pi _h\psi '_{h v_1}}\quad \text { for all }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\alpha '_{1h} }=\left( { \pi _h\psi '_{h v_1}}\right) \frac{\sum _{f\in H}\alpha '_{1f} }{\sum _{f\in H} \pi _{f}\psi '_{f v_1}}\quad \text { for all }h\in H \end{aligned}$$
-
\(\dot{\alpha }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=2,\ldots ,L\). This implies \({\alpha '_{l h}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh^*}\psi '_{h^* v_l}}={\alpha '_{l h^*}}\times {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=2,\ldots ,L\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\alpha '_{l h}}=\left( {\sum _{g\in H} \alpha '_{l-1,g}\theta '_{gh}\psi '_{h v_l}}\right) \frac{\sum _{f\in H}\alpha '_{l f}}{\sum _{f,g\in H} \alpha '_{l-1,g}\theta '_{gf}\psi '_{f v_l}}\quad \text { for all }h\in H\text { and }l=2,\ldots ,L \end{aligned}$$
-
\(\dot{\beta }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\beta '_{l h}}\times \sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}={\beta '_{l h^*}}\times \sum _{g\in H} \theta '_{h^*g}\psi '_{g v_{l+1}}\beta '_{l+1,g}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\beta '_{l h}}=\left( {\sum _{g\in H} \theta '_{hg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\right) \frac{\sum _{f\in H}\beta '_{l f}}{\sum _{f,g\in H} \theta '_{fg}\psi '_{g v_{l+1}}\beta '_{l+1,g}}\quad \text { for all }h\in H\text { and }l=1,\ldots ,L-1 \end{aligned}$$
-
\(\dot{\gamma }_{lh}\big |_\varPhi =0\) for all \(h\in H\) and \(l=1,\ldots ,L\). This implies \({\gamma '_l(h)}\times {\alpha '_{l h^*}\beta '_{l h^*}}={\gamma '_l(h^*)}\times {\alpha '_{l h}\beta '_{l h}}\quad \text { for all }h\in H\setminus \{h^*\}\text { and }l=1,\ldots ,L\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\gamma '_{lh}}=\left( \frac{\alpha '_{l h}\beta '_{l h}}{\sum _{g\in H}\alpha '_{l g}\beta '_{l g}}\right) {\sum _{g\in H}\gamma '_{lg}}\quad \text { for all }h\in H\text { and }l=1,\ldots ,L \end{aligned}$$
-
\(\dot{\xi }_{lgh}\big |_\varPhi =0\) for all \(g,h\in H\) and \(l=1,\ldots ,L-1\). This implies \({\xi '_l(g,h)}\times {\alpha '_{l h^*}\theta '_{h^*h^*}\psi '_{ h^*v_{l+1}}\beta '_{l+1,h^*}}={\xi '_l(h^*,h^*)}\times {\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}\) \(\text { for all } (g,h)\in H\times H\setminus \{(h^*,h^*)\}\text { and } l=1,\ldots ,L-1\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\xi '_{lgh}}=\left( \frac{\alpha '_{l g}\theta '_{gh}\psi '_{ hv_{l+1}}\beta '_{l+1,h}}{\sum _{e,f\in H}\alpha '_{l e}\theta '_{ef}\psi '_{ fv_{l+1}}\beta '_{l+1,f}}\right) {\sum _{e,f\in H}\xi '_{lef}} \quad \text { for all } (g,h)\in H\times H\text { and } l=1,\ldots ,L-1 \end{aligned}$$
-
\(\dot{\theta }_{gh}\big |_\varPhi =0\) for all \(g,h\in H\). This implies \({\theta '_{gh}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h^*)}= {\theta '_{gh^*}}\times {\sum _{l=1}^{L-1}\xi '_l(g,h)}\quad \text { for all }g\in H\text { and }h\in H\setminus \{h^*\}\). Since \(\varPhi \) is positive, this implies
$$\begin{aligned} {\theta '_{gh}}=\left( \frac{\sum _{l=1}^{L-1}\xi '_{lgh}}{\sum _{f\in H}\sum _{l=1}^{L-1}\xi '_{lgf}}\right) {\sum _{f\in H}\theta '_{gf}}\quad \text { for all }g\in H\text { and }h\in H \end{aligned}$$
-
\(\dot{\psi }_{hw}\big |_\varPhi =0\) for all \(h\in H\) and \(w\in V\). This implies \({\psi '_{hw}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{v^*,v_l}} = {\psi '_{hv^*}}\times {\sum _{l=1}^L\gamma '_l(h)\delta _{w,v_l}}\quad \text { for all }h\in H\text { and }w\in V\setminus \{v^*\}\). Since \(\varPhi \) is positive and \(\sum _{v\in V}\delta _{v,v_l}=1\) (recall \(E_{lv}=\delta _{v,v_l}\)), this implies
$$\begin{aligned} {\psi '_{hw}}=\left( \frac{\sum _{l=1}^L\gamma '_{lh}\delta _{w,v_l}}{\sum _{l=1}^L\gamma '_{lh}}\right) {\sum _{v\in V}\psi '_{hv}}\quad \text { for all }h\in H\text { and }w\in V \end{aligned}$$
By the relaxation afforded by Remark 1, the point \(\varPhi \) qualifies as a fixed point of the Baum-Welch algorithm. \(\square \)
A.2 Rate of Convergence Analysis
In this section we will prove Theorems 3 and 4, but first we will state and prove two useful lemmas.
Lemma 1
Let A be an arbitrary \(n\times n\) matrix. Let W be an \(r\times n\) matrix consisting of r linearly independent left kernel vectors of A, so that \(WA=0_{r,n}\), where \(0_{i,j}\) denotes an \(i\times j\) matrix with all entries zero. Further suppose W is in row-reduced form, that is,
$$\begin{aligned} W=\begin{pmatrix} W'&I_r \end{pmatrix}, \end{aligned}$$
where \(I_j\) denotes the \(j\times j\) identity matrix and \(W'\) is an \( r\times (n-r)\) matrix. Let A be given in block form as
$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12}\\ A_{21} &{} A_{22}\end{pmatrix}, \end{aligned}$$
where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix.
Then the \(n-r\) eigenvalues (with multiplicity) of the matrix \(A_{11} -A_{12}W'\) are the same as the eigenvalues of A except for r zero eigenvalues.
Proof
Consider the \(n\times n\) invertible matrix P given by
$$\begin{aligned} P=\begin{pmatrix} I_{n-r} &{} 0_{n-r,r}\\ W' &{} I_r \end{pmatrix},\qquad P^{-1}=\begin{pmatrix} I_{n-r} &{} 0_{n-r,r}\\ -W' &{} I_r \end{pmatrix}, \end{aligned}$$
with determinant \(\det (P)=\det (P^{-1})=1\). Since \(WA=0\) gives \(A_{21}=-W'A_{11}\) and \(A_{22}=-W'A_{12}\), we have
$$\begin{aligned} PAP^{-1}=\begin{pmatrix} A_{11}-A_{12}W' &{} A_{12}\\ 0_{r,n-r} &{} 0_{r,r} \end{pmatrix}. \end{aligned}$$
This implies that the characteristic polynomial of A fulfils
$$\begin{aligned} \det (\lambda I_n-A)=\det (\lambda I_n-PAP^{-1})=\lambda ^{r}\det \left( \lambda I_{n-r}-(A_{11}-A_{12}W')\right) , \end{aligned}$$
and the statement follows. \(\square \)
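Lemma 1 is easy to test numerically. The sketch below (NumPy assumed; random matrices stand in for an actual rate matrix, so all values are illustrative) constructs A so that \(W=(W'\ I_r)\) lies in its left kernel and compares characteristic polynomials:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5, 2
A11 = rng.standard_normal((n - r, n - r))
A12 = rng.standard_normal((n - r, r))
Wp = rng.standard_normal((r, n - r))       # the matrix W' of the lemma
# Force W = [W'  I_r] into the left kernel: WA = 0 fixes A21 and A22.
A21 = -Wp @ A11
A22 = -Wp @ A12
A = np.block([[A11, A12], [A21, A22]])
W = np.hstack([Wp, np.eye(r)])
assert np.allclose(W @ A, 0)

# char(A) should equal lambda^r * char(A11 - A12 W')
char_A = np.poly(A)
char_C = np.convolve(np.poly(A11 - A12 @ Wp), np.r_[1.0, np.zeros(r)])
assert np.allclose(char_A, char_C, atol=1e-8)
```

Comparing characteristic polynomials rather than sorted eigenvalue lists avoids any ambiguity in ordering complex conjugate pairs.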
Now we revisit the observation that every reaction in the Baum-Welch reaction network is a monomolecular transformation catalyzed by a set of species. For the purposes of our analysis, each reaction can be abstracted as a monomolecular reaction with time-varying rate constants. This prompts us to consider the following monomolecular reaction system with n species \(X_1,\ldots ,X_n\) and m reactions
$$\begin{aligned} X_{r_j}\xrightarrow {k_j(t)} X_{p_j},\qquad j=1,\ldots ,m, \end{aligned}$$
where \(r_j\not =p_j\), and \(r_j,p_j\in \{1,\ldots ,n\}\), and \(k_j(t)\), \(j=1,\ldots ,m\), are mass-action reaction rate constants, possibly depending on time. We assume \(k_j(t)>0\) for \(t\ge 0\) and let \(k(t)=(k_1(t),\ldots ,k_{m}(t))\) be the vector of reaction rate constants. Furthermore, we assume there is at most one reaction j such that \((r_j,p_j)=(r,p)\in \{1,\ldots ,n\}^2\) and that the reaction network is strongly connected. The latter means there is a reaction path from any species \(X_i\) to any other species \(X_{i'}\). (In reaction network terms, the network is weakly reversible and has deficiency zero.)
The mass-action kinetics of this reaction system is given by the ODE system
$$\begin{aligned} \dot{x}_i=\sum _{j:\, p_j=i} k_j(t)x_{r_j}-\sum _{j:\, r_j=i} k_j(t)x_i,\qquad i=1,\ldots ,n. \end{aligned}$$
Define the \(n\times n\) matrix \(A(t)=(a_{ii'}(t))_{i,i'=1,\ldots ,n}\) by
$$\begin{aligned} a_{ii'}(t)={\left\{ \begin{array}{ll} k_j(t) &{} \text {if } i\not =i' \text { and } (r_j,p_j)=(i',i) \text { for some reaction } j,\\ -\sum _{j:\, r_j=i} k_j(t) &{} \text {if } i=i',\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(1)
Then the ODE system may be written as
$$\begin{aligned} \dot{x}=A(t)x. \end{aligned}$$
Note that the column sums of A(t) are zero, implying that \(\sum _{i=1}^n x_i\) is conserved.
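As a concrete instance, the sketch below (a hypothetical three-species cycle \(X_1\rightarrow X_2\rightarrow X_3\rightarrow X_1\) with constant rates chosen for illustration; NumPy assumed) assembles A as in Eq. 1, verifies the zero column sums, and integrates \(\dot{x}=Ax\) by forward Euler:

```python
import numpy as np

# Hypothetical 3-species cycle X1 -> X2 -> X3 -> X1 with constant rates
n = 3
reactions = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 0.5)]   # (r_j, p_j, k_j)

# Assemble A as in Eq. 1: inflow k_j at (p_j, r_j), outflow -k_j at (r_j, r_j)
A = np.zeros((n, n))
for r_j, p_j, k in reactions:
    A[p_j, r_j] += k
    A[r_j, r_j] -= k
assert np.allclose(A.sum(axis=0), 0)      # zero column sums: sum_i x_i conserved

# Forward-Euler integration of x' = A x
x = np.array([1.0, 0.0, 0.0])
dt = 1e-3
for _ in range(50_000):                   # integrate to t = 50
    x = x + dt * (A @ x)
assert np.isclose(x.sum(), 1.0)           # total concentration is conserved

# Equilibrium: equal flux around the cycle, k_j * a_{r_j} constant,
# so a is proportional to (1/2.0, 1/1.0, 1/0.5) = (0.5, 1, 2)
a = np.array([0.5, 1.0, 2.0]) / 3.5
assert np.allclose(x, a, atol=1e-8)
```

The strictly positive equilibrium and the conserved total are exactly the strongly connected, deficiency-zero behaviour invoked in the text.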
Lemma 2
Assume k(t), for \(t\ge 0\), converges exponentially fast towards \(k=(k_1,\ldots ,k_{m})\in \mathbb {R}^{m}_{>0}\) as \(t\rightarrow \infty \), that is, there exist \(\gamma _1>0\) and \(K_1>0\) such that
$$\begin{aligned} \,||\,k(t)-k\,||\,\le K_1e^{-\gamma _1 t}\quad \text {for all } t\ge 0. \end{aligned}$$
Let A(t) be the matrix defined in Eq. 1, and let A be the matrix obtained by inserting k for k(t) in A(t), that is, \(A=\lim _{t\rightarrow \infty } A(t)\).
Then the solution of the ODE system \(\dot{x}=A(t)x\) starting at \(x(0)\in \mathbb {R}^n_{\ge 0}\) converges exponentially fast towards the equilibrium \(a\in \mathbb {R}^n_{>0}\) of the ODE system \(\dot{x}=Ax\) starting at \(x(0)\in \mathbb {R}^n_{\ge 0}\), that is, there exist \(\gamma >0\) and \(K>0\) such that
$$\begin{aligned} \,||\,x(t)-a\,||\,\le Ke^{-\gamma t}\quad \text {for all } t\ge 0. \end{aligned}$$
Proof
We will first rephrase the ODE system so that standard theory is applicable. Let the rank of A be \(n-r\). Let W be as defined in Lemma 1, that is, an \(r\times n\) matrix consisting of r linearly independent left kernel vectors of A, so that \(WA=0\). Since the rank of A is \(n-r\), the rows of W form a basis for the left kernel of A. As in Lemma 1, further suppose W is in row-reduced form, that is,
$$\begin{aligned} W=\begin{pmatrix} W'&I_r \end{pmatrix}. \end{aligned}$$
Then
$$\begin{aligned} \dot{x}=A(t)x \end{aligned}$$
is a linear dynamical system with r conservation laws (one for each row of W). Let \(Wx(0)=T\in \mathbb {R}^r\) be the vector of conserved amounts. Let \(\hat{x}=(x_1,\ldots ,x_{n-r})\) and \(\tilde{x}=(x_{n-r+1},\ldots ,x_{n})\). We will consider the (equivalent) dynamical system in which r variables are eliminated, expressed through the conservation laws
$$\begin{aligned} \tilde{x}=T-W'\hat{x}. \end{aligned}$$
As in Lemma 1, let A be given in block form as
$$\begin{aligned} A=\begin{pmatrix} A_{11} &{} A_{12}\\ A_{21} &{} A_{22}\end{pmatrix}, \end{aligned}$$
where \(A_{11}\) is an \((n-r)\times (n-r)\) matrix, \(A_{12}\) is an \( (n-r)\times r\) matrix, \(A_{21}\) is an \(r\times (n-r)\) matrix, and \(A_{22}\) is an \(r\times r\) matrix. This yields
$$\begin{aligned} \dot{\hat{x}}=C\hat{x}+DT, \end{aligned}$$
with \(C=A_{11}-A_{12}W'\) and \(D=A_{12}\). We call this the reduced ODE system. Note that this reduced system has only \(n-r\) variables and that the conservation laws are built directly into it. This implies that the differential equation changes if T is changed. The role of this construction is that we can work only with the non-zero eigenvalues of A.
As we also have \(WA(t)=0\) for all \(t\ge 0\), the ODE \(\dot{x}=A(t)x\) can similarly be reduced to
$$\begin{aligned} \dot{\hat{x}}=C(t)\hat{x}+D(t)T \end{aligned}$$
(5)
with \(C(t)=A_{11}(t)-A_{12}(t)W'\) and \(D(t)=A_{12}(t)\), where, analogous to \(A_{11}\), we define \(A_{11}(t)\) to be the top-left \((n-r)\times (n-r)\) sub-matrix of A(t), and analogous to \(A_{12}\), we define \(A_{12}(t)\) to be the top-right \( (n-r)\times r\) sub-matrix of A(t).
Now if a is the equilibrium of the ODE system \(\dot{x}=Ax\) starting at x(0), then \(\hat{a}=(a_1,\ldots ,a_{n-r})\) is an equilibrium of the reduced ODE system \(\dot{\hat{x}}= C\hat{x}+DT\) starting at \(\hat{x}(0)=(x_1(0),\ldots ,x_{n-r}(0))\). Suppose we are able to prove that the solution of the reduced ODE \(\dot{\hat{x}}=C(t)\hat{x}+D(t)T\) starting at \(\hat{x}(0)\) converges exponentially fast towards \(\hat{a}\); then, because of the conservation laws \(\tilde{x}=T-W' \hat{x}\), the solution of \(\dot{x} = A(t)x\) starting at x(0) also converges exponentially fast towards a. So henceforth we will work only with the reduced ODE systems. For notational convenience, we will drop the hats from \(\hat{x}\) and \(\hat{a}\) and simply refer to them as x and a, respectively.
By subtracting and adding terms to the reduced ODE system (in Eq. 5), we have
$$\begin{aligned} \dot{x}=Cx+DT+(C(t)-C)x+(D(t)-D)T. \end{aligned}$$
As a is an equilibrium of the ODE system \(\dot{x}=C x+DT\), we have \(C a+DT=0\).
Define \(y=x-a\). Then
$$\begin{aligned} \dot{y}=Cy+E(t), \end{aligned}$$
where it is used that \(Ca+DT=0\), and \(E(t)=(C(t)-C)x(t) + (D(t)-D)T\).
The solution to the above ODE system is known to be
$$\begin{aligned} y(t)=e^{Ct}y(0)+\int _0^t e^{C(t-s)}E(s)\,ds. \end{aligned}$$
(6)
We have, using (6) and the triangle inequality,
$$\begin{aligned} \,||\,y(t)\,||\,\le \,||\,e^{Ct}\,||\,\,||\,y(0)\,||\,+\int _0^t \,||\,e^{C(t-s)}\,||\,\,||\,E(s)\,||\,\,ds. \end{aligned}$$
Now A as defined (see Eq. 1 with k inserted for k(t)) is a Laplacian matrix of a strongly connected graph, so all the eigenvalues of A are either zero or have negative real part. Using \(C=A_{11}-A_{12}W'\) and Lemma 1, it follows that all eigenvalues of C have negative real part. Hence it follows that
$$\begin{aligned} \,||\,e^{Ct}\,||\,\le K_2e^{-\gamma _2 t}\quad \text {for all } t\ge 0, \end{aligned}$$
where \(0<\gamma _2<-\mathfrak {R}(\lambda _1)\) and \(K_2>0\). Here \(\lambda _1\) is the eigenvalue of C with the largest real part.
The matrices C(t) and D(t) are linear in k(t). As k(t) converges exponentially fast towards k, the matrices C(t) and D(t) converge exponentially fast towards C and D, respectively. Hence it follows that
$$\begin{aligned} \,||\,E(t)\,||\,\le \,||\,C(t)-C\,||\,\,||\,x(t)\,||\,+\,||\,D(t)-D\,||\,\,||\,T\,||\,\le 4K_5e^{-\gamma _5 t}, \end{aligned}$$
where
-
\(\,||\,C(t)-C\,||\,\,||\,x(t)\,||\,\le K_3 e^{-\gamma _3 t}\) for some \(K_3,\gamma _3>0\), as C(t) converges exponentially fast towards C and x(t) is bounded (in the original ODE \(\sum _{i=1}^n x_i\) is conserved), and
-
\(\,||\,D(t)-D\,||\,\,||\,T\,||\,\le K_4 e^{-\gamma _4 t}\) for some \(K_4,\gamma _4>0\), as D(t) converges exponentially fast towards D, and
-
\(K_5=\frac{1}{2}\max (K_3, K_4)\) and \(\gamma _5=\min (\gamma _3,\gamma _4)\).
Collecting all terms, we have for all \(t\ge 0\),
$$\begin{aligned} \,||\,y(t)\,||\,\le K_2e^{-\gamma _2 t}\,||\,y(0)\,||\,+4K_2K_5\int _0^t e^{-\gamma _2(t-s)}e^{-\gamma _5 s}\,ds \le K_0(1+4t)e^{-\gamma _0 t}\le Ke^{-\gamma t}, \end{aligned}$$
by choosing \(K_0=\max \left( K_2K_5, \,||\,y(0)\,||\,K_2\right) \) and \(\gamma _0=\min (\gamma _1,\gamma _2)\); here \(\gamma _3\) and \(\gamma _4\), and hence \(\gamma _5\), may be taken equal to \(\gamma _1\). In the last line \(\gamma \) is chosen such that \(0<\gamma <\gamma _0\) and K is sufficiently large. Since \(y(t)=x(t)-a\) we have
$$\begin{aligned} \,||\,x(t)-a\,||\,\le Ke^{-\gamma t}\quad \text {for all } t\ge 0, \end{aligned}$$
as required. \(\square \)
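The conclusion of Lemma 2 can be illustrated numerically. In the sketch below (NumPy assumed; a hypothetical three-species cycle \(X_1\rightarrow X_2\rightarrow X_3\rightarrow X_1\) with illustrative rates), the system is driven with time-varying rates \(k_j(t)=k_j(1+e^{-t})\), which converge exponentially to \(k_j\), yet the solution still converges to the equilibrium of the limit system \(\dot{x}=Ax\):

```python
import numpy as np

n = 3
base = [(0, 1, 2.0), (1, 2, 1.0), (2, 0, 0.5)]    # (r_j, p_j, k_j): limit rates

def rate_matrix(t):
    # A(t) as in Eq. 1 with k_j(t) = k_j * (1 + e^{-t})  (a hypothetical choice)
    A = np.zeros((n, n))
    for r_j, p_j, k in base:
        kt = k * (1.0 + np.exp(-t))
        A[p_j, r_j] += kt
        A[r_j, r_j] -= kt
    return A

# Equilibrium a of the limit system: equal flux around the cycle
a = np.array([0.5, 1.0, 2.0]) / 3.5

x = np.array([1.0, 0.0, 0.0])
dt, t = 1e-3, 0.0
errs = []
for step in range(40_000):                 # forward Euler up to t = 40
    x = x + dt * (rate_matrix(t) @ x)
    t += dt
    if (step + 1) % 10_000 == 0:
        errs.append(np.linalg.norm(x - a))

# ||x(t) - a|| decays towards zero despite the time-varying rates
assert errs[-1] < errs[0] and errs[-1] < 1e-8
```

The exponentially decaying perturbation of the rates plays the role of E(t) in the proof; it slows but does not prevent exponential convergence.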
We will now prove Theorem 3. For the sake of convenience, we first recall the statement.
Theorem 3. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\theta \) and \(\psi \) species are held fixed at a positive point then the Forward, Backward and Expectation step reaction systems on \(\alpha ,\beta ,\gamma \) and \(\xi \) species converge to equilibrium exponentially fast.
Proof
It follows by repeated use of Lemma 2. For \(l=1\) the forward reaction network can be interpreted as the following monomolecular reactions:
$$\begin{aligned} \alpha _{1h^*}\xrightarrow {\pi _{h}\psi _{hw}E_{1w}} \alpha _{1h},\qquad \alpha _{1h}\xrightarrow {\pi _{h^*}\psi _{h^*w}E_{1w}} \alpha _{1h^*} \end{aligned}$$
for all \(h\in H\setminus \{h^*\}\) and \(w\in V\), as they are dynamically equivalent. Here the effective rate constants (\(\pi _{h^*}\psi _{h^*w}E_{1w}\) or \(\pi _{h}\psi _{hw}E_{1w}\)) are independent of time and so the conditions of Lemma 2 are fulfilled. Thus this portion of the reaction network converges exponentially fast.
The rest of the forward reaction network can similarly be interpreted as the following monomolecular reactions:
$$\begin{aligned} \alpha _{l h^*}\xrightarrow {\alpha _{l-1,g}\theta _{gh}\psi _{hw}E_{lw}} \alpha _{lh},\qquad \alpha _{lh}\xrightarrow {\alpha _{l-1,g}\theta _{gh^*}\psi _{h^*w}E_{lw}} \alpha _{l h^*} \end{aligned}$$
for all \(g\in H\), \(h\in H\setminus \{h^*\}, l=2,\ldots , L\) and \(w\in V\). For layers \(l=2,\ldots ,L\) of the forward reaction network we observe that the effective rate constants (\(\alpha _{l-1,g}\theta _{gh^*}\psi _{h^*w}E_{lw}\) or \(\alpha _{l-1,g}\theta _{gh}\psi _{hw}E_{lw}\)) for layer l depend on time only through \(\alpha _{l-1,g}\). If we suppose that the concentration of \(\alpha _{l-1,g}\) converges exponentially fast, then we can use Lemma 2 to conclude that the concentration of \(\alpha _{lh}\) also converges exponentially fast. Thus using Lemma 2 inductively layer by layer we conclude that forward reaction network converges exponentially fast. The backward reaction network converges exponentially fast, similarly.
For the expectation reaction network the result likewise follows by induction. But here, notice that if we interpret the expectation network similarly as monomolecular reactions, the effective rate constants depend on time through products such as \(\alpha _{lh}\beta _{lh}\) or \(\alpha _{lh}\beta _{l+1,h}\). So to apply Lemma 2 we need the following: if \(\alpha _l(t)\) and \(\beta _l(t)\) converge exponentially fast towards \(a_l\) and \(b_l\), then the product \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\), as
$$\begin{aligned} |\alpha _l(t)\beta _l(t)-a_l b_l|\le |\alpha _l(t)|\,|\beta _l(t)-b_l|+|b_l|\,|\alpha _l(t)-a_l|\le Ke^{-\gamma t}, \end{aligned}$$
where K is some suitably large constant (note \(\alpha _l(t)\) is bounded) and \(\gamma =\min (\gamma _a,\gamma _b)\), with \(\gamma _a\) and \(\gamma _b\), respectively, the exponential convergence rates of \(\alpha _l(t)\) and \(\beta _l(t)\). Thus \(\alpha _l(t)\beta _l(t)\) converges exponentially fast towards \(a_l b_l\) at rate \(\gamma \). A similar argument applies to products of the form \(\alpha _l(t)\beta _{l+1}(t)\). And thus the expectation reaction network also converges exponentially fast. \(\square \)
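For intuition, the sketch below (hypothetical two-state HMM parameters chosen for illustration, rate constant \(k^\alpha \) set to 1; NumPy assumed) integrates the layer \(l=1\) flux \(\dot{\alpha }_{1h}=\pi _h\psi _{hv_1}\alpha _{1h^*}-\pi _{h^*}\psi _{h^*v_1}\alpha _{1h}\) and checks that \(\alpha _1\) equilibrates to the normalized forward probabilities:

```python
import numpy as np

# Hypothetical 2-state HMM: fixed pi and psi, first observed symbol v1
pi = np.array([0.6, 0.4])
psi = np.array([[0.7, 0.3],
                [0.2, 0.8]])
v1 = 0
c = pi * psi[:, v1]            # effective rate constants pi_h * psi_{h v1}

# alpha_{1,0} <-> alpha_{1,1} with h* = 0; total concentration is conserved
x = np.array([0.5, 0.5])
dt = 1e-3
for _ in range(50_000):        # forward Euler up to t = 50
    flux = c[1] * x[0] - c[0] * x[1]     # net flow into alpha_{1,1}
    x = x + dt * np.array([-flux, flux])

# equilibrium is the normalised forward vector: alpha_{1h} proportional to c_h
target = c / c.sum()
assert np.allclose(x, target, atol=1e-8)
```

The futile-cycle pair relaxes at rate \(c_0+c_1\), so the equilibrium ratio \(\alpha _{11}/\alpha _{10}=c_1/c_0\) is reached exponentially fast, as Theorem 3 asserts.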
We will now prove Theorem 4. For the sake of convenience, we first recall the statement.
Theorem 4. For the Baum-Welch Reaction System \((BW(\mathcal {M},h^*,v^*,L),k)\) with permissible rates k, if the concentrations of \(\alpha ,\beta ,\gamma \) and \(\xi \) species are held fixed at a positive point then the Maximization step reaction system on \(\theta \) and \(\psi \) species converges to equilibrium exponentially fast.
Proof
Exponential convergence of the maximization network follows by a similar layer-by-layer inductive use of Lemma 2. \(\square \)
© 2019 Springer Nature Switzerland AG
Singh, A., Wiuf, C., Behera, A., Gopalkrishnan, M. (2019). A Reaction Network Scheme Which Implements Inference and Learning for Hidden Markov Models. In: Thachuk, C., Liu, Y. (eds) DNA Computing and Molecular Programming. DNA 2019. Lecture Notes in Computer Science(), vol 11648. Springer, Cham. https://doi.org/10.1007/978-3-030-26807-7_4