
Bayesian spatiotemporal modeling for inverse problems

Original Paper | Statistics and Computing

Abstract

Inverse problems with spatiotemporal observations are ubiquitous in scientific studies and engineering applications. In these spatiotemporal inverse problems, observed multivariate time series are used to infer parameters of physical or biological interest. Traditional solutions for these problems often ignore the spatial or temporal correlations in the data (static model) or simply model the data summarized over time (time-averaged model). In either case, the information contained in the spatiotemporal interactions is not fully utilized for parameter learning, which leads to insufficient modeling. In this paper, we apply Bayesian models based on spatiotemporal Gaussian processes (STGP) to inverse problems with spatiotemporal data and show that the spatial and temporal information provides more effective parameter estimation and uncertainty quantification (UQ). We demonstrate the merit of Bayesian spatiotemporal modeling for inverse problems, compared with traditional static and time-averaged approaches, using a time-dependent advection–diffusion partial differential equation (PDE) and three chaotic ordinary differential equations (ODEs). We also provide theoretical justification for the superiority of spatiotemporal modeling in fitting the trajectories, even if it appears cumbersome (e.g., for chaotic dynamics).
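To make the comparison concrete, the following is a minimal NumPy sketch (our own illustration, not the authors' code) of the three misfit functions \(\Phi _*(u)=\frac{1}{2}\textrm{tr}[\textbf{V}_*^{-1}\textbf{Y}_0^{\textsf{T}}\textbf{U}_*^{-1}\textbf{Y}_0]\) being compared, for residuals arranged as an \(I\times J\) matrix (I spatial locations, J time points); the covariance symbols \(\sigma ^2_\epsilon \), \(\Gamma _\text {obs}\), \(\textbf{C}_\textbf{x}\) and \(\textbf{C}_t\) follow the notation of Appendix A, under our reading of the three models.

```python
import numpy as np

# Minimal sketch (our illustration, not the authors' code) of the three
# misfit functions Phi_*(u) = 0.5 * tr[V^{-1} Y0^T U^{-1} Y0] compared in
# the paper, for residuals R = Y - G(u) arranged as an I x J matrix
# (I spatial locations, J time points). Covariance names follow Appendix A.

def misfit_static(R, sigma2_eps):
    # static model: all residuals treated as i.i.d. N(0, sigma2_eps)
    return 0.5 * np.sum(R ** 2) / sigma2_eps

def misfit_time_averaged(R, Gamma_obs):
    # time-averaged model: only the time-averaged residual is modeled,
    # with spatial noise covariance Gamma_obs
    r_bar = R.mean(axis=1)
    return 0.5 * r_bar @ np.linalg.solve(Gamma_obs, r_bar)

def misfit_stgp(R, C_x, C_t):
    # STGP model: separable spatiotemporal covariance with spatial kernel
    # C_x and temporal kernel C_t; Phi = 0.5 * tr[C_t^{-1} R^T C_x^{-1} R]
    return 0.5 * np.trace(np.linalg.solve(C_t, R.T) @ np.linalg.solve(C_x, R))
```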




References


Acknowledgements

SL is supported by NSF grant DMS-2134256.

Author information


Corresponding author

Correspondence to Shiwei Lan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Proofs

Theorem 3.1

If the maximal eigenvalues of \(\textbf{C}_\textbf{x}\) and \(\textbf{C}_t\) are set such that \(\lambda _{\max }(\textbf{C}_\textbf{x})\lambda _{\max }(\textbf{C}_t)\le \sigma ^2_\epsilon \), then the following inequality holds between the Fisher information matrices \({{\mathcal {I}}}_\text {\tiny S}\) and \({{\mathcal {I}}}_\text {\tiny ST}\) of the static model and the STGP model, respectively:

$$\begin{aligned} {{\mathcal {I}}}_\text {\tiny ST}(u) \ge {{\mathcal {I}}}_\text {\tiny S}(u) \end{aligned}$$
(A1)

If the maximal eigenvalues of \(\textbf{C}_\textbf{x}\) and \(\textbf{C}_t\) are controlled such that \(\lambda _{\max }(\textbf{C}_\textbf{x})\lambda _{\max }(\textbf{C}_t)\le J\lambda _{\min }(\Gamma _\text {obs})\), then the following inequality holds between the Fisher information matrices \({{\mathcal {I}}}_\text {\tiny T}\) and \({{\mathcal {I}}}_\text {\tiny ST}\) of the time-averaged model and the STGP model, respectively:

$$\begin{aligned} {{\mathcal {I}}}_\text {\tiny ST}(u) \ge {{\mathcal {I}}}_\text {\tiny T}(u) \end{aligned}$$
(A2)

Proof

Denote \(\textbf{Y}_0=\textbf{Y}-\textbf{M}\). We have \(\Phi _*(u)=\frac{1}{2}\textrm{tr}\left[ \textbf{V}_*^{-1} {\textbf{Y}}^{{\textsf{T}}}_0 \textbf{U}_*^{-1} \textbf{Y}_0\right] \), with \(*\) being S or ST, where \(\textbf{U}_\text {\tiny S}\), \(\textbf{V}_\text {\tiny S}\), \(\textbf{U}_\text {\tiny ST}\) and \(\textbf{V}_\text {\tiny ST}\) are specified in (16). Since both \(\textbf{U}_*\) and \(\textbf{V}_*\) are symmetric, we have

$$\begin{aligned} \frac{\partial \Phi _*}{\partial u_i}&= \frac{1}{2}\left\{ \textrm{tr}\left[ \textbf{V}_*^{-1} \frac{\partial {\textbf{Y}}^{{\textsf{T}}}_0}{\partial u_i} \textbf{U}_*^{-1}\textbf{Y}_0 \right] + \textrm{tr}\left[ \textbf{V}_*^{-1} {\textbf{Y}}^{{\textsf{T}}}_0 \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_i} \right] \right\} \\&= \textrm{tr}\left[ \textbf{V}_*^{-1} {\textbf{Y}}^{{\textsf{T}}}_0 \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_i} \right] \\ \frac{\partial ^2 \Phi _*}{\partial u_i \partial u_j}&= \textrm{tr}\left[ \textbf{V}_*^{-1} {\textbf{Y}}^{{\textsf{T}}}_0 \textbf{U}_*^{-1}\frac{\partial ^2 \textbf{Y}_0}{\partial u_i \partial u_j} \right] + \textrm{tr}\left[ \textbf{V}_*^{-1} \frac{\partial {\textbf{Y}}^{{\textsf{T}}}_0}{\partial u_i} \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_j} \right] \end{aligned}$$

Due to the i.i.d. assumption in both models, \(\textbf{Y}_0\) is independent of both \(\frac{\partial \textbf{Y}_0}{\partial u_i}\) and \(\frac{\partial ^2 \textbf{Y}_0}{\partial u_i \partial u_j}\); since \(\textrm{E}[\textbf{Y}_0]=\textbf{0}\), the first trace term vanishes in expectation. Therefore

$$\begin{aligned} ({{\mathcal {I}}}_*)_{ij}&= \textrm{E}\left[ \frac{\partial ^2 \Phi _*}{\partial u_i \partial u_j} \right] = \textrm{E}\left[ \textrm{tr}\left( \textbf{V}_*^{-1} \frac{\partial {\textbf{Y}}^{{\textsf{T}}}_0}{\partial u_i} \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_j} \right) \right] \\&= \textrm{E}\left[ \textrm{vec}{\left( \frac{\partial \textbf{Y}_0}{\partial u_i}\right) }^{{\textsf{T}}} (\textbf{V}_*^{-1} \otimes \textbf{U}_*^{-1}) \textrm{vec}\left( \frac{\partial \textbf{Y}_0}{\partial u_j}\right) \right] \end{aligned}$$
(A3)
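The last equality in (A3) uses the identity \(\textrm{tr}(\textbf{V}^{-1}\textbf{A}^{\textsf{T}}\textbf{U}^{-1}\textbf{B}) = \textrm{vec}(\textbf{A})^{\textsf{T}}(\textbf{V}^{-1}\otimes \textbf{U}^{-1})\,\textrm{vec}(\textbf{B})\), valid for symmetric \(\textbf{V}\) with column-stacking \(\textrm{vec}(\cdot )\). A quick numerical sanity check (our illustration; the sizes and matrices are arbitrary):

```python
import numpy as np

# Illustrative check (ours) of the trace/vec identity used in (A3):
# tr(V^{-1} A^T U^{-1} B) = vec(A)^T (V^{-1} kron U^{-1}) vec(B),
# with vec(.) stacking columns (column-major) and U, V symmetric.
rng = np.random.default_rng(0)
I, J = 4, 3                                   # arbitrary sizes

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)            # symmetric positive definite

U, V = random_spd(I), random_spd(J)
A, B = rng.standard_normal((I, J)), rng.standard_normal((I, J))

Ui, Vi = np.linalg.inv(U), np.linalg.inv(V)
vec = lambda M: M.flatten(order="F")          # column-major vectorization
lhs = np.trace(Vi @ A.T @ Ui @ B)
rhs = vec(A) @ np.kron(Vi, Ui) @ vec(B)
assert np.isclose(lhs, rhs)
```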

For any \(\textbf{w}=(w_1,\ldots ,w_p)\in {{\mathbb {R}}}^p\) and \(\textbf{w}\ne \textbf{0}\), denote \({{\tilde{\textbf{w}}}}:= \sum _{i=1}^p w_i \textrm{vec}\left( \frac{\partial \textbf{Y}_0}{\partial u_i}\right) \). To prove \({{\mathcal {I}}}_\text {\tiny ST}(u) \ge {{\mathcal {I}}}_\text {\tiny S}(u)\), it suffices to show \({\tilde{\textbf{w}}}^{{\textsf{T}}}(\textbf{V}_\text {\tiny ST} \otimes \textbf{U}_\text {\tiny ST})^{-1} {{\tilde{\textbf{w}}}} \ge {{{\tilde{\textbf{w}}}}}^{{\textsf{T}}}(\textbf{V}_\text {\tiny S} \otimes \textbf{U}_\text {\tiny S})^{-1} {{\tilde{\textbf{w}}}}\).

By Theorem 4.2.12 in Horn and Johnson (1991), any eigenvalue of \(\textbf{V}_* \otimes \textbf{U}_*\) has the form of a product of eigenvalues of \(\textbf{V}_*\) and \(\textbf{U}_*\), i.e. \(\lambda _k(\textbf{V}_* \otimes \textbf{U}_*) = \lambda _i(\textbf{V}_*)\lambda _j(\textbf{U}_*)\), where \(\{\lambda _j(M)\}\) are the ordered eigenvalues of M, i.e. \(\lambda _1(M)\ge \cdots \ge \lambda _d(M)\). By the given condition we have

$$\begin{aligned} \lambda _{IJ}((\textbf{V}_\text {\tiny ST} \otimes \textbf{U}_\text {\tiny ST})^{-1})&= \lambda _1^{-1}(\textbf{V}_\text {\tiny ST} \otimes \textbf{U}_\text {\tiny ST}) = \lambda _1^{-1}(\textbf{C}_t) \lambda _1^{-1}(\textbf{C}_\textbf{x}) \\&\ge \sigma ^{-2}_\epsilon = \lambda _1((\textbf{V}_\text {\tiny S} \otimes \textbf{U}_\text {\tiny S})^{-1}) \end{aligned}$$
(A4)

This completes the proof of the first inequality.

Similarly, by the second condition, we have

$$\begin{aligned} \lambda _{IJ}((\textbf{V}_\text {\tiny ST} \otimes \textbf{U}_\text {\tiny ST})^{-1})&= \lambda _1^{-1}(\textbf{C}_t) \lambda _1^{-1}(\textbf{C}_\textbf{x}) \\&\ge J^{-1} \lambda _{\min }^{-1} (\Gamma _\text {obs}) = \lambda _1(\textbf{V}_\text {\tiny T}^- \otimes \textbf{U}_\text {\tiny T}^{-1}) \end{aligned}$$
(A5)

which completes the proof of the second inequality. \(\square \)
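The eigenvalue argument above is easy to verify numerically. The following sketch (our illustration with arbitrary random covariances, assuming as in the proof that the static covariance is \(\sigma ^2_\epsilon \textbf{I}\)) rescales \(\textbf{C}_\textbf{x}\) so that \(\lambda _{\max }(\textbf{C}_\textbf{x})\lambda _{\max }(\textbf{C}_t)\le \sigma ^2_\epsilon \) and checks the Loewner ordering \((\textbf{V}_\text {\tiny ST} \otimes \textbf{U}_\text {\tiny ST})^{-1}\succeq \sigma ^{-2}_\epsilon \textbf{I}\) that drives (A4):

```python
import numpy as np

# Numerical check (our illustration) of the key step in Theorem 3.1:
# if lambda_max(C_x) * lambda_max(C_t) <= sigma2_eps, then
# (C_t kron C_x)^{-1} - sigma_eps^{-2} I is positive semi-definite.
rng = np.random.default_rng(1)
I, J, sigma2_eps = 5, 4, 0.5                   # arbitrary sizes and noise variance

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + np.eye(n)                  # symmetric positive definite

C_x, C_t = random_spd(I), random_spd(J)
# rescale C_x so that lambda_max(C_x) * lambda_max(C_t) < sigma2_eps
lam = np.linalg.eigvalsh(C_x).max() * np.linalg.eigvalsh(C_t).max()
C_x *= 0.9 * sigma2_eps / lam

P_st = np.linalg.inv(np.kron(C_t, C_x))         # (V_ST kron U_ST)^{-1}
P_s = np.eye(I * J) / sigma2_eps                # (V_S kron U_S)^{-1}
assert np.linalg.eigvalsh(P_st - P_s).min() >= -1e-9
```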

Fig. 18 Advection–diffusion inverse problem: auto-correlations of observations in space (left) and time (right)

Fig. 19 Lorenz inverse problem: comparing posterior estimates of the parameter u for two models (time-averaged and STGP) in terms of the relative error of the median, \(\textrm{REM}=\frac{\Vert {{\hat{u}}} - u^\dagger \Vert }{\Vert u^\dagger \Vert }\). Each experiment is repeated for 10 runs of EnK (EKI and EKS, respectively), and the shaded regions indicate the 5–95% quantiles of the repeated results

Fig. 20 Rössler inverse problem: marginal (diagonal) and pairwise (lower triangle) sections of the joint density p(u) by the time-averaged model (left) and the STGP model (right). Red dashed lines indicate the true parameter values

Fig. 21 Rössler inverse problem: comparing posterior estimates of the parameter u for two models (time-averaged and STGP) in terms of the relative error of the median, \(\textrm{REM}=\frac{\Vert {{\hat{u}}} - u^\dagger \Vert }{\Vert u^\dagger \Vert }\). Each experiment is repeated for 10 runs of EnK (EKI and EKS, respectively), and the shaded regions indicate the 5–95% quantiles of the repeated results

Fig. 22 Chen inverse problem: marginal (diagonal) and pairwise (lower triangle) sections of the joint density p(u) by the time-averaged model (left) and the STGP model (right). Red dashed lines indicate the true parameter values

Fig. 23 Chen inverse problem: comparing posterior estimates of the parameter u for two models (time-averaged and STGP) in terms of the relative error of the median, \(\textrm{REM}=\frac{\Vert {{\hat{u}}} - u^\dagger \Vert }{\Vert u^\dagger \Vert }\). Each experiment is repeated for 10 runs of EnK (EKI and EKS, respectively), and the shaded regions indicate the 5–95% quantiles of the repeated results

Theorem 3.2

If we choose \(\textbf{C}_\textbf{x}=\Gamma _\text {obs}\) and require the maximal eigenvalue of \(\textbf{C}_t\) to satisfy \(\lambda _{\max }(\textbf{C}_t)\le J\), then the following inequality holds between the Fisher information matrices \({{\mathcal {I}}}_\text {\tiny T}\) and \({{\mathcal {I}}}_\text {\tiny ST}\) of the time-averaged model and the STGP model, respectively:

$$\begin{aligned} {{\mathcal {I}}}_\text {\tiny ST}(u) \ge {{\mathcal {I}}}_\text {\tiny T}(u) \end{aligned}$$
(A6)

Proof

Denote \(\textbf{Y}_0=\textbf{Y}-\textbf{M}\). We have \(\Phi _*(u)=\frac{1}{2}\textrm{tr}\left[ \textbf{V}_*^{-1} {\textbf{Y}}^{{\textsf{T}}}_0 \textbf{U}_*^{-1} \textbf{Y}_0\right] \), with \(*\) being T or ST, where \(\textbf{U}_\text {\tiny T}\), \(\textbf{V}_\text {\tiny T}\), \(\textbf{U}_\text {\tiny ST}\) and \(\textbf{V}_\text {\tiny ST}\) are specified in (16).

By an argument similar to that in the proof of Theorem 3.1, we have

$$\begin{aligned} ({{\mathcal {I}}}_*)_{ij} = \textrm{E}\left[ \frac{\partial ^2 \Phi _*}{\partial u_i \partial u_j} \right] = \textrm{tr}\left[ \textbf{V}_*^{-1} \textrm{E}\left( \frac{\partial {\textbf{Y}}^{{\textsf{T}}}_0}{\partial u_i} \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_j} \right) \right] \end{aligned}$$
(A7)

For any \(\textbf{w}=(w_1,\ldots ,w_p)\in {{\mathbb {R}}}^p\) with \(\textbf{w}\ne \textbf{0}\), denote \(\textbf{W}:= \sum _{i,j=1}^p w_i \textrm{E}\left( \frac{\partial {\textbf{Y}}^{{\textsf{T}}}_0}{\partial u_i} \textbf{U}_*^{-1}\frac{\partial \textbf{Y}_0}{\partial u_j} \right) w_j\). Since \(\textbf{C}_\textbf{x}=\Gamma _\text {obs}\), we have \(\textbf{U}_\text {\tiny T}=\textbf{U}_\text {\tiny ST}\), so \(\textbf{W}\) is the same for both models; moreover \(\textbf{W}\ge \textbf{0}_{J\times J}\), i.e. \(\textbf{W}\) is positive semi-definite. It suffices to show \(\textrm{tr}[\textbf{V}_\text {\tiny ST}^{-1}\textbf{W}]\ge \textrm{tr}[\textbf{V}_\text {\tiny T}^{-}\textbf{W}]\), with \(\textbf{V}_\text {\tiny T}^{-}\) the pseudo-inverse of \(\textbf{V}_\text {\tiny T}\).

By a corollary (Marshall et al. 2011) of von Neumann's trace inequality (Mirsky 1975), we have

$$\begin{aligned} \sum _{j=1}^J\lambda _j(\textbf{V}_*^{-1})\lambda _{J-j+1}(\textbf{W}) \le \textrm{tr}(\textbf{V}_*^{-1}\textbf{W}) \le \sum _{j=1}^J\lambda _j(\textbf{V}_*^{-1})\lambda _j(\textbf{W}) \end{aligned}$$
(A8)

where \(\{\lambda _j(M)\}\) are the ordered eigenvalues of M, i.e. \(\lambda _1(M)\ge \cdots \ge \lambda _d(M)\). The only non-zero eigenvalue of \(\textbf{V}_\text {\tiny T}^-=J^{-2} \varvec{{1}}_J {\varvec{{1}}}^{{\textsf{T}}}_J\) is \(\lambda _1(\textbf{V}_\text {\tiny T}^-)=J^{-1}\). Therefore, we have

$$\begin{aligned} \textrm{tr}[\textbf{V}_\text {\tiny T}^-\textbf{W}]&\le J^{-1}\lambda _1(\textbf{W}) \le \lambda _J(\textbf{V}_\text {\tiny ST}^{-1}) \lambda _1(\textbf{W}) + \sum _{j=1}^{J-1}\lambda _j(\textbf{V}_\text {\tiny ST}^{-1})\lambda _{J-j+1}(\textbf{W}) \\&\le \textrm{tr}[\textbf{V}_\text {\tiny ST}^{-1}\textbf{W}] \end{aligned}$$
(A9)

where \(\lambda _J(\textbf{V}_\text {\tiny ST}^{-1}) = \lambda _1^{-1}(\textbf{C}_t)\ge J^{-1}\) and \(\lambda _j(\textbf{V}_\text {\tiny ST}^{-1}), \lambda _j(\textbf{W})\ge 0\) for all j. \(\square \)
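The inequality chain (A8)-(A9) can likewise be checked numerically; the sketch below (our illustration with random matrices) verifies the two-sided trace bound for symmetric matrices and the resulting comparison \(\textrm{tr}[\textbf{V}_\text {\tiny T}^-\textbf{W}]\le \textrm{tr}[\textbf{V}_\text {\tiny ST}^{-1}\textbf{W}]\) when \(\lambda _{\max }(\textbf{C}_t)\le J\):

```python
import numpy as np

# Illustrative check (ours) of (A8)-(A9): for PSD W and C_t with
# lambda_max(C_t) <= J, tr[V_T^- W] <= tr[C_t^{-1} W],
# where V_T^- = J^{-2} * ones * ones^T.
rng = np.random.default_rng(2)
J = 5

def random_psd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T

W = random_psd(J)                              # plays the role of W in the proof
C_t = random_psd(J) + np.eye(J)                # positive definite temporal covariance
C_t *= J / np.linalg.eigvalsh(C_t).max()       # enforce lambda_max(C_t) = J

# two-sided von Neumann-type trace bound (A8) for symmetric matrices
Vi = np.linalg.inv(C_t)
lv = np.sort(np.linalg.eigvalsh(Vi))[::-1]     # descending eigenvalues
lw = np.sort(np.linalg.eigvalsh(W))[::-1]
tr_VW = np.trace(Vi @ W)
assert np.sum(lv * lw[::-1]) <= tr_VW + 1e-9   # lower bound
assert tr_VW <= np.sum(lv * lw) + 1e-9         # upper bound

# final comparison (A9)
ones = np.ones((J, 1))
V_T_pinv = (ones @ ones.T) / J**2              # V_T^- = J^{-2} 1_J 1_J^T
assert np.trace(V_T_pinv @ W) <= tr_VW + 1e-9
```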

Appendix B More numerical results

See Figs. 18, 19, 20, 21, 22 and 23.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lan, S., Li, S. & Pasha, M. Bayesian spatiotemporal modeling for inverse problems. Stat Comput 33, 89 (2023). https://doi.org/10.1007/s11222-023-10253-z

