Examining the identifiability and estimability of the phase-type ageing model

Cheng, Boquan; Mamon, Rogemar

doi:10.1007/s00180-023-01329-5

Examining the identifiability and estimability of the phase-type ageing model

Original paper
Published: 28 February 2023

Volume 39, pages 963–1004, (2024)
Cite this article

Computational Statistics Aims and scope Submit manuscript

302 Accesses
Explore all metrics

Abstract

In this paper, the identifiability and the estimability of a particular phase-type ageing model (PTAM) are investigated. We consider a PTAM that is shown to be identifiable, although it has poor estimability when the only observation is the time until absorption. We use a data-cloning method to assess model estimability, which is also analysed visually via contour plots and marginal likelihood functions. The PTAM’s estimability under different scenarios is compared, from the best scenario of a fully observable Markov process for each individual to the worst scenario wherein the only observation is the time until absorption for each individual. Some in-between scenarios are also studied and include the situations where the state could be observed every several years and the state could be measured with some error. Certain conditions are provided regarding the state of the Markov chain that improves the PTAM’s estimability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

A uniformisation-driven algorithm for inference-related estimation of a phase-type ageing model

Article 02 October 2022

Approaches to Statistical Analysis of Longitudinal Data on Aging, Health, and Longevity: Biodemographic Perspectives

Making Predictions Using Poorly Identified Mathematical Models

Article Open access 27 May 2024

References

Albrecher H, Bladt M, Bladt M, Yslas J (2022) Mortality modeling and regression with matrix distributions. Insur Math Econ 107:68–87
MathSciNet Google Scholar
Alalouf I, Styan G (1979) Characterizations of estimability in the general linear model. Ann Stat 7(1):194–200
MathSciNet Google Scholar
Asmussen S, Nerman O, Olsson M (1996) Fitting phase-type distributions via the EM algorithm. Scandinavian J Stat 23:419–441
Google Scholar
Augustin R, Büscher K (1982) Characteristics of the Cox-distribution. ACM Sigmetrics Perform Eval Rev 12(1):22–32
Google Scholar
Bobbio A, Cumani A (1992) ML estimation of the parameters of a PH distribution in triangular canonical form, Computer. Perform Eval 22:33–46
Google Scholar
Bolch G, Greiner S, Meer H, Trivedi K (2006) Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science applications. Wiley-Interscience, New York
Google Scholar
Bunch D (1991) Estimability in the multinomial probit model. Transp Res Part B Method 25(1):1–12
MathSciNet Google Scholar
Bunke H, Bunke O (1974) Identifiability and estimability. Stat J Theor Appl Stat 5(3):223–233
MathSciNet Google Scholar
Campbell D, Lele S (2014) An ANOVA test for parameter estimability using data cloning with application to statistical inference for dynamic systems. Comput Stat & Data Anal 70:257–267
MathSciNet Google Scholar
Casella G, Berger R (2021) Statistical Inference. Cengage Learning, Belmont
Google Scholar
Cavanaugh J, Neath A (2019) The Akaike information criterion: background, derivation, properties, application, interpretation, and refinements. Wiley Interdiscip Rev Comput Stat 11(3):e1460
MathSciNet Google Scholar
Cheng B, Jones B, Liu X, Ren J (2021) The mathematical mechanism of biological aging. North Am Actuar J 25:73–93
MathSciNet Google Scholar
Cheng B, Mamon R (2022) A uniformisation-driven algorithm for inference-related estimation of a phase-type ageing model. Lifetime Data Anal. https://doi.org/10.1007/s10985-022-09577-1
Article PubMed Google Scholar
Cox D (1955) A use of complex probabilities in the theory of stochastic processes. Math Proc Cambridge Philos Soc 51(2):313–319
ADS MathSciNet Google Scholar
Cumani A (1982) On the canonical representation of homogeneous Markov processes modelling failure-time distributions. Microelectron Reliab 22(3):583–602
MathSciNet Google Scholar
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Methodol 39(1):1–22
MathSciNet Google Scholar
Deresa N, Keilegom I (2020) A multivariate normal regression model for survival data subject to different types of dependent censoring. Comput Stat & Data Anal 144:106879
MathSciNet Google Scholar
Dörre A (2020) Bayesian estimation of a lifetime distribution under double truncation caused by time-restricted data collection. Stat Papers 61(3):945–965
MathSciNet Google Scholar
Eberly L, Carlin B (2000) Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Stat Med 19(17–18):2279–2294
CAS PubMed Google Scholar
Fackrell M, Characterisation of matrix-exponential distributions, PhD Thesis, University of Adelaide, Australia, (2003)
Fackrell M (2009) Modelling healthcare systems with phase-type distributions. Health Care Manage Sci 12(1):11–26
MathSciNet Google Scholar
Faddy M (1994) Examples of fitting structured phase-type distributions. Appl Stoch Models Data Anal 10(4):247–255
Google Scholar
Faddy M (1998) On inferring the number of phases in a Coxian phase-type distribution. Stoch Models 14(1–2):407–417
Google Scholar
Faddy M, Latouche G, Taylor P (2002) Penalised maximum likelihood estimation of the parameters in a Coxian phase-type distribution. In: Latouche G, Taylor P (eds) Matrix-analytic methods: theory and applications. World Scientific, New Jersey, pp 107–114
Google Scholar
Fisher F (1966) The identification problem in econometrics. McGraw-Hill, New York
Google Scholar
Gibbs J (1902) Elementary principles in statistical mechanics: developed with special reference to the rational foundation of thermodynamics. Charles Scribner’s Sons, New York
Google Scholar
Gompertz B (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos Trans R Soc London 115:513–583
ADS Google Scholar
Govorun M, Jones B, Liu X, Stanford D (2018) Physiological age, health costs, and their interrelation. North Am Actuar J 22(3):323–340
Google Scholar
Govorun M, Latouche G, Loisel S (2015) Phase-type aging modeling for health dependent costs. Insur Math Econ 62:173–183
MathSciNet Google Scholar
Harper P, Knight V, Marshall A (2012) Discrete conditional phase-type models utilising classification trees: Application to modelling health service capacities. Eur J Operat Res 219(3):522–530
Google Scholar
Hassan A, Jones B, Stanford D (2014) The use of phase-type models for disability insurance calculations. Scandinavian Actuar J 2014(8):714–728
MathSciNet Google Scholar
Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
MathSciNet Google Scholar
Hengl S, Kreutz C, Timmer J, Maiwald T (2007) Data-based identifiability analysis of non-linear dynamical models. Bioinformatics 23(19):2612–2618
CAS PubMed Google Scholar
Hsiao C (1983) Identification, Handbook of Econometrics 1. North Holland, Elsevier, pp 223–283
Google Scholar
Huzurbazar A (1999) Flowgraph models for generalized phase type distributions having non-exponential waiting times. Scandinavian J Stat 26(1):145–157
Google Scholar
Jacquez J, Greif P (1985) Numerical parameter identifiability and estimability: Integrating identifiability, estimability, and optimal sampling design. Math Biosci 77(1–2):201–227
Google Scholar
Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
MathSciNet Google Scholar
Kullback S (1997) Information theory statistics. Courier Corporation, New York
Google Scholar
Lehmann E, Casella G (2006) Theory of point estimation. Springer Science & Business Media, New York
Google Scholar
Lele S, Nadeem K, Schmuland B (2010) Estimability and likelihood inference for generalized linear mixed models using data cloning. J Am Stat Assoc 105(492):1617–1625
MathSciNet CAS Google Scholar
Lin X, Liu X (2007) Markov aging process and phase-type law of mortality. North Am Actuar J 11:92–109
MathSciNet Google Scholar
Makeham W (1860) On the law of mortality and the construction of annuity tables. The Assurance Magazine, and Journal of the Institute of Actuaries 8(6):301–310
Google Scholar
Makeham W (1889) On the further development of Gompertz’s law. J Inst Actuar (1886-1994) 28(2):152–159
Google Scholar
Marshall A, Zenga M (2009) Simulating Coxian phase-type distributions for patient survival. Int Trans Operat Res 16(2):213–226
Google Scholar
McClean S, Gillespie J, Garg L, Barton M, Scotney B, Kullerton K (2014) Using phase-type models to cost stroke patient care across health, social and community services. Eur J Operat Res 236(1):190–199
MathSciNet Google Scholar
McLean K, McAuley K (2012) Mathematical modelling of chemical processes-obtaining the best model predictions and parameter estimates using identifiability and estimability procedures. Can J Chem Eng 90(2):351–366
CAS Google Scholar
Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092
ADS CAS Google Scholar
Miettinen O (1976) Estimability and estimation in case-referent studies. Am J Epidemiol 103(2):226–235
CAS PubMed Google Scholar
Moriguchi S, Murota K (2012) On discrete Hessian matrix and convex extensibility. J Operat Res Soc Japan 55(1):48–62
MathSciNet Google Scholar
Mu J (2019) Exploring the Estimability of Mark-Recapture Models with Individual, Time-Varying Covariates using the Scaled Logit Link Function, MSc Thesis, The University of Western Ontario, Canada
Neath A, Cavanaugh J (2012) The Bayesian information criterion: background, derivation, and applications. Wiley Interdiscip Rev Comput Stat 4(2):199–203
Google Scholar
Nelder J, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
MathSciNet Google Scholar
Payne K, Marshall H, Cairns K (2012) Investigating the efficiency of fitting Coxian phase-type distributions to health care data. IMA J Manage Math 23(2):133–145
Google Scholar
Putter H, Houwelingen H (2015) Frailties in multi-state models: are they identifiable? Do we need them? Statistical Methods Med Res 24(6):675–692
MathSciNet Google Scholar
Ramírez-Cobo P, Lillo R, Wiper M (2010) Nonidentifiability of the two-state Markovian arrival process. J Appl Prob 47(3):630–649
MathSciNet Google Scholar
Titman A, Sharples L (2010) semi-Markov models with phase-type sojourn distributions. Biometrics 66(3):742–752
MathSciNet PubMed Google Scholar
Vrieze S (2012) Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol Methods 17(2):228–243
PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada through R. Mamon’s Discovery Grant (RGPIN-2017-04235). The valuable comments of Bruce Jones, the anonymous reviewers, and the editor are highly appreciated.

Author information

Authors and Affiliations

Department of Statistical and Actuarial Sciences, The University of Western Ontario, 1151 Richmond Street, London, Ontario, N6A 5B7, Canada
Boquan Cheng & Rogemar Mamon
Division of Physical Sciences and Mathematics, University of the Philippines Visayas, Miagao, Iloilo, Philippines
Rogemar Mamon

Authors

Boquan Cheng
View author publications
You can also search for this author inPubMed Google Scholar
Rogemar Mamon
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Rogemar Mamon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 528 KB)

Appendices

Appendix A: Flexibility of the PTAM’s resulting distribution

Obtaining a reasonably good fit for the PTAM was demonstrated in Sections 3.2 and 4.2 of Cheng et al. (2021). This motivated us to explore the flexibility of the PTAM. In particular, we consider the question: Can the PTAM achieve a good fit on any set of lifetime data? The assessment of the PTAM’s flexibility will be based on how well the resulting pdf and hazard function could approximate the true ones. Both functions are easily obtained from the data and commonly used to validate the calibrated result.

Definition A.1

We define i as the state label for states $i=1,\ldots ,m$ in a Coxian model with m transient states.

Lemma A.1

Consider an $m-$state Coxian model with transition rate from one transient state to the next transient state equal to $\lambda$, and the absorption rate in state i is $h_i$ for $i=1,\ldots ,m$. For any $t>0$, let $Y_t$ be the state variable at time t. Then, $P(Y_t \ge k|Y_t \in E)$ is an increasing function with respect to t for any $k=1,\ldots ,m$.

Proof

The $m\times m$ intensity matrix of the Coxian model is

$$\begin{aligned} \varvec{\Lambda }=\left[ \begin{array}{ccccc} -\left( \lambda +h_{1}\right) &{} \lambda \\ &{} -\left( \lambda +h_{2}\right) &{} \lambda \\ &{} &{} \ddots \\ &{} &{} &{} -\left( \lambda +h_{m-1}\right) &{} \lambda \\ &{} &{} &{} &{} -h_{m} \end{array}\right] , \end{aligned}$$

where $\lambda$ is the rate of transition from state i to $i+1$, and $h_i$ is the absorption rate at state i. For any fixed $t>0$, let $Y_{\ell ,t}$ be the state random variable at time t assuming the starting state is $\ell$ at 0, then $Y_t=Y_{1,t}$. For every $k=1,{\ldots },m$,

$$\begin{aligned} P(Y_t \ge k|Y_{t} \in E)&=\frac{\sum _{j=k}^m P(Y_{1,t}=j)}{\sum _{j=1}^m P(Y_{1,t}=j)}=\frac{\varvec{\alpha }e^{\varvec{\Lambda }t} \varvec{e_{k}}}{\varvec{\alpha }e^{\varvec{\Lambda }t}\varvec{e}}, \end{aligned}$$

where $\varvec{e_{k}}$ is an $m\times 1$ column vector with first $k-1$ elements equal to 0 and the remaining elements are equal to 1. The quantity $P(Y_t \ge k|Y_t \in E)$ is the probability that the state of the individual at time t is greater than or equal to k conditional on this individual being alive. We note that $P(Y_t \ge 1|Y_{t} \in E)=1$. When $k \ne 1$, the first derivative of $P(Y_t \ge k|Y_t \in E)$ with respect to t is

$$\begin{aligned} \frac{\textrm{d}P(Y_t \ge k|Y_t \in E)}{\textrm{d}t}= \frac{\varvec{\alpha }e^{\varvec{\Lambda }t}(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda })e^{\varvec{\Lambda }t}\varvec{e}}{\left( \varvec{\alpha }e^{\varvec{\Lambda }t}\varvec{e}\right) ^2}. \end{aligned}$$

Let $g(t)=\varvec{\alpha }e^{\varvec{\Lambda }t}(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda })e^{\varvec{\Lambda }t}\varvec{e}$. The (i, j) entry of $e^{\varvec{\Lambda }t}$, denoted by $P_{i,j}(t)$, is the probability of being in state j at time t given that the starting state is i at time 0. It may be verified that

$$\begin{aligned} \varvec{\alpha }e^{\varvec{\Lambda }t}&=(P_{1,1}(t),...,P_{1,m}(t)) \, {\text {a }1\times m \text { row vector}}\\ e^{\varvec{\Lambda }t}\varvec{e}&=\left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal }\, \text {an} m\times 1 \text { column vector}\\ \varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha }&=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ -h_k &{} 0 &{}0 &{}\cdots \\ -h_{k+1} &{} 0 &{}0&{}\cdots \\ \vdots \\ -h_{m} &{} 0 &{}0&{}\cdots \end{array}\right] \, \text {an}\; m\times m \text { matrix}\\ \varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }&=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ 0 &{} 0&{}0&{}\cdots \\ -(\lambda +h_1) &{} \lambda &{}0 &{}\cdots \\ -(\lambda +h_1) &{} \lambda &{}0&{}\cdots \\ \vdots \\ -(\lambda +h_1) &{} \lambda &{}0&{}\cdots \end{array}\right] \, \text {an} \;m\times m \text { matrix}, \end{aligned}$$

where $\lambda$ is the $(k-1,1)$ entry of $\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha }$ and the entries of the entire first $(k-1)$th rows in $\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }$ are 0’s.

Remark A.1

Bold subscripts are used in the vector notation, and normal subscripts are associated to an element of a vector. For example, since $\varvec{A_k}$ is a vector, $\varvec{A}_k$ is the kth element of vector $\varvec{A}$.

Hence, $\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda } =(\varvec{\beta _1},\varvec{\beta _2},\varvec{0},$...$,\varvec{0})^{\intercal }$, where $\varvec{\beta _2}$ is an $m\times 1$ column vector with the first $k-1$ elements all equal to 0 and the remaining elements are all equal to $-\lambda$; and $\varvec{\beta _1}$ is an $m\times 1$ column vector with the first $k-2$ elements equal to 0, the $(k-1)$th element equals $\lambda$, and the $\ell$th element ($\ell \ge k$) equals $\lambda +h_1-h_{\ell }$. In matrix form,

$$\begin{aligned} \varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ \lambda +h_1-h_k &{} -\lambda &{}0 &{}\cdots \\ \lambda +h_1-h_{k+1} &{} -\lambda &{}0&{}\cdots \\ \vdots \\ \lambda +h_1-h_{m} &{} -\lambda &{}0&{}\cdots \end{array}\right] . \end{aligned}$$

Then, g(t) simplifies to

$$\begin{aligned} g(t)&=(P_{1,1}(t),{\ldots },P_{1,m}(t))\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ \lambda +h_1-h_k &{} -\lambda &{}0 &{}\cdots \\ \lambda +h_1-h_{k+1} &{} -\lambda &{}0&{}\cdots \\ \vdots \\ \lambda +h_1-h_{m} &{} -\lambda &{}0&{}\cdots \end{array}\right] \left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal }\\&=\left( \left( \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\right) ,-\lambda \sum _{\ell =k}^m P_{1,\ell }(t),0,...,0\right) \left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal } \\&=\left( \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\right) \sum _{j=1}^{m} P_{1,j}(t)-\sum _{\ell =k}^m \lambda P_{1,\ell }(t) \sum _{j=1}^m P_{2,j}(t). \end{aligned}$$

On the other hand, we know that

$$\begin{aligned} \frac{\textrm{d}e^{\varvec{\Lambda }t}}{\textrm{d}t} =\varvec{\Lambda }e^{\varvec{\Lambda }t}=e^{\varvec{\Lambda }t}\varvec{\Lambda }, \end{aligned}$$

yielding

$$\begin{aligned} \frac{\textrm{d}P_{1,k}(t)}{\textrm{d}t}&=-(\lambda +h_1)P_{1,k}(t)+\lambda P_{2,k}(t)=\lambda P_{1,k-1}(t)-(\lambda +h_k)P_{1,k}(t) \ \text {for} \ k=2,...,m-1\\ \frac{\textrm{d}P_{1,m}(t)}{\textrm{d}t}&=-(\lambda +h_1)P_{1,m}(t)+\lambda P_{2,m}(t)=\lambda P_{1,k-1}(t)-h_mP_{1,m}(t). \end{aligned}$$

Rearranging further, we have

$$\begin{aligned} (h_1-h_k)P_{1,k}(t)&=\lambda (P_{2,k}(t)-P_{1,k-1}(t)) \ \text {for} \ k=2,...,m-1\\ (\lambda +h_1-h_m)P_{1,m}(t)&=\lambda (P_{2,m}(t)-P_{1,m-1}(t)) \end{aligned}$$

The first part of g(t) could be written as

$$\begin{aligned}&\lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\\&= \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^{m-1} \lambda P_{1,\ell }(t)+\sum _{\ell =k}^{m-1} (h_1-h_\ell ) P_{1,\ell }(t)+(\lambda +h_1-h_m)P_{1,m}(t)\\&=\lambda P_{1,k-1}(t)+ \sum _{\ell =k}^{m-1} \lambda P_{1,\ell }(t)+\lambda \sum _{\ell =k}^m(P_{2,\ell }(t)-P_{1,\ell -1}(t)) =\lambda \sum _{\ell =k}^mP_{2,\ell }(t). \end{aligned}$$

Consequently,

$$\begin{aligned} g(t)=\lambda \left( \sum _{\ell =k}^mP_{2,\ell }(t)\sum _{j=1}^mP_{1,j}(t)-\sum _{\ell =k}^mP_{1,\ell }(t)\sum _{j=1}^mP_{2,j}(t)\right) , \end{aligned}$$

which shows that $g(t) \ge 0$ if and only if

$$\begin{aligned} P(Y_{1,t}\ge k|Y_{1,t} \in E)=\frac{\sum _{\ell =k}^mP_{1,\ell }(t)}{\sum _{j=1}^mP_{1,j}(t)} \le \frac{\sum _{\ell =k}^mP_{2,\ell }(t)}{\sum _{j=1}^mP_{2,j}(t)}=P(Y_{2,t}\ge k|Y_{2,t} \in E), \end{aligned}$$

with the equality holds if and only if $g(t)=0$. Similarly, $\frac{\textrm{d}P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E)}{\textrm{d}t} \ge 0$ if and only if

$$\begin{aligned} P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E) \le P(Y_{\ell +1,t}\ge k|Y_{\ell +1,t} \in E). \end{aligned}$$

(A.1)

The inequality is trivial when $\ell =m-1$ because $P(Y_{m,t}\ge k|Y_{m,t} \in E)=1$ and $P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E) \le 1$. Recall that $Y_{\ell ,t}$ is the state variable at time t assuming the starting state is $\ell$ at time 0. By letting $t=0$, it could be verified that $P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)=1$ when $\ell \ge k$ and $P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)=0$ when $\ell < k$.

Since (A.1) holds when $\ell =m-1$, we have $\frac{\textrm{d}P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)}{\textrm{d}t} \ge 0$, or $P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)$ is increasing with respect to t.

The next step is to prove $P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E) \le P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)$ by contradiction. Suppose

$$\begin{aligned} P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E) > P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E){.} \end{aligned}$$

(A.2)

By (A.1), we have $\frac{{\textrm{d}}P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)}{{\textrm{d}}t}<0$, or $P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)$ is decreasing with respect to t. Therefore, for any $t>0$,

$$\begin{aligned} P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)&< P(Y_{m-2,t=0}\ge k|Y_{m-2,t=0} \in E)\\&\le P(Y_{m-1,t=0}\ge k|Y_{m-1,t=0} \in E)\\&< P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E), \end{aligned}$$

which conflicts with (A.2). The first inequality holds because $P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)$ is decreasing with respect to t. The second inequality holds because $P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)$ is equal to 1 when $\ell \ge k$; otherwise, it is equal to 0, so that $P(Y_{m-1,t=0}\ge k|Y_{m-1,t=0} \in E)$ must be 1 if $P(Y_{m-2,t=0}\ge k|Y_{m-2,t=0} \in E)=1$ for any k. The third inequality holds because we proved $P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)$ is an increasing function of t.

Hence, (A.2) is false, and it must be that

$$\begin{aligned} P(Y_{m-2,t} \ge k|Y_{m-2,t} \in E) \le P(Y_{m-1,t} \ge k|Y_{m-1,t} \in E), \end{aligned}$$

which shows $P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)$ is increasing with respect to t by (A.1). Following the same procedure recursively, it could be shown that $P(Y_{1,t}\ge k|Y_{1,t} \in E) \le P(Y_{2,t}\ge k|Y_{2,t} \in E)$. Therefore, $g(t) \ge 0$ and $\frac{\textrm{d}P(Y_{t}\ge k|Y_{t} \in E)}{\textrm{d}t} \ge 0$, implying that $P(Y_{t}\ge k|Y_{t} \in E)$ is an increasing function of t. $\square$

Theorem A.1

Consider the Coxian model with 3 restrictions: (i) the process starts in state 1 at time 0; (ii) the transition of going out from each state must be either to the next transient state or to the absorbing state; and (iii) the rates of transition from one transient state to the next transient state are the same. Suppose the absorption rate of the Coxian model is monotone with respect to the state label. The resulting hazard rate h(t) of the Coxian model is increasing with respect to age t if and only if the absorption rate $h_i$ is increasing with respect to i.

Proof

For any $t \ge 0$, the resulting hazard rate is

$$\begin{aligned} h(t)=\frac{\varvec{\alpha }\exp (\varvec{\Lambda } t)}{\varvec{\alpha }\exp (\varvec{\Lambda } t)\varvec{e}} \varvec{h} = \sum _{i=1}^m h_i P(Y_t=i|Y_t \in E) ={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_t}|Y_t \in E\right) {.} \end{aligned}$$

The above result shows that h(t) is the expected value of $(h_{Y_t}|Y_t \in E)$, where $Y_t$ takes a state value at time t and E is the set of all transient states $\{1,\ldots ,m\}$. The ith element of $\varvec{\alpha }\exp (\varvec{\Lambda } t)$ is the probability in state i at time t.

By Lemma A.1, for any $t_1<t_2$, $Y_{t_1}$ is less than $Y_{t_2}$ in stochastic order. It is well known that if the random variable A is less than the random variable B in stochastic order, then for any non-decreasing functions u, ${{\,\mathrm{\mathbb {E}}\,}}(u(A)) \le {{\,\mathrm{\mathbb {E}}\,}}(u(B))$. Hence, when $h_i$ is increasing with respect to i, for any $0 \le t_1<t_2$,

$$\begin{aligned} h(t_1)={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \le {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right) = h(t_2), \end{aligned}$$

implying that h(t) is increasing with respect to age t.

On the other hand, when h(t) is increasing with respect to t, for any $0 \le t_1<t_2$,

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) =h(t_1)< h(t_2)={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right) . \end{aligned}$$

(A.3)

Suppose $h_i$ is decreasing with respect to i. Then, $-h_i$ is an increasing function. We showed that $Y_{t_1}$ is less than $Y_{t_2}$ in stochastic order. Therefore,

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}\left( -h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \le {{\,\mathrm{\mathbb {E}}\,}}\left( -h_{Y_{t_2}}\big |Y_{t_2} \in E\right) , \end{aligned}$$

yielding ${{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \ge {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right)$. Or, $h(t_1)\ge h(t_2)$, which contradicts with h(t) being increasing. Consequently, the absorption rate $h_i$ is increasing with respect to i.

$\square$

Since the PTAM’s absorption rate is monotone, the resulting hazard rate is monotone by Theorem A.1. A non-monotonic hazard rate is an impossibility for the PTAM. Therefore, we only investigate the goodness of fit of certain lifetime distributions with monotonic hazard rate.

Remark A.2

Suppose the absorption rate of the Coxian model is monotone with respect to the state label. Then, the resulting hazard rate h(t) is decreasing with respect to age t if and only if the dying rate $h_i$ is decreasing with respect to i.

The following list includes the tested lifetime distributions we are interested in: (i) Gamma distribution with increasing hazard rate, (ii) Gamma distribution with decreasing hazard rate, (ii) Weibull distribution with increasing hazard rate, (iv) Weibull distribution with decreasing hazard rate, (v) Pareto distribution with decreasing hazard rate, (vi) Convolution of two exponential distributions, (vii) Convolution of two Weibull distributions, (viii) Gompertz-Makeham distribution, and (ix) Makeham’s second extension of the Gompertz distribution.

Our experiments demonstrate that the fitted results are promising for all tested distributions, except for the tail portion of heavy-tailed distributions. As expected, it is problematic to use light-tailed distributions, which include the PTAM, to approximate heavy-tailed distributions, because there is always a distinct tail requiring a large number of mixtures (states in the PTAM) to approximate well the tail. For example, Figures S.6 and S.8 of the Supplementary Material depict the typical fitted results for the heavy-tailed and light-tailed distributions, respectively. More discussion and fitted results are presented in Appendix B.

Appendix B: Experiments showing the flexibility of the resulting distribution from a PTAM

In this Appendix, we detail the analysis of PTAM’s flexibility as described in Appendix A. This is based on the concept of goodness-of-fit to the target distribution by the calibrated PTAM.

1.1 Appendix B.1: Some distributions for lifetime modelling

The Gamma distribution has two parameters: $\alpha$ and $\beta$. Its associated pdf is

$$\begin{aligned} f(t)=\frac{\beta ^{\alpha }}{\Gamma (\alpha )}t^{\alpha -1}\exp (-\beta t), \ \alpha>0, \ \beta >0, \end{aligned}$$

where $\Gamma (x)=\int _{z=0}^{\infty }z^{x-1}e^{-z}dz$ is the Gamma function. When $\alpha >1$, the Gamma distribution has an increasing hazard rate, whilst the hazard rate for a Gamma distribution is decreasing when $\alpha <1$. When $\alpha =1$, the Gamma distribution reduces to an exponential distribution with a rate $\lambda =\beta$. Since the family of Gamma distributions is rich and capable of reproducing a variety of distribution shapes, it is extensively used in lifetime modelling. The Gamma distribution is one of the potential lifetime distributions utilised to capture lifetime processes of humans and some other living things.

The Weibull distribution has two parameters: k and $\lambda .$ Its associated pdf is

$$\begin{aligned} f(t)=\frac{k}{\lambda }\left( \frac{t}{\lambda }\right) ^{k-1}\exp \left( -\left( \frac{t}{\lambda }\right) ^k\right) , \ \lambda>0, \ k>0. \end{aligned}$$

When $k>1$, the Weibull distribution has an increasing hazard rate, whilst the hazard rate is decreasing when $k<1$. A Weibull distribution where $k=1$ reduces to an exponential distribution with rate $\lambda$. The Weibull distribution is popular in survival analysis and affords a sound basis for modelling lifetime distributions.

The Pareto distribution has two parameters: k and $\sigma$. Its associated survival function is

$$\begin{aligned} S(t)=\left( 1+\frac{kt}{\sigma }\right) ^{-\frac{1}{k}}, \ k>0, \ \sigma >0. \end{aligned}$$

It is well-known that Pareto distributions are heavy-tailed.

A convolution of two exponential distributions has two parameters, and the random variable that represents a convolution of two exponential distributions is

$$\begin{aligned} Y=X_1+X_2, \end{aligned}$$

where $X_1$ and $X_2$ follow the exponential distributions with rates $\lambda _1$ and $\lambda _2$, respectively. The convolutions of different distributions have a physical interpretation; i.e, the process could be decomposed into several components and the time spent on each component follows a certain distribution. Specifically, the physical interpretation for the convolution mechanism is consistent with our cognizance of ageing, being a progressive process. Furthermore, it is often assumed that the time spent in each component follows an exponential distribution because an exponential distribution matches most observed data. Therefore, a convolution of exponential distributions is a further extension capable of accomplishing more flexibility.

A convolution of two Weibull distributions has four parameters, and the random variable representing a convolution of two Weibull distributions is

$$\begin{aligned} Z=W_1+W_2, \end{aligned}$$

where $W_1$ and $W_2$ follow the Weibull distribution with parameters $(\lambda _1, k_1)$ and $(\lambda _2, k_2)$ respectively. This is a simple extension of the previous example by changing the distributions of each component, similar to Huzurbazar (1999). Since the Weibull distribution is utilised in survival analysis, it is plausible for the ageing process that the time spent in each component follows a Weibull distribution as well. It is then worth exploring whether one could use the PTAM to approximate well a convolution of Weibull distributions.

The Gompertz-Makeham distribution has three parameters, and its associated hazard rate is

$$\begin{aligned} h(t)=\zeta +\xi \exp (\lambda t), \ \zeta \ge -\xi , \ \xi>0, \ \lambda >0{,} \end{aligned}$$

where $\xi$ is the growth rate of mortality and $\zeta$ represents the background mortality with additional constraint $\zeta > 0$.

The Gompertz-Makeham distribution were proposed by Makeham (1860) and Gompertz (1825), and it assumes a shifted exponential increasing hazard rate with respect to age. Such a distribution fits human mortality rates from age 30 to 80 well. When we calibrate the PTAM with human lifetime observations only, the PTAM is essentially approximating the Gompertz-Makeham distribution.

The Makeham’s second extension of the Gompertz distribution has four parameters, and its survival function is

$$\begin{aligned} S(t)=\exp \left( -\xi (\exp (\lambda t)-1)-\xi \theta \lambda t-\xi \alpha (\lambda t)^2\right) , \ \xi>0, \ \lambda>0, \ \theta>0, \ \alpha >0. \end{aligned}$$

Makeham (1889) extended the assumptions on the differential equation for the hazard rate to higher-order derivatives. This was motivated by the observation that the third-order derivatives of the empirical hazard rates were much closer to a geometric progression. One may verify that the third-order derivatives, with respect to time t, of the hazard rate arising from the Makeham’s second extension of the Gompertz distribution is $\xi \lambda ^3(\exp (\lambda ))^t$, which is a geometric progression with initial value $\xi \lambda ^3$ and common ratio $\exp (\lambda )$.

Once again, it is likely to observe human lifetimes following the extended Gompertz distribution. For each example, we select appropriate parameter values to achieve a particular shape of probability density function or hazard function. Once the parameter values are chosen, the target lifetime distributions are completely determined.

1.2 Appendix B.2: Calibration criterion

We shall use the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951; Kullback 1997) to summarise the information loss when using the PTAM to approximate the target lifetime distribution, which is known. When calibrating the PTAM, we minimise the KL divergence to ensure minimal lost information.

Definition B.1

For continuous distributions P and Q with respective pdf’s p(x) and q(x) defined on a sample probability space, the KL divergence $D_{KL}$ of Q from P is

$$\begin{aligned} D_{KL}(P\ || \ Q){:=}\int _{-\infty }^{+\infty }p(x)\log \left( \frac{p(x)}{q(x)}\right) \textrm{d}x{.} \end{aligned}$$

Usually, the distribution P represents the actual model, and the distribution Q represents the distribution used to approximate P. In our case, the distribution P is the target lifetime distribution in each example, and the distribution Q is the PTAM.

The KL divergence’s non-negativity is a nice property as discussed further below. Specifically,

$$\begin{aligned} D_{KL}(P\ || \ Q) \ge 0, \end{aligned}$$

with equality if and only if $P=Q$. This result is known as the Gibbs’ inequality (Gibbs 1902). Thus, the divergence value could be interpreted as the extra information required for Q to represent P. Therefore, the smaller the divergence, the less extra information is required when using Q to approximate P and the better goodness of fit is for Q. Furthermore, by its non-negative property, the closer the divergence value to 0, the better the approximation.

In our case, we calibrate the PTAM by minimising the KL divergence of the PTAM from the target lifetime distribution. The analytical formula for the PTAM’s resulting distribution is quite involved. However, the resulting distribution is fairly easy to evaluate numerically for a given set of parameter values. Therefore, we numerically search for the minimum of $D_{KL}(\text {lifetime distribution} \ || \ \text {proposed PTAM})$, where p(x) is the pdf of the target lifetime distribution, q(x) is the pdf of the PTAM, and

$$\begin{aligned} D_{KL}(\text {lifetime distribution} \ || \ \text {proposed PTAM})=\int _{0}^{+\infty }p(x)\log \left( \frac{p(x)}{q(x)}\right) \textrm{d}x. \end{aligned}$$

The numerical search utilises the optimisation function fmincon in MATLAB. The calibration process is similar to the model calibration in the Le Bras model simulation in Cheng et al. (2021), except for switching the optimisation criterion from negative log-likelihood to the KL divergence. The Tail Value-at-Risk (TVaR) with risk level 0.999 denoted by $\text {TVaR}_{0.999}(T)$ is calculated in each example, and the calculated value is used to estimate $\psi$. The estimated values for the other four parameters are the values that minimise the divergence. For the fitted results, refer to Tables S.1 – S.10 and Figures S.1 – S.10 of the Supplementary Material.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Cheng, B., Mamon, R. Examining the identifiability and estimability of the phase-type ageing model. Comput Stat 39, 963–1004 (2024). https://doi.org/10.1007/s00180-023-01329-5

Download citation

Received: 17 March 2022
Accepted: 24 January 2023
Published: 28 February 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00180-023-01329-5

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Examining the identifiability and estimability of the phase-type ageing model

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A uniformisation-driven algorithm for inference-related estimation of a phase-type ageing model

Approaches to Statistical Analysis of Longitudinal Data on Aging, Health, and Longevity: Biodemographic Perspectives

Making Predictions Using Poorly Identified Mathematical Models

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Supplementary Information

Supplementary file 1 (pdf 528 KB)

Appendices

Appendices

Appendix A: Flexibility of the PTAM’s resulting distribution

Definition A.1

Lemma A.1

Proof

Remark A.1

Theorem A.1

Proof

Remark A.2

Appendix B: Experiments showing the flexibility of the resulting distribution from a PTAM

1.1 Appendix B.1: Some distributions for lifetime modelling

1.2 Appendix B.2: Calibration criterion

Definition B.1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now