Examining the identifiability and estimability of the phase-type ageing model | Computational Statistics Skip to main content
Log in

Examining the identifiability and estimability of the phase-type ageing model

  • Original paper
  • Published:
Computational Statistics Aims and scope Submit manuscript

Abstract

In this paper, the identifiability and the estimability of a particular phase-type ageing model (PTAM) are investigated. We consider a PTAM that is shown to be identifiable, although it has poor estimability when the only observation is the time until absorption. We use a data-cloning method to assess model estimability, which is also analysed visually via contour plots and marginal likelihood functions. The PTAM’s estimability under different scenarios is compared, from the best scenario of a fully observable Markov process for each individual to the worst scenario wherein the only observation is the time until absorption for each individual. Some in-between scenarios are also studied and include the situations where the state could be observed every several years and the state could be measured with some error. Certain conditions are provided regarding the state of the Markov chain that improves the PTAM’s estimability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

References

  • Albrecher H, Bladt M, Bladt M, Yslas J (2022) Mortality modeling and regression with matrix distributions. Insur Math Econ 107:68–87

    MathSciNet  Google Scholar 

  • Alalouf I, Styan G (1979) Characterizations of estimability in the general linear model. Ann Stat 7(1):194–200

    MathSciNet  Google Scholar 

  • Asmussen S, Nerman O, Olsson M (1996) Fitting phase-type distributions via the EM algorithm. Scandinavian J Stat 23:419–441

    Google Scholar 

  • Augustin R, Büscher K (1982) Characteristics of the Cox-distribution. ACM Sigmetrics Perform Eval Rev 12(1):22–32

    Google Scholar 

  • Bobbio A, Cumani A (1992) ML estimation of the parameters of a PH distribution in triangular canonical form, Computer. Perform Eval 22:33–46

    Google Scholar 

  • Bolch G, Greiner S, Meer H, Trivedi K (2006) Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science applications. Wiley-Interscience, New York

    Google Scholar 

  • Bunch D (1991) Estimability in the multinomial probit model. Transp Res Part B Method 25(1):1–12

    MathSciNet  Google Scholar 

  • Bunke H, Bunke O (1974) Identifiability and estimability. Stat J Theor Appl Stat 5(3):223–233

    MathSciNet  Google Scholar 

  • Campbell D, Lele S (2014) An ANOVA test for parameter estimability using data cloning with application to statistical inference for dynamic systems. Comput Stat & Data Anal 70:257–267

    MathSciNet  Google Scholar 

  • Casella G, Berger R (2021) Statistical Inference. Cengage Learning, Belmont

    Google Scholar 

  • Cavanaugh J, Neath A (2019) The Akaike information criterion: background, derivation, properties, application, interpretation, and refinements. Wiley Interdiscip Rev Comput Stat 11(3):e1460

    MathSciNet  Google Scholar 

  • Cheng B, Jones B, Liu X, Ren J (2021) The mathematical mechanism of biological aging. North Am Actuar J 25:73–93

    MathSciNet  Google Scholar 

  • Cheng B, Mamon R (2022) A uniformisation-driven algorithm for inference-related estimation of a phase-type ageing model. Lifetime Data Anal. https://doi.org/10.1007/s10985-022-09577-1

    Article  PubMed  Google Scholar 

  • Cox D (1955) A use of complex probabilities in the theory of stochastic processes. Math Proc Cambridge Philos Soc 51(2):313–319

    ADS  MathSciNet  Google Scholar 

  • Cumani A (1982) On the canonical representation of homogeneous Markov processes modelling failure-time distributions. Microelectron Reliab 22(3):583–602

    MathSciNet  Google Scholar 

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Methodol 39(1):1–22

    MathSciNet  Google Scholar 

  • Deresa N, Keilegom I (2020) A multivariate normal regression model for survival data subject to different types of dependent censoring. Comput Stat & Data Anal 144:106879

    MathSciNet  Google Scholar 

  • Dörre A (2020) Bayesian estimation of a lifetime distribution under double truncation caused by time-restricted data collection. Stat Papers 61(3):945–965

    MathSciNet  Google Scholar 

  • Eberly L, Carlin B (2000) Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Stat Med 19(17–18):2279–2294

    CAS  PubMed  Google Scholar 

  • Fackrell M, Characterisation of matrix-exponential distributions, PhD Thesis, University of Adelaide, Australia, (2003)

  • Fackrell M (2009) Modelling healthcare systems with phase-type distributions. Health Care Manage Sci 12(1):11–26

    MathSciNet  Google Scholar 

  • Faddy M (1994) Examples of fitting structured phase-type distributions. Appl Stoch Models Data Anal 10(4):247–255

    Google Scholar 

  • Faddy M (1998) On inferring the number of phases in a Coxian phase-type distribution. Stoch Models 14(1–2):407–417

    Google Scholar 

  • Faddy M, Latouche G, Taylor P (2002) Penalised maximum likelihood estimation of the parameters in a Coxian phase-type distribution. In: Latouche G, Taylor P (eds) Matrix-analytic methods: theory and applications. World Scientific, New Jersey, pp 107–114

    Google Scholar 

  • Fisher F (1966) The identification problem in econometrics. McGraw-Hill, New York

    Google Scholar 

  • Gibbs J (1902) Elementary principles in statistical mechanics: developed with special reference to the rational foundation of thermodynamics. Charles Scribner’s Sons, New York

    Google Scholar 

  • Gompertz B (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos Trans R Soc London 115:513–583

    ADS  Google Scholar 

  • Govorun M, Jones B, Liu X, Stanford D (2018) Physiological age, health costs, and their interrelation. North Am Actuar J 22(3):323–340

    Google Scholar 

  • Govorun M, Latouche G, Loisel S (2015) Phase-type aging modeling for health dependent costs. Insur Math Econ 62:173–183

    MathSciNet  Google Scholar 

  • Harper P, Knight V, Marshall A (2012) Discrete conditional phase-type models utilising classification trees: Application to modelling health service capacities. Eur J Operat Res 219(3):522–530

    Google Scholar 

  • Hassan A, Jones B, Stanford D (2014) The use of phase-type models for disability insurance calculations. Scandinavian Actuar J 2014(8):714–728

    MathSciNet  Google Scholar 

  • Hastings W (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

    MathSciNet  Google Scholar 

  • Hengl S, Kreutz C, Timmer J, Maiwald T (2007) Data-based identifiability analysis of non-linear dynamical models. Bioinformatics 23(19):2612–2618

    CAS  PubMed  Google Scholar 

  • Hsiao C (1983) Identification, Handbook of Econometrics 1. North Holland, Elsevier, pp 223–283

    Google Scholar 

  • Huzurbazar A (1999) Flowgraph models for generalized phase type distributions having non-exponential waiting times. Scandinavian J Stat 26(1):145–157

    Google Scholar 

  • Jacquez J, Greif P (1985) Numerical parameter identifiability and estimability: Integrating identifiability, estimability, and optimal sampling design. Math Biosci 77(1–2):201–227

    Google Scholar 

  • Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

    MathSciNet  Google Scholar 

  • Kullback S (1997) Information theory statistics. Courier Corporation, New York

    Google Scholar 

  • Lehmann E, Casella G (2006) Theory of point estimation. Springer Science & Business Media, New York

    Google Scholar 

  • Lele S, Nadeem K, Schmuland B (2010) Estimability and likelihood inference for generalized linear mixed models using data cloning. J Am Stat Assoc 105(492):1617–1625

    MathSciNet  CAS  Google Scholar 

  • Lin X, Liu X (2007) Markov aging process and phase-type law of mortality. North Am Actuar J 11:92–109

    MathSciNet  Google Scholar 

  • Makeham W (1860) On the law of mortality and the construction of annuity tables. The Assurance Magazine, and Journal of the Institute of Actuaries 8(6):301–310

    Google Scholar 

  • Makeham W (1889) On the further development of Gompertz’s law. J Inst Actuar (1886-1994) 28(2):152–159

    Google Scholar 

  • Marshall A, Zenga M (2009) Simulating Coxian phase-type distributions for patient survival. Int Trans Operat Res 16(2):213–226

    Google Scholar 

  • McClean S, Gillespie J, Garg L, Barton M, Scotney B, Kullerton K (2014) Using phase-type models to cost stroke patient care across health, social and community services. Eur J Operat Res 236(1):190–199

    MathSciNet  Google Scholar 

  • McLean K, McAuley K (2012) Mathematical modelling of chemical processes-obtaining the best model predictions and parameter estimates using identifiability and estimability procedures. Can J Chem Eng 90(2):351–366

    CAS  Google Scholar 

  • Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092

    ADS  CAS  Google Scholar 

  • Miettinen O (1976) Estimability and estimation in case-referent studies. Am J Epidemiol 103(2):226–235

    CAS  PubMed  Google Scholar 

  • Moriguchi S, Murota K (2012) On discrete Hessian matrix and convex extensibility. J Operat Res Soc Japan 55(1):48–62

    MathSciNet  Google Scholar 

  • Mu J (2019) Exploring the Estimability of Mark-Recapture Models with Individual, Time-Varying Covariates using the Scaled Logit Link Function, MSc Thesis, The University of Western Ontario, Canada

  • Neath A, Cavanaugh J (2012) The Bayesian information criterion: background, derivation, and applications. Wiley Interdiscip Rev Comput Stat 4(2):199–203

    Google Scholar 

  • Nelder J, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313

    MathSciNet  Google Scholar 

  • Payne K, Marshall H, Cairns K (2012) Investigating the efficiency of fitting Coxian phase-type distributions to health care data. IMA J Manage Math 23(2):133–145

    Google Scholar 

  • Putter H, Houwelingen H (2015) Frailties in multi-state models: are they identifiable? Do we need them? Statistical Methods Med Res 24(6):675–692

    MathSciNet  Google Scholar 

  • Ramírez-Cobo P, Lillo R, Wiper M (2010) Nonidentifiability of the two-state Markovian arrival process. J Appl Prob 47(3):630–649

    MathSciNet  Google Scholar 

  • Titman A, Sharples L (2010) semi-Markov models with phase-type sojourn distributions. Biometrics 66(3):742–752

    MathSciNet  PubMed  Google Scholar 

  • Vrieze S (2012) Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol Methods 17(2):228–243

    PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada through R. Mamon’s Discovery Grant (RGPIN-2017-04235). The valuable comments of Bruce Jones, the anonymous reviewers, and the editor are highly appreciated.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rogemar Mamon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 528 KB)

Appendices

Appendices

Appendix A: Flexibility of the PTAM’s resulting distribution

Obtaining a reasonably good fit for the PTAM was demonstrated in Sections 3.2 and 4.2 of Cheng et al. (2021). This motivated us to explore the flexibility of the PTAM. In particular, we consider the question: Can the PTAM achieve a good fit on any set of lifetime data? The assessment of the PTAM’s flexibility will be based on how well the resulting pdf and hazard function could approximate the true ones. Both functions are easily obtained from the data and commonly used to validate the calibrated result.

Definition A.1

We define i as the state label for states \(i=1,\ldots ,m\) in a Coxian model with m transient states.

Lemma A.1

Consider an \(m-\)state Coxian model with transition rate from one transient state to the next transient state equal to \(\lambda\), and the absorption rate in state i is \(h_i\) for \(i=1,\ldots ,m\). For any \(t>0\), let \(Y_t\) be the state variable at time t. Then, \(P(Y_t \ge k|Y_t \in E)\) is an increasing function with respect to t for any \(k=1,\ldots ,m\).

Proof

The \(m\times m\) intensity matrix of the Coxian model is

$$\begin{aligned} \varvec{\Lambda }=\left[ \begin{array}{ccccc} -\left( \lambda +h_{1}\right) &{} \lambda \\ &{} -\left( \lambda +h_{2}\right) &{} \lambda \\ &{} &{} \ddots \\ &{} &{} &{} -\left( \lambda +h_{m-1}\right) &{} \lambda \\ &{} &{} &{} &{} -h_{m} \end{array}\right] , \end{aligned}$$

where \(\lambda\) is the rate of transition from state i to \(i+1\), and \(h_i\) is the absorption rate at state i. For any fixed \(t>0\), let \(Y_{\ell ,t}\) be the state random variable at time t assuming the starting state is \(\ell\) at 0, then \(Y_t=Y_{1,t}\). For every \(k=1,{\ldots },m\),

$$\begin{aligned} P(Y_t \ge k|Y_{t} \in E)&=\frac{\sum _{j=k}^m P(Y_{1,t}=j)}{\sum _{j=1}^m P(Y_{1,t}=j)}=\frac{\varvec{\alpha }e^{\varvec{\Lambda }t} \varvec{e_{k}}}{\varvec{\alpha }e^{\varvec{\Lambda }t}\varvec{e}}, \end{aligned}$$

where \(\varvec{e_{k}}\) is an \(m\times 1\) column vector with first \(k-1\) elements equal to 0 and the remaining elements are equal to 1. The quantity \(P(Y_t \ge k|Y_t \in E)\) is the probability that the state of the individual at time t is greater than or equal to k conditional on this individual being alive. We note that \(P(Y_t \ge 1|Y_{t} \in E)=1\). When \(k \ne 1\), the first derivative of \(P(Y_t \ge k|Y_t \in E)\) with respect to t is

$$\begin{aligned} \frac{\textrm{d}P(Y_t \ge k|Y_t \in E)}{\textrm{d}t}= \frac{\varvec{\alpha }e^{\varvec{\Lambda }t}(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda })e^{\varvec{\Lambda }t}\varvec{e}}{\left( \varvec{\alpha }e^{\varvec{\Lambda }t}\varvec{e}\right) ^2}. \end{aligned}$$

Let \(g(t)=\varvec{\alpha }e^{\varvec{\Lambda }t}(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda })e^{\varvec{\Lambda }t}\varvec{e}\). The (ij) entry of \(e^{\varvec{\Lambda }t}\), denoted by \(P_{i,j}(t)\), is the probability of being in state j at time t given that the starting state is i at time 0. It may be verified that

$$\begin{aligned} \varvec{\alpha }e^{\varvec{\Lambda }t}&=(P_{1,1}(t),...,P_{1,m}(t)) \, {\text {a }1\times m \text { row vector}}\\ e^{\varvec{\Lambda }t}\varvec{e}&=\left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal }\, \text {an} m\times 1 \text { column vector}\\ \varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha }&=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ -h_k &{} 0 &{}0 &{}\cdots \\ -h_{k+1} &{} 0 &{}0&{}\cdots \\ \vdots \\ -h_{m} &{} 0 &{}0&{}\cdots \end{array}\right] \, \text {an}\; m\times m \text { matrix}\\ \varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }&=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ 0 &{} 0&{}0&{}\cdots \\ -(\lambda +h_1) &{} \lambda &{}0 &{}\cdots \\ -(\lambda +h_1) &{} \lambda &{}0&{}\cdots \\ \vdots \\ -(\lambda +h_1) &{} \lambda &{}0&{}\cdots \end{array}\right] \, \text {an} \;m\times m \text { matrix}, \end{aligned}$$

where \(\lambda\) is the \((k-1,1)\) entry of \(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha }\) and the entries of the entire first \((k-1)\)th rows in \(\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }\) are 0’s.

Remark A.1

Bold subscripts are used in the vector notation, and normal subscripts are associated to an element of a vector. For example, since \(\varvec{A_k}\) is a vector, \(\varvec{A}_k\) is the kth element of vector \(\varvec{A}\).

Hence, \(\varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda } =(\varvec{\beta _1},\varvec{\beta _2},\varvec{0},\)...\(,\varvec{0})^{\intercal }\), where \(\varvec{\beta _2}\) is an \(m\times 1\) column vector with the first \(k-1\) elements all equal to 0 and the remaining elements are all equal to \(-\lambda\); and \(\varvec{\beta _1}\) is an \(m\times 1\) column vector with the first \(k-2\) elements equal to 0, the \((k-1)\)th element equals \(\lambda\), and the \(\ell\)th element (\(\ell \ge k\)) equals \(\lambda +h_1-h_{\ell }\). In matrix form,

$$\begin{aligned} \varvec{\Lambda }\varvec{e_{k}}\varvec{\alpha } -\varvec{e_{k}}\varvec{\alpha }\varvec{\Lambda }=\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ \lambda +h_1-h_k &{} -\lambda &{}0 &{}\cdots \\ \lambda +h_1-h_{k+1} &{} -\lambda &{}0&{}\cdots \\ \vdots \\ \lambda +h_1-h_{m} &{} -\lambda &{}0&{}\cdots \end{array}\right] . \end{aligned}$$

Then, g(t) simplifies to

$$\begin{aligned} g(t)&=(P_{1,1}(t),{\ldots },P_{1,m}(t))\left[ \begin{array}{ccccccc} 0 &{} 0&{}0&{}\cdots \\ \vdots &{} \vdots \\ 0 &{} 0&{}0&{}\cdots \\ \lambda &{} 0&{}0&{}\cdots \\ \lambda +h_1-h_k &{} -\lambda &{}0 &{}\cdots \\ \lambda +h_1-h_{k+1} &{} -\lambda &{}0&{}\cdots \\ \vdots \\ \lambda +h_1-h_{m} &{} -\lambda &{}0&{}\cdots \end{array}\right] \left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal }\\&=\left( \left( \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\right) ,-\lambda \sum _{\ell =k}^m P_{1,\ell }(t),0,...,0\right) \left( \sum _{j=1}^mP_{1,j}(t),...,\sum _{j=1}^mP_{m,j}(t)\right) ^{\intercal } \\&=\left( \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\right) \sum _{j=1}^{m} P_{1,j}(t)-\sum _{\ell =k}^m \lambda P_{1,\ell }(t) \sum _{j=1}^m P_{2,j}(t). \end{aligned}$$

On the other hand, we know that

$$\begin{aligned} \frac{\textrm{d}e^{\varvec{\Lambda }t}}{\textrm{d}t} =\varvec{\Lambda }e^{\varvec{\Lambda }t}=e^{\varvec{\Lambda }t}\varvec{\Lambda }, \end{aligned}$$

yielding

$$\begin{aligned} \frac{\textrm{d}P_{1,k}(t)}{\textrm{d}t}&=-(\lambda +h_1)P_{1,k}(t)+\lambda P_{2,k}(t)=\lambda P_{1,k-1}(t)-(\lambda +h_k)P_{1,k}(t) \ \text {for} \ k=2,...,m-1\\ \frac{\textrm{d}P_{1,m}(t)}{\textrm{d}t}&=-(\lambda +h_1)P_{1,m}(t)+\lambda P_{2,m}(t)=\lambda P_{1,k-1}(t)-h_mP_{1,m}(t). \end{aligned}$$

Rearranging further, we have

$$\begin{aligned} (h_1-h_k)P_{1,k}(t)&=\lambda (P_{2,k}(t)-P_{1,k-1}(t)) \ \text {for} \ k=2,...,m-1\\ (\lambda +h_1-h_m)P_{1,m}(t)&=\lambda (P_{2,m}(t)-P_{1,m-1}(t)) \end{aligned}$$

The first part of g(t) could be written as

$$\begin{aligned}&\lambda P_{1,k-1}(t)+ \sum _{\ell =k}^m (\lambda +h_1-h_\ell ) P_{1,\ell }(t)\\&= \lambda P_{1,k-1}(t)+ \sum _{\ell =k}^{m-1} \lambda P_{1,\ell }(t)+\sum _{\ell =k}^{m-1} (h_1-h_\ell ) P_{1,\ell }(t)+(\lambda +h_1-h_m)P_{1,m}(t)\\&=\lambda P_{1,k-1}(t)+ \sum _{\ell =k}^{m-1} \lambda P_{1,\ell }(t)+\lambda \sum _{\ell =k}^m(P_{2,\ell }(t)-P_{1,\ell -1}(t)) =\lambda \sum _{\ell =k}^mP_{2,\ell }(t). \end{aligned}$$

Consequently,

$$\begin{aligned} g(t)=\lambda \left( \sum _{\ell =k}^mP_{2,\ell }(t)\sum _{j=1}^mP_{1,j}(t)-\sum _{\ell =k}^mP_{1,\ell }(t)\sum _{j=1}^mP_{2,j}(t)\right) , \end{aligned}$$

which shows that \(g(t) \ge 0\) if and only if

$$\begin{aligned} P(Y_{1,t}\ge k|Y_{1,t} \in E)=\frac{\sum _{\ell =k}^mP_{1,\ell }(t)}{\sum _{j=1}^mP_{1,j}(t)} \le \frac{\sum _{\ell =k}^mP_{2,\ell }(t)}{\sum _{j=1}^mP_{2,j}(t)}=P(Y_{2,t}\ge k|Y_{2,t} \in E), \end{aligned}$$

with the equality holds if and only if \(g(t)=0\). Similarly, \(\frac{\textrm{d}P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E)}{\textrm{d}t} \ge 0\) if and only if

$$\begin{aligned} P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E) \le P(Y_{\ell +1,t}\ge k|Y_{\ell +1,t} \in E). \end{aligned}$$
(A.1)

The inequality is trivial when \(\ell =m-1\) because \(P(Y_{m,t}\ge k|Y_{m,t} \in E)=1\) and \(P(Y_{\ell ,t}\ge k|Y_{\ell ,t} \in E) \le 1\). Recall that \(Y_{\ell ,t}\) is the state variable at time t assuming the starting state is \(\ell\) at time 0. By letting \(t=0\), it could be verified that \(P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)=1\) when \(\ell \ge k\) and \(P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)=0\) when \(\ell < k\).

Since (A.1) holds when \(\ell =m-1\), we have \(\frac{\textrm{d}P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)}{\textrm{d}t} \ge 0\), or \(P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)\) is increasing with respect to t.

The next step is to prove \(P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E) \le P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)\) by contradiction. Suppose

$$\begin{aligned} P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E) > P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E){.} \end{aligned}$$
(A.2)

By (A.1), we have \(\frac{{\textrm{d}}P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)}{{\textrm{d}}t}<0\), or \(P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)\) is decreasing with respect to t. Therefore, for any \(t>0\),

$$\begin{aligned} P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)&< P(Y_{m-2,t=0}\ge k|Y_{m-2,t=0} \in E)\\&\le P(Y_{m-1,t=0}\ge k|Y_{m-1,t=0} \in E)\\&< P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E), \end{aligned}$$

which conflicts with (A.2). The first inequality holds because \(P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)\) is decreasing with respect to t. The second inequality holds because \(P(Y_{\ell ,t=0}\ge k|Y_{\ell ,t=0} \in E)\) is equal to 1 when \(\ell \ge k\); otherwise, it is equal to 0, so that \(P(Y_{m-1,t=0}\ge k|Y_{m-1,t=0} \in E)\) must be 1 if \(P(Y_{m-2,t=0}\ge k|Y_{m-2,t=0} \in E)=1\) for any k. The third inequality holds because we proved \(P(Y_{m-1,t}\ge k|Y_{m-1,t} \in E)\) is an increasing function of t.

Hence, (A.2) is false, and it must be that

$$\begin{aligned} P(Y_{m-2,t} \ge k|Y_{m-2,t} \in E) \le P(Y_{m-1,t} \ge k|Y_{m-1,t} \in E), \end{aligned}$$

which shows \(P(Y_{m-2,t}\ge k|Y_{m-2,t} \in E)\) is increasing with respect to t by (A.1). Following the same procedure recursively, it could be shown that \(P(Y_{1,t}\ge k|Y_{1,t} \in E) \le P(Y_{2,t}\ge k|Y_{2,t} \in E)\). Therefore, \(g(t) \ge 0\) and \(\frac{\textrm{d}P(Y_{t}\ge k|Y_{t} \in E)}{\textrm{d}t} \ge 0\), implying that \(P(Y_{t}\ge k|Y_{t} \in E)\) is an increasing function of t. \(\square\)

Theorem A.1

Consider the Coxian model with 3 restrictions: (i) the process starts in state 1 at time 0; (ii) the transition of going out from each state must be either to the next transient state or to the absorbing state; and (iii) the rates of transition from one transient state to the next transient state are the same. Suppose the absorption rate of the Coxian model is monotone with respect to the state label. The resulting hazard rate h(t) of the Coxian model is increasing with respect to age t if and only if the absorption rate \(h_i\) is increasing with respect to i.

Proof

For any \(t \ge 0\), the resulting hazard rate is

$$\begin{aligned} h(t)=\frac{\varvec{\alpha }\exp (\varvec{\Lambda } t)}{\varvec{\alpha }\exp (\varvec{\Lambda } t)\varvec{e}} \varvec{h} = \sum _{i=1}^m h_i P(Y_t=i|Y_t \in E) ={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_t}|Y_t \in E\right) {.} \end{aligned}$$

The above result shows that h(t) is the expected value of \((h_{Y_t}|Y_t \in E)\), where \(Y_t\) takes a state value at time t and E is the set of all transient states \(\{1,\ldots ,m\}\). The ith element of \(\varvec{\alpha }\exp (\varvec{\Lambda } t)\) is the probability in state i at time t.

By Lemma A.1, for any \(t_1<t_2\), \(Y_{t_1}\) is less than \(Y_{t_2}\) in stochastic order. It is well known that if the random variable A is less than the random variable B in stochastic order, then for any non-decreasing functions u, \({{\,\mathrm{\mathbb {E}}\,}}(u(A)) \le {{\,\mathrm{\mathbb {E}}\,}}(u(B))\). Hence, when \(h_i\) is increasing with respect to i, for any \(0 \le t_1<t_2\),

$$\begin{aligned} h(t_1)={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \le {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right) = h(t_2), \end{aligned}$$

implying that h(t) is increasing with respect to age t.

On the other hand, when h(t) is increasing with respect to t, for any \(0 \le t_1<t_2\),

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) =h(t_1)< h(t_2)={{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right) . \end{aligned}$$
(A.3)

Suppose \(h_i\) is decreasing with respect to i. Then, \(-h_i\) is an increasing function. We showed that \(Y_{t_1}\) is less than \(Y_{t_2}\) in stochastic order. Therefore,

$$\begin{aligned} {{\,\mathrm{\mathbb {E}}\,}}\left( -h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \le {{\,\mathrm{\mathbb {E}}\,}}\left( -h_{Y_{t_2}}\big |Y_{t_2} \in E\right) , \end{aligned}$$

yielding \({{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_1}}\big |Y_{t_1} \in E\right) \ge {{\,\mathrm{\mathbb {E}}\,}}\left( h_{Y_{t_2}}\big |Y_{t_2} \in E\right)\). Or, \(h(t_1)\ge h(t_2)\), which contradicts with h(t) being increasing. Consequently, the absorption rate \(h_i\) is increasing with respect to i.

\(\square\)

Since the PTAM’s absorption rate is monotone, the resulting hazard rate is monotone by Theorem A.1. A non-monotonic hazard rate is an impossibility for the PTAM. Therefore, we only investigate the goodness of fit of certain lifetime distributions with monotonic hazard rate.

Remark A.2

Suppose the absorption rate of the Coxian model is monotone with respect to the state label. Then, the resulting hazard rate h(t) is decreasing with respect to age t if and only if the dying rate \(h_i\) is decreasing with respect to i.

The following list includes the tested lifetime distributions we are interested in: (i) Gamma distribution with increasing hazard rate, (ii) Gamma distribution with decreasing hazard rate, (ii) Weibull distribution with increasing hazard rate, (iv) Weibull distribution with decreasing hazard rate, (v) Pareto distribution with decreasing hazard rate, (vi) Convolution of two exponential distributions, (vii) Convolution of two Weibull distributions, (viii) Gompertz-Makeham distribution, and (ix) Makeham’s second extension of the Gompertz distribution.

Our experiments demonstrate that the fitted results are promising for all tested distributions, except for the tail portion of heavy-tailed distributions. As expected, it is problematic to use light-tailed distributions, which include the PTAM, to approximate heavy-tailed distributions, because there is always a distinct tail requiring a large number of mixtures (states in the PTAM) to approximate well the tail. For example, Figures S.6 and S.8 of the Supplementary Material depict the typical fitted results for the heavy-tailed and light-tailed distributions, respectively. More discussion and fitted results are presented in Appendix B.

Appendix B: Experiments showing the flexibility of the resulting distribution from a PTAM

In this Appendix, we detail the analysis of PTAM’s flexibility as described in Appendix A. This is based on the concept of goodness-of-fit to the target distribution by the calibrated PTAM.

1.1 Appendix B.1: Some distributions for lifetime modelling

The Gamma distribution has two parameters: \(\alpha\) and \(\beta\). Its associated pdf is

$$\begin{aligned} f(t)=\frac{\beta ^{\alpha }}{\Gamma (\alpha )}t^{\alpha -1}\exp (-\beta t), \ \alpha>0, \ \beta >0, \end{aligned}$$

where \(\Gamma (x)=\int _{z=0}^{\infty }z^{x-1}e^{-z}dz\) is the Gamma function. When \(\alpha >1\), the Gamma distribution has an increasing hazard rate, whilst the hazard rate for a Gamma distribution is decreasing when \(\alpha <1\). When \(\alpha =1\), the Gamma distribution reduces to an exponential distribution with a rate \(\lambda =\beta\). Since the family of Gamma distributions is rich and capable of reproducing a variety of distribution shapes, it is extensively used in lifetime modelling. The Gamma distribution is one of the potential lifetime distributions utilised to capture lifetime processes of humans and some other living things.

The Weibull distribution has two parameters: k and \(\lambda .\) Its associated pdf is

$$\begin{aligned} f(t)=\frac{k}{\lambda }\left( \frac{t}{\lambda }\right) ^{k-1}\exp \left( -\left( \frac{t}{\lambda }\right) ^k\right) , \ \lambda>0, \ k>0. \end{aligned}$$

When \(k>1\), the Weibull distribution has an increasing hazard rate, whilst the hazard rate is decreasing when \(k<1\). A Weibull distribution where \(k=1\) reduces to an exponential distribution with rate \(\lambda\). The Weibull distribution is popular in survival analysis and affords a sound basis for modelling lifetime distributions.

The Pareto distribution has two parameters: k and \(\sigma\). Its associated survival function is

$$\begin{aligned} S(t)=\left( 1+\frac{kt}{\sigma }\right) ^{-\frac{1}{k}}, \ k>0, \ \sigma >0. \end{aligned}$$

It is well-known that Pareto distributions are heavy-tailed.

A convolution of two exponential distributions has two parameters, and the random variable that represents a convolution of two exponential distributions is

$$\begin{aligned} Y=X_1+X_2, \end{aligned}$$

where \(X_1\) and \(X_2\) follow the exponential distributions with rates \(\lambda _1\) and \(\lambda _2\), respectively. The convolutions of different distributions have a physical interpretation; i.e, the process could be decomposed into several components and the time spent on each component follows a certain distribution. Specifically, the physical interpretation for the convolution mechanism is consistent with our cognizance of ageing, being a progressive process. Furthermore, it is often assumed that the time spent in each component follows an exponential distribution because an exponential distribution matches most observed data. Therefore, a convolution of exponential distributions is a further extension capable of accomplishing more flexibility.

A convolution of two Weibull distributions has four parameters, and the random variable representing a convolution of two Weibull distributions is

$$\begin{aligned} Z=W_1+W_2, \end{aligned}$$

where \(W_1\) and \(W_2\) follow the Weibull distribution with parameters \((\lambda _1, k_1)\) and \((\lambda _2, k_2)\) respectively. This is a simple extension of the previous example by changing the distributions of each component, similar to Huzurbazar (1999). Since the Weibull distribution is utilised in survival analysis, it is plausible for the ageing process that the time spent in each component follows a Weibull distribution as well. It is then worth exploring whether one could use the PTAM to approximate well a convolution of Weibull distributions.

The Gompertz-Makeham distribution has three parameters, and its associated hazard rate is

$$\begin{aligned} h(t)=\zeta +\xi \exp (\lambda t), \ \zeta \ge -\xi , \ \xi>0, \ \lambda >0{,} \end{aligned}$$

where \(\xi\) is the growth rate of mortality and \(\zeta\) represents the background mortality with additional constraint \(\zeta > 0\).

The Gompertz-Makeham distribution were proposed by Makeham (1860) and Gompertz (1825), and it assumes a shifted exponential increasing hazard rate with respect to age. Such a distribution fits human mortality rates from age 30 to 80 well. When we calibrate the PTAM with human lifetime observations only, the PTAM is essentially approximating the Gompertz-Makeham distribution.

The Makeham’s second extension of the Gompertz distribution has four parameters, and its survival function is

$$\begin{aligned} S(t)=\exp \left( -\xi (\exp (\lambda t)-1)-\xi \theta \lambda t-\xi \alpha (\lambda t)^2\right) , \ \xi>0, \ \lambda>0, \ \theta>0, \ \alpha >0. \end{aligned}$$

Makeham (1889) extended the assumptions on the differential equation for the hazard rate to higher-order derivatives. This was motivated by the observation that the third-order derivatives of the empirical hazard rates were much closer to a geometric progression. One may verify that the third-order derivatives, with respect to time t, of the hazard rate arising from the Makeham’s second extension of the Gompertz distribution is \(\xi \lambda ^3(\exp (\lambda ))^t\), which is a geometric progression with initial value \(\xi \lambda ^3\) and common ratio \(\exp (\lambda )\).

Once again, it is likely to observe human lifetimes following the extended Gompertz distribution. For each example, we select appropriate parameter values to achieve a particular shape of probability density function or hazard function. Once the parameter values are chosen, the target lifetime distributions are completely determined.

1.2 Appendix B.2: Calibration criterion

We shall use the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951; Kullback 1997) to summarise the information loss when using the PTAM to approximate the target lifetime distribution, which is known. When calibrating the PTAM, we minimise the KL divergence to ensure minimal lost information.

Definition B.1

For continuous distributions P and Q with respective pdf’s p(x) and q(x) defined on a sample probability space, the KL divergence \(D_{KL}\) of Q from P is

$$\begin{aligned} D_{KL}(P\ || \ Q){:=}\int _{-\infty }^{+\infty }p(x)\log \left( \frac{p(x)}{q(x)}\right) \textrm{d}x{.} \end{aligned}$$

Usually, the distribution P represents the actual model, and the distribution Q represents the distribution used to approximate P. In our case, the distribution P is the target lifetime distribution in each example, and the distribution Q is the PTAM.

The KL divergence’s non-negativity is a nice property as discussed further below. Specifically,

$$\begin{aligned} D_{KL}(P\ || \ Q) \ge 0, \end{aligned}$$

with equality if and only if \(P=Q\). This result is known as the Gibbs’ inequality (Gibbs 1902). Thus, the divergence value could be interpreted as the extra information required for Q to represent P. Therefore, the smaller the divergence, the less extra information is required when using Q to approximate P and the better goodness of fit is for Q. Furthermore, by its non-negative property, the closer the divergence value to 0, the better the approximation.

In our case, we calibrate the PTAM by minimising the KL divergence of the PTAM from the target lifetime distribution. The analytical formula for the PTAM’s resulting distribution is quite involved. However, the resulting distribution is fairly easy to evaluate numerically for a given set of parameter values. Therefore, we numerically search for the minimum of \(D_{KL}(\text {lifetime distribution} \ || \ \text {proposed PTAM})\), where p(x) is the pdf of the target lifetime distribution, q(x) is the pdf of the PTAM, and

$$\begin{aligned} D_{KL}(\text {lifetime distribution} \ || \ \text {proposed PTAM})=\int _{0}^{+\infty }p(x)\log \left( \frac{p(x)}{q(x)}\right) \textrm{d}x. \end{aligned}$$

The numerical search utilises the optimisation function fmincon in MATLAB. The calibration process is similar to the model calibration in the Le Bras model simulation in Cheng et al. (2021), except for switching the optimisation criterion from negative log-likelihood to the KL divergence. The Tail Value-at-Risk (TVaR) with risk level 0.999 denoted by \(\text {TVaR}_{0.999}(T)\) is calculated in each example, and the calculated value is used to estimate \(\psi\). The estimated values for the other four parameters are the values that minimise the divergence. For the fitted results, refer to Tables S.1 – S.10 and Figures S.1 – S.10 of the Supplementary Material.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Cheng, B., Mamon, R. Examining the identifiability and estimability of the phase-type ageing model. Comput Stat 39, 963–1004 (2024). https://doi.org/10.1007/s00180-023-01329-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00180-023-01329-5

Keywords