
Mixtures of Hidden Truncation Hyperbolic Factor Analyzers


Abstract

The mixture of factor analyzers model was first introduced over 20 years ago and, in the meantime, has been extended to several non-Gaussian analogs. In general, these analogs account for situations with heavy-tailed and/or skewed clusters. An approach is introduced that unifies many of these analogs into one very general model: the mixture of hidden truncation hyperbolic factor analyzers (MHTHFA) model. In the process, a hidden truncation hyperbolic factor analysis model is also introduced. The MHTHFA model is illustrated for clustering as well as semi-supervised classification using two real datasets.


References

  • Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14–22.
  • Andrews, J.L., & McNicholas, P.D. (2011a). Extending mixtures of multivariate t-factor analyzers. Statistics and Computing, 21(3), 361–373.
  • Andrews, J.L., & McNicholas, P.D. (2011b). Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. Journal of Statistical Planning and Inference, 141(4), 1479–1486.
  • Arellano-Valle, R.B., & Genton, M.G. (2005). On fundamental skew distributions. Journal of Multivariate Analysis, 96(1), 93–116.
  • Baek, J., McLachlan, G.J., Flack, L.K. (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1298–1309.
  • Bhattacharya, S., & McNicholas, P.D. (2014). A LASSO-penalized BIC for mixture model selection. Advances in Data Analysis and Classification, 8(1), 45–61.
  • Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Computational Statistics and Data Analysis, 71, 52–78.
  • Browne, R.P., & McNicholas, P.D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2), 176–198.
  • Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.
  • Gallaugher, M.P.B., & McNicholas, P.D. (2017). A matrix variate skew-t distribution. Stat, 6, 160–170.
  • Gallaugher, M.P.B., & McNicholas, P.D. (2018). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83–93.
  • Gallaugher, M.P.B., & McNicholas, P.D. (2019a). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification, 36. In press.
  • Gallaugher, M.P.B., & McNicholas, P.D. (2019b). Three skewed matrix variate distributions. Statistics and Probability Letters, 145, 103–109.
  • Ghahramani, Z., & Hinton, G.E. (1997). The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Toronto, Canada.
  • Gorman, R.P., & Sejnowski, T.J. (1988). Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1, 75–89.
  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
  • Karlis, D., & Santourian, A. (2009). Model-based clustering with non-elliptically contoured distributions. Statistics and Computing, 19(1), 73–83.
  • Lawley, D.N., & Maxwell, A.E. (1962). Factor analysis as a statistical method. Journal of the Royal Statistical Society: Series D, 12(3), 209–229.
  • Lee, S., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24, 181–202.
  • Lee, S.X., & McLachlan, G.J. (2016). Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models. Statistics and Computing, 26(3), 573–589.
  • Lichman, M. (2013). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
  • Lin, T.-I. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100, 257–265.
  • Lin, T.-I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20(3), 343–356.
  • Lin, T.-I., McNicholas, P.D., Hsiu, J.H. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80–87.
  • Lin, T., McLachlan, G.J., Lee, S.X. (2016). Extending mixtures of factor models using the restricted multivariate skew-normal distribution. Journal of Multivariate Analysis, 143, 398–413.
  • Lindsay, B.G. (1995). Mixture models: theory, geometry and applications. In NSF-CBMS regional conference series in probability and statistics, Vol. 5. Hayward: Institute of Mathematical Statistics.
  • McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. Hoboken: Wiley.
  • McLachlan, G.J., & Peel, D. (2000a). Finite mixture models. New York: Wiley.
  • McLachlan, G.J., & Peel, D. (2000b). Mixtures of factor analyzers. In Proceedings of the seventh international conference on machine learning (pp. 599–606). San Francisco: Morgan Kaufmann.
  • McNicholas, P.D. (2010). Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference, 140(5), 1175–1181.
  • McNicholas, P.D. (2016a). Mixture model-based classification. Boca Raton: Chapman & Hall/CRC Press.
  • McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33(3), 331–373.
  • McNicholas, P.D., & Murphy, T.B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18(3), 285–296.
  • McNicholas, P.D., & Murphy, T.B. (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics, 26(21), 2705–2712.
  • McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A mixture of variance-gamma factor analyzers. In Ahmed, S.E. (Ed.), Big and complex data analysis: methodologies and applications (pp. 369–385). Cham: Springer International Publishing.
  • Meng, X.-L., & Rubin, D.B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80, 267–278.
  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2014a). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326–335.
  • Murray, P.M., McNicholas, P.D., Browne, R.P. (2014b). A mixture of common skew-t factor analyzers. Stat, 3(1), 68–82.
  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2017a). Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. Journal of Multivariate Analysis, 161, 141–156.
  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2017b). A mixture of SDB skew-t factor analyzers. Econometrics and Statistics, 3, 160–168.
  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2019). Note of clarification on ‘Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering, by Murray, Browne, and McNicholas, J. Multivariate Analysis 161 (2017) 141–156.’ Journal of Multivariate Analysis, 171, 475–476.
  • Peel, D., & McLachlan, G.J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10(4), 339–348.
  • R Core Team. (2018). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
  • Sahu, S.K., Dey, D.K., Branco, M.D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics, 31(2), 129–150. Corrigendum: vol. 37 (2009), 301–302.
  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
  • Steane, M.A., McNicholas, P.D., Yada, R. (2012). Model-based classification via mixtures of multivariate t-factor analyzers. Communications in Statistics – Simulation and Computation, 41(4), 510–523.
  • Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
  • Subedi, S., & McNicholas, P.D. (2014). Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Advances in Data Analysis and Classification, 8(2), 167–193.
  • Tang, Y., Browne, R.P., McNicholas, P.D. (2018). Flexible clustering of high-dimensional data via mixtures of joint generalized hyperbolic distributions. Stat, 7(1), e177.
  • Tipping, M.E., & Bishop, C.M. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2), 443–482.
  • Tortora, C., McNicholas, P.D., Browne, R.P. (2016). A mixture of generalized hyperbolic factor analyzers. Advances in Data Analysis and Classification, 10(4), 423–440.
  • Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2019). A mixture of coalesced generalized hyperbolic distributions. Journal of Classification, 36. To appear.
  • Vrbik, I., & McNicholas, P.D. (2012). Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Statistics and Probability Letters, 82(6), 1169–1174.
  • Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196–210.
  • Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32(3), 359–381.
  • Yoshida, R., Higuchi, T., Imoto, S. (2004). A mixed factors model for dimension reduction and extraction of a group structure in gene expression data. In Proceedings of the 2004 IEEE computational systems bioinformatics conference (pp. 161–172).


Author information


Corresponding author

Correspondence to Paul D. McNicholas.


Appendix: E-Step Calculations

Herein, we present the expectations required for the E-step of the ECM algorithm for the mixtures of HTH factor analyzers model.

A.1 \(\mathbb {E}[W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[1/W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\)

To derive the expectations \(\mathbb {E}[W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[1/W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) as well as \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) in the following section, first note that

$$ \begin{array}{ll} f(w_{ig}\mid\mathbf{x}_{i},z_{ig}=1) =&\frac{w_{ig}^{\lambda_{g}-p/2-1}}{2K_{\lambda_{g}-p/2}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)}\left[ \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right]^{(\lambda_{g}-p/2)/2}\\ &\times\exp\left\{-\frac{1}{2}\left( \omega_{g} w_{ig}+\frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{w_{ig}}\right)\right\}{\Phi}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})/\sqrt{w_{ig}}\mid\boldsymbol{\Delta}_{g}\right)\\ &\div H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2,\gamma_{g},\gamma_{g}\right). \end{array} $$
(8)

Therefore,

$$ \begin{array}{ll} \mathbb{E}\left[W_{ig}\mid\mathbf{x}_{i},z_{ig}=1 \right] =&{\int}^{\infty}_{0}\frac{w^{\lambda_{g}-p/2}}{2K_{\lambda_{g}-p/2}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)}\left[ \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right]^{(\lambda_{g}-p/2)/2}\\ &\qquad\times\exp\left\{-\frac{1}{2}\left( \omega_{g} w+\frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{w}\right)\right\}{\Phi}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})/\sqrt{w}\mid\boldsymbol{\Delta}_{g}\right)\\ &\qquad\div H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2,\gamma_{g},\gamma_{g}\right)dw\\ =&\frac{K_{\lambda_{g}-p/2+1}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)}{K_{\lambda_{g}-p/2}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)} \left[ \frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{\omega_{g}}\right]^{1/2}\\ &\qquad\times H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2+1,\gamma_{g},\gamma_{g}\right)\\ &\qquad\div H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2,\gamma_{g},\gamma_{g}\right), \end{array} $$
$$ \begin{array}{ll} \mathbb{E}\left[1/W_{ig}\mid\mathbf{x}_{i},z_{ig}=1 \right] =&{\int}^{\infty}_{0}\frac{w^{\lambda_{g}-p/2-2}}{2K_{\lambda_{g}-p/2}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)}\left[ \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right]^{(\lambda_{g}-p/2)/2}\\ &\qquad\times\exp\left\{-\frac{1}{2}\left( \omega_{g} w+\frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{w}\right)\right\}{\Phi}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})/\sqrt{w}\mid\boldsymbol{\Delta}_{g}\right)\\ &\qquad\div H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2,\gamma_{g},\gamma_{g}\right)dw\\ =&\frac{K_{\lambda_{g}-p/2-1}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)}{K_{\lambda_{g}-p/2}\left( \sqrt{\omega_{g}(\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}))}\right)} \left[ \frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{\omega_{g}}\right]^{-1/2}\\ &\qquad\times H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2-1,\gamma_{g},\gamma_{g}\right)\\ &\qquad\div H_{r}\left( \boldsymbol{\alpha}_{g}^{\prime}\boldsymbol{\Omega}_{g}^{-1}(\mathbf{x}_{i}-\mathbf{r}_{g})\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-p/2,\gamma_{g},\gamma_{g}\right). \end{array} $$
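The leading factors in both results are ordinary GIG moments: for \(W\sim\text{GIG}(\psi,\chi,\lambda)\), \(\mathbb{E}[W^{a}]=(\chi/\psi)^{a/2}K_{\lambda+a}(\sqrt{\psi\chi})/K_{\lambda}(\sqrt{\psi\chi})\). A minimal numerical sketch of these Bessel-function ratios, in Python with SciPy, follows; it takes \(\psi=\omega_{g}\), \(\chi=\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})\), and index \(\lambda_{g}-p/2\), and it omits the hidden truncation correction (the ratio of \(H_{r}\) terms). The function name and example values are illustrative assumptions, not quantities from the paper.

```python
# A minimal sketch of the GIG part of E[W | x, z=1] and E[1/W | x, z=1],
# assuming W ~ GIG(psi=omega, chi=omega+delta, index=lam); the ratio of
# H_r terms that multiplies these moments above is omitted here.
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moments(omega, delta, lam):
    """Return (E[W], E[1/W]) for W ~ GIG(psi=omega, chi=omega+delta, index=lam)."""
    chi = omega + delta
    arg = np.sqrt(omega * chi)      # sqrt(psi * chi)
    ratio = np.sqrt(chi / omega)    # sqrt(chi / psi)
    e_w = ratio * kv(lam + 1, arg) / kv(lam, arg)
    e_winv = kv(lam - 1, arg) / (ratio * kv(lam, arg))
    return e_w, e_winv

# Illustrative values only:
print(gig_moments(omega=2.0, delta=1.5, lam=-0.75))
```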

A.2 \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\)

To update \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\), where \(W_{ig}\sim\text{GIG}(\psi_{g},\chi_{g},\lambda_{g})\), first note that

$$ \mathbb{E}[ \log W_{ig}\mid z_{ig}=1] = \frac{ \mathrm{d} }{ \mathrm{d} \lambda } \log K_{\lambda} \left( \sqrt{ \chi_{g} \psi_{g} } \right) + \log \left( \sqrt{ \frac{\chi_{g}}{\psi_{g}} } \right). $$

We can show that

$$ W_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},z_{ig}=1\sim \text{GIG}\left( \omega_{g},\ \omega_{g} + (\mathbf{v}_{ig}-\mathbf{k}_{g})^{\prime}\boldsymbol{\Delta}_{g}^{-1}(\mathbf{v}_{ig}-\mathbf{k}_{g})+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g}),\ \lambda_{g}-(p+r)/2\right), $$

where \(\mathbf {r}_{g}=\boldsymbol {\mu }_{g}-\boldsymbol {\alpha }_{g}\mathbf {a}_{\lambda _{g}}\) and \(\mathbf {k}_{g}=\boldsymbol {\Lambda }^{\prime }_{g}\boldsymbol {\Omega }_{g}^{-1}(\mathbf {x}_{i}-\boldsymbol {\mu }_{g})\). Therefore,

$$ \mathbb{E}[\log W_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},z_{ig}=1] = \frac{ \mathrm{d} }{ \mathrm{d} \tau } \log K_{\tau} \left( \sqrt{ \chi^{*} \psi^{*} } \right) + \log \left( \sqrt{ \frac{\chi^{*}}{\psi^{*}} } \right), $$

where \(\tau=\lambda_{g}-(p+r)/2\), \(\psi^{*}=\omega_{g}\), and \(\chi^{*}=\omega_{g}+(\mathbf{v}_{ig}-\mathbf{k}_{g})^{\prime}\boldsymbol{\Delta}_{g}^{-1}(\mathbf{v}_{ig}-\mathbf{k}_{g})+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})\).

Let

$$ \zeta_{ig} = \sqrt{ 1 + \frac{\delta\left( \mathbf{x}_{i}\mid \mathbf{r}_{g}, \boldsymbol{\Omega}_{g}\right) + (\mathbf{v}_{ig} - \mathbf{k}_{g})^{\prime} \boldsymbol{\Delta}_{g}^{-1} (\mathbf{v}_{ig} - \mathbf{k}_{g}) }{ \omega_{g} } }, $$

then \(\zeta_{ig}\geq 1\) and \(W_{ig}\mid \mathbf {x}_{i}, \mathbf {v}_{ig},z_{ig}=1\sim \text {GIG}(\omega _{g}, \omega _{g} \zeta _{ig}^{2}, \tau )\). Consequently,

$$ \mathbb{E}[ \log W_{ig} \mid \mathbf{x}_{i}, \mathbf{v}_{ig},z_{ig}=1] = \frac{ \mathrm{d} }{ \mathrm{d} \tau } \log K_{\tau} \left( \omega_{g} \zeta_{ig}\right) + \log\zeta_{ig}. $$

The reader is directed to the supplementary material in Murray et al. (2017a) for details on a method for estimating this expectation via a series expansion.
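Because standard libraries do not expose the derivative of \(K_{\tau}\) with respect to its order \(\tau\), a simple alternative to that series expansion is a central finite difference. The sketch below (Python with SciPy) shows the idea; the helper name, step size, and inputs are illustrative assumptions rather than the authors' implementation.

```python
# A hedged finite-difference approximation to d/dtau log K_tau(y), used in
# E[log W | x, v, z=1] = d/dtau log K_tau(omega * zeta) + log(zeta).
import numpy as np
from scipy.special import kv

def e_log_w(tau, omega, zeta, h=1e-5):
    """Approximate E[log W | x, v, z=1] via a central difference in the order."""
    y = omega * zeta
    dlogK = (np.log(kv(tau + h, y)) - np.log(kv(tau - h, y))) / (2.0 * h)
    return dlogK + np.log(zeta)

# Illustrative values only (not fitted parameters):
print(e_log_w(tau=-1.25, omega=2.0, zeta=1.3))
```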

A.3 \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mathbf {V}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\)

Recall that \(\mathbf{V}_{ig}\mid w_{ig},z_{ig}=1\sim \text{HN}_{r}(w_{ig}\mathbf{I}_{r})\). We can show that

$$ \begin{array}{lll} f(\mathbf{v}_{ig}\mid\mathbf{x}_{i},z_{ig}=1)=\frac{1}{c_{\lambda}} h_{r}\left( \mathbf{v}_{ig}~|~\mathbf{k}_{g},\sqrt{\frac{\omega_{g}+\boldsymbol{\delta}(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{\omega_{g}}}\boldsymbol{\Delta}_{g},\lambda_{g}-\frac{p}{2},\gamma_{g},\gamma_{g}\right), \end{array} $$
(9)

where the support of \(\mathbf{V}_{ig}\) is \(\mathbb {R}_{+}^{r}\), i.e., the positive orthant of \(\mathbb {R}^{r}\), and

$$c_{\lambda}= H_{r}\left( \mathbf{k}_{g}\left( \frac{\omega_{g}}{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}\right)^{1/4}\Big|\,\mathbf{0},\boldsymbol{\Delta}_{g},\lambda_{g}-\frac{p}{2},\gamma_{g},\gamma_{g} \right).$$

It follows that

$$\mathbf{V}_{ig}\mid \mathbf{x}_{i},z_{ig}=1 \sim \text{TH}_{r}\left( \mathbf{k}_{g}, \sqrt{\frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{\omega_{g}}}\boldsymbol{\Delta}_{g},\lambda_{g}-\frac{p}{2}, \gamma_{g},\gamma_{g};\mathbb{R}_{+}^{r}\right).$$

Here, \(\text {TH}_{r}(\boldsymbol {\mu },\mathbf {\Sigma }, \lambda ,\psi ,\chi ;\mathbb {R}_{+}^{r})\) denotes the r-dimensional symmetric truncated hyperbolic distribution with density

$$f_{\text{TH}}(\mathbf{v}\mid\boldsymbol{\mu},\boldsymbol{\Sigma},\lambda,\psi,\chi;\mathbb{R}_{+}^{r})= \frac{h_{r}(\mathbf{v}\mid\boldsymbol{\mu},\boldsymbol{\Sigma},\lambda,\psi,\chi)}{{\int}^{\infty}_{0}\ldots {\int}^{\infty}_{0}h_{r}(\mathbf{v}\mid\boldsymbol{\mu},\boldsymbol{\Sigma},\lambda,\psi,\chi)d\mathbf{v}}\mathbb{I}_{\mathbb{R}_{+}^{r}}(\mathbf{v}),$$

and \(\mathbb {I}_{\mathbb {R}_{+}^{r}}(\mathbf {u})=1\) when \(\mathbf {u}\in \mathbb {R}_{+}^{r}\) and 0 otherwise. In this way, the symmetric hyperbolic distribution is truncated to exist only within the region \(\mathbb {R}_{+}^{r}\). To update \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mathbf {V}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\), we can make use of the fact that

$$\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mid \mathbf{x}_{i},z_{ig}=1]=\mathbb{E}[(1/W_{ig})\mid \mathbf{x}_{i},z_{ig}=1]\mathbb{E}[\mathbf{Y}_{ig}\mid \mathbf{x}_{i},z_{ig}=1]$$

and

$$\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mathbf{V}_{ig}^{\prime}\mid \mathbf{x}_{i},z_{ig}=1]=\mathbb{E}[(1/W_{ig})\mid \mathbf{x}_{i},z_{ig}=1]\mathbb{E}[\mathbf{Y}_{ig}\mathbf{Y}_{ig}^{\prime}\mid \mathbf{x}_{i},z_{ig}=1],$$

where

$$\mathbf{Y}_{ig}\mid \mathbf{x}_{i},z_{ig}=1\sim \text{TH}_{r}\left( \mathbf{k}_{g}, \sqrt{\frac{\omega_{g}+\delta(\mathbf{x}_{i}\mid\mathbf{r}_{g},\boldsymbol{\Omega}_{g})}{\omega_{g}}}\boldsymbol{\Delta}_{g},\lambda_{g}-\frac{p}{2}-1, \gamma_{g},\gamma_{g};\mathbb{R}_{+}^{r}\right).$$

The expectations \(\mathbb {E}[\mathbf {Y}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[\mathbf {Y}_{ig}\mathbf {Y}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\) can easily be estimated using the moments of the truncated symmetric hyperbolic distribution defined in Murray et al. (2017a).
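As a sanity check on such moment calculations, the truncated moments can also be approximated by plain Monte Carlo, assuming the symmetric hyperbolic \(h_{r}\) admits the usual normal variance-mixture representation \(\mathbf{V}=\boldsymbol{\mu}+\sqrt{W}\mathbf{Z}\) with \(\mathbf{Z}\sim\mathcal{N}_{r}(\mathbf{0},\boldsymbol{\Sigma})\) and \(W\sim\text{GIG}(\psi,\chi,\lambda)\), and rejecting draws outside \(\mathbb{R}_{+}^{r}\). The sketch below (Python with SciPy) uses that representation; all parameter values are illustrative, and the analytic moments of Murray et al. (2017a) remain the method of record.

```python
# Monte Carlo estimates of E[Y] and E[YY'] for the truncated symmetric
# hyperbolic distribution, via rejection to the positive orthant.
# Assumes the GIG normal variance-mixture representation of h_r.
import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(1)

def trunc_hyperbolic_moments(mu, Sigma, lam, psi, chi, n=100_000):
    r = len(mu)
    # scipy's geninvgauss(p, b) with scale sqrt(chi/psi) gives GIG(psi, chi, p)
    w = geninvgauss.rvs(lam, np.sqrt(psi * chi), scale=np.sqrt(chi / psi),
                        size=n, random_state=rng)
    z = rng.multivariate_normal(np.zeros(r), Sigma, size=n)
    v = mu + np.sqrt(w)[:, None] * z
    keep = v[(v > 0).all(axis=1)]  # truncate to the positive orthant
    e_y = keep.mean(axis=0)
    e_yy = (keep[:, :, None] * keep[:, None, :]).mean(axis=0)
    return e_y, e_yy

# Illustrative two-dimensional example:
m1, m2 = trunc_hyperbolic_moments(np.array([0.5, 0.5]), np.eye(2),
                                  lam=-1.0, psi=2.0, chi=2.0)
```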

A.4 \(\mathbb {E}[(1/W_{ig})\tilde {\mathbf {U}}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\tilde {\mathbf {U}}_{ig}\tilde {\mathbf {U}}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\)

Note that \(\tilde {\mathbf {U}}_{ig}\mid \mathbf {x}_{i},\mathbf {v}_{ig},w_{ig},z_{ig}=1\sim \mathcal {N}_{q}(\mathbf {q},w_{ig}\mathbf {C})\) where \(\mathbf {q}=\mathbf {C}[\mathbf {d}+\mathbf {{\Lambda }}_{g}(\mathbf {V}_{ig}-\mathbf {a}_{\lambda _{g}})]\), \(\mathbf {d}=\tilde {\mathbf {B}}_{g}^{\prime }\mathbf {D}_{g}^{-1}(\mathbf {X}_{i}-\boldsymbol {\mu }_{g})\), and \(\mathbf {C}=(\mathbf {I}_{q}+\tilde {\mathbf {B}}_{g}^{\prime }\mathbf {D}_{g}^{-1}\tilde {\mathbf {B}}_{g})^{-1}\). We can show

$$ \begin{array}{lll} &&\mathbb{E}[\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},z_{ig}=1] =\mathbb{E}\{\mathbb{E}[\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]\mid\mathbf{x}_{i} ,z_{ig}=1\}\\ &=&\mathbb{E}\{\mathbf{C}[ \mathbf{d}+\mathbf{{\Lambda} }_{g}(\mathbf{V}_{ig}-\mathbf{a}_{\lambda_{g}})]\mid\mathbf{x}_{i},z_{ig}=1\} =\mathbf{C}\{ \mathbf{d}+\mathbf{{\Lambda} }_{g}(\mathbb{E}[\mathbf{V}_{ig}\mid\mathbf{x}_{i},z_{ig}=1]-\mathbf{a}_{\lambda_{g}})\}. \end{array} $$

Therefore, it follows that

$$ \begin{array}{lll} \mathbb{E}&[(1/W_{ig})\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},z_{ig}=1] =\mathbb{E}\{\mathbb{E}[(1/W_{ig}) \tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]\mid\mathbf{x}_{i},z_{ig}=1 \}\\ &=\mathbb{E}\{(1/W_{ig})[ \mathbf{C}\mathbf{d}+\mathbf{C}\mathbf{{\Lambda} }_{g}(\mathbf{V}_{ig}-\mathbf{a}_{\lambda_{g}})]\mid\mathbf{x}_{i},z_{ig}=1\}\\ &=\mathbf{C}\{\mathbf{d}\mathbb{E}[1/W_{ig}\mid\mathbf{x}_{i},z_{ig}=1]+\mathbf{{\Lambda} }_{g}(\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mid \mathbf{x}_{i},z_{ig}=1]-\mathbf{a}_{\lambda_{g}}\mathbb{E}[1/W_{ig}\mid\mathbf{x}_{i},z_{ig}=1])\},\\ \mathbb{E}&[(1/W_{ig})\mathbf{V}_{ig}\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i} ,z_{ig}=1] =\mathbb{E}\{\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig} \tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]\mid\mathbf{x}_{i},z_{ig}=1\}\\ &=\mathbb{E}\{(1/W_{ig})\mathbf{V}_{ig}[ \mathbf{C}\mathbf{d}+\mathbf{C}\mathbf{{\Lambda} }_{g}(\mathbf{V}_{ig}-\mathbf{a}_{\lambda_{g}})]\mid\mathbf{x}_{i} ,z_{ig}=1\}\\ &=\mathbf{C} \{ \mathbf{d} \mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mid\mathbf{x}_{i},z_{ig}=1]+\mathbf{{\Lambda} }_{g}(\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mathbf{V}_{ig}^{\prime}\mid \mathbf{x}_{i},z_{ig}=1]\\&\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad- \mathbf{a}_{\lambda_{g}}\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig}\mid\mathbf{x}_{i},z_{ig}=1])\}, \end{array} $$

and

$$ \begin{array}{lll} \mathbb{E}&[(1/W_{ig})\tilde{\mathbf{U}}_{ig}\tilde{\mathbf{U}}_{ig}^{\prime}\mid\mathbf{x}_{i},z_{ig}=1] =\mathbb{E}\{(1/W_{ig})\mathbb{E}[\tilde{\mathbf{U}}_{ig}\tilde{\mathbf{U}}_{ig}^{\prime}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]\mid\mathbf{x}_{i},z_{ig}=1\}\\ &=\mathbb{E}\{(1/W_{ig})(\mathbb{E}[\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]\mathbb{E}[\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1]^{\prime}+W_{ig}\mathbf{C})\mid\mathbf{x}_{i} ,z_{ig}=1\}\\ &=\mathbb{E}\{(1/W_{ig})(\mathbb{E}[\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},\mathbf{v}_{ig},w_{ig},z_{ig}=1][\mathbf{C}\mathbf{d}+\mathbf{C}\mathbf{{\Lambda} }_{g}(\mathbf{V}_{ig}-\mathbf{a}_{\lambda_{g}})]^{\prime})+\mathbf{C}\mid\mathbf{x}_{i},z_{ig}=1 \}\\ &=\{ (\mathbb{E}[(1/W_{ig})\mathbf{V}_{ig} \tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},z_{ig}=1]-\mathbf{a}_{\lambda_{g}}\mathbb{E}[(1/W_{ig})\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},z_{ig}=1])\mathbf{{\Lambda}}_{g}^{\prime}\\ &\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad+\mathbb{E}[(1/W_{ig})\tilde{\mathbf{U}}_{ig}\mid\mathbf{x}_{i},z_{ig}=1]\mathbf{d}^{\prime}+\mathbf{I}_{q} \}\mathbf{C}. \end{array} $$
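For concreteness, here is a small linear-algebra sketch of the plug-in quantities \(\mathbf{C}=(\mathbf{I}_{q}+\tilde{\mathbf{B}}_{g}^{\prime}\mathbf{D}_{g}^{-1}\tilde{\mathbf{B}}_{g})^{-1}\) and \(\mathbf{d}=\tilde{\mathbf{B}}_{g}^{\prime}\mathbf{D}_{g}^{-1}(\mathbf{x}_{i}-\boldsymbol{\mu}_{g})\) and of the conditional mean \(\mathbf{C}[\mathbf{d}+\boldsymbol{\Lambda}_{g}(\mathbf{v}_{ig}-\mathbf{a}_{\lambda_{g}})]\). All dimensions and inputs are randomly generated placeholders, and the diagonal form assumed for \(\mathbf{D}_{g}\) follows the usual factor-analyzers convention.

```python
# Placeholder computation of C, d, and E[U-tilde | x, v, w, z=1] from A.4.
import numpy as np

rng = np.random.default_rng(0)
p, q, r = 5, 2, 2                     # illustrative dimensions only

B = rng.normal(size=(p, q))           # stand-in for the loading matrix B-tilde_g
D_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=p))  # D_g assumed diagonal
x, mu = rng.normal(size=p), np.zeros(p)
Lam = rng.normal(size=(q, r))         # stand-in for Lambda_g
v, a_lam = rng.uniform(size=r), 0.5 * np.ones(r)

C = np.linalg.inv(np.eye(q) + B.T @ D_inv @ B)   # (I_q + B' D^{-1} B)^{-1}
d = B.T @ D_inv @ (x - mu)                       # B' D^{-1} (x - mu)
cond_mean = C @ (d + Lam @ (v - a_lam))          # C [ d + Lambda (v - a_lambda) ]
```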

Cite this article

Murray, P.M., Browne, R.P. & McNicholas, P.D. Mixtures of Hidden Truncation Hyperbolic Factor Analyzers. J Classif 37, 366–379 (2020). https://doi.org/10.1007/s00357-019-9309-y
