Abstract
The mixture of factor analyzers model was first introduced over 20 years ago and has since been extended to several non-Gaussian analogs. In general, these analogs account for situations with heavy-tailed and/or skewed clusters. An approach is introduced that unifies many of these analogs into one very general model: the mixture of hidden truncation hyperbolic factor analyzers (MHTHFA) model. In the process, a hidden truncation hyperbolic factor analysis model is also introduced. The MHTHFA model is illustrated for clustering as well as semi-supervised classification using two real datasets.
References
Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14–22.
Andrews, J.L., & McNicholas, P.D. (2011a). Extending mixtures of multivariate t-factor analyzers. Statistics and Computing, 21(3), 361–373.
Andrews, J.L., & McNicholas, P.D. (2011b). Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. Journal of Statistical Planning and Inference, 141(4), 1479–1486.
Arellano-Valle, R.B., & Genton, M.G. (2005). On fundamental skew distributions. Journal of Multivariate Analysis, 96(1), 93–116.
Baek, J., McLachlan, G.J., Flack, L.K. (2010). Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1298–1309.
Bhattacharya, S., & McNicholas, P.D. (2014). A LASSO-penalized BIC for mixture model selection. Advances in Data Analysis and Classification, 8(1), 45–61.
Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Computational Statistics and Data Analysis, 71, 52–78.
Browne, R.P., & McNicholas, P.D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2), 176–198.
Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.
Gallaugher, M.P.B., & McNicholas, P.D. (2017). A matrix variate skew-t distribution. Stat, 6, 160–170.
Gallaugher, M.P.B., & McNicholas, P.D. (2018). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83–93.
Gallaugher, M.P.B., & McNicholas, P.D. (2019a). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification 36. In press.
Gallaugher, M.P.B., & McNicholas, P.D. (2019b). Three skewed matrix variate distributions. Statistics and Probability Letters, 145, 103–109.
Ghahramani, Z., & Hinton, G.E. (1997). The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1 University of Toronto, Toronto, Canada.
Gorman, R.P., & Sejnowski, T.J. (1988). Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1, 75–89.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Karlis, D., & Santourian, A. (2009). Model-based clustering with non-elliptically contoured distributions. Statistics and Computing, 19(1), 73–83.
Lawley, D.N., & Maxwell, A.E. (1962). Factor analysis as a statistical method. Journal of the Royal Statistical Society: Series D, 12(3), 209–229.
Lee, S., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24, 181–202.
Lee, S.X., & McLachlan, G.J. (2016). Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models. Statistics and Computing, 26(3), 573–589.
Lichman, M. (2013). UCI machine learning repository. University of California, Irvine. School of Information and Computer Sciences.
Lin, T.-I. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100, 257–265.
Lin, T.-I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20(3), 343–356.
Lin, T.-I., McNicholas, P.D., Hsiu, J.H. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80–87.
Lin, T., McLachlan, G.J., Lee, S.X. (2016). Extending mixtures of factor models using the restricted multivariate skew-normal distribution. Journal of Multivariate Analysis, 143, 398–413.
Lindsay, B.G. (1995). Mixture models: theory, geometry and applications. In NSF-CBMS regional conference series in probability and statistics, Vol. 5. Hayward: Institute of Mathematical Statistics.
McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. Hoboken: Wiley.
McLachlan, G.J., & Peel, D. (2000a). Finite mixture models. New York: Wiley.
McLachlan, G.J., & Peel, D. (2000b). Mixtures of factor analyzers. In Proceedings of the seventh international conference on machine learning (pp. 599–606). San Francisco: Morgan Kaufmann.
McNicholas, P.D. (2010). Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference, 140(5), 1175–1181.
McNicholas, P.D. (2016a). Mixture model-based classification. Boca Raton: Chapman & Hall/CRC Press.
McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33(3), 331–373.
McNicholas, P.D., & Murphy, T.B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18(3), 285–296.
McNicholas, P.D., & Murphy, T.B. (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics, 26(21), 2705–2712.
McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A mixture of variance-gamma factor analyzers. In Ahmed, S.E. (Ed.) Big and complex data analysis: methodologies and applications (pp. 369–385). Cham: Springer International Publishing.
Meng, X.-L., & Rubin, D.B. (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80, 267–278.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2014a). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326–335.
Murray, P.M., McNicholas, P.D., Browne, R.P. (2014b). A mixture of common skew-t factor analyzers. Stat, 3(1), 68–82.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2017a). Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. Journal of Multivariate Analysis, 161, 141–156.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2017b). A mixture of SDB skew-t factor analyzers. Econometrics and Statistics, 3, 160–168.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2019). Note of Clarification on ‘Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering, by Murray, Browne, and McNicholas, J. Multivariate Analysis 161 (2017) 141–156.’ Journal of Multivariate Analysis, 171, 475–476.
Peel, D., & McLachlan, G.J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10(4), 339–348.
R Core Team. (2018). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Sahu, K., Dey, D.K., Branco, M.D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics, 31(2), 129–150. Corrigendum: vol. 37 (2009), 301–302.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Steane, M.A., McNicholas, P.D., Yada, R. (2012). Model-based classification via mixtures of multivariate t-factor analyzers. Communications in Statistics – Simulation and Computation, 41(4), 510–523.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9, 386–396.
Subedi, S., & McNicholas, P.D. (2014). Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Advances in Data Analysis and Classification, 8(2), 167–193.
Tang, Y., Browne, R.P., McNicholas, P.D. (2018). Flexible clustering of high-dimensional data via mixtures of joint generalized hyperbolic distributions. Stat, 7(1), e177.
Tipping, M.E., & Bishop, C.M. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2), 443–482.
Tortora, C., McNicholas, P.D., Browne, R.P. (2016). A mixture of generalized hyperbolic factor analyzers. Advances in Data Analysis and Classification, 10(4), 423–440.
Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2019). A mixture of coalesced generalized hyperbolic distributions. Journal of Classification, 36. To appear.
Vrbik, I., & McNicholas, P.D. (2012). Analytic calculations for the EM algorithm for multivariate skew-t mixture models. Statistics and Probability Letters, 82(6), 1169–1174.
Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196–210.
Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32(3), 359–381.
Yoshida, R., Higuchi, T., Imoto, S. (2004). A mixed factors model for dimension reduction and extraction of a group structure in gene expression data. In Proceedings of the 2004 IEEE computational systems bioinformatics conference (pp. 161–172).
Appendix: E-Step Calculations
Herein, we present the expectations required for the E-step of the ECM algorithm for the mixtures of HTH factor analyzers model.
A.1 \(\mathbb {E}[W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[1/W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\)
To derive the expectations \(\mathbb {E}[W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[1/W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) as well as \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) in the following section, first note that
Therefore,
A.2 \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\)
To update \(\mathbb {E}[\log W_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\), where \(W_{ig}\sim \text {GIG}(\psi _{g},\chi _{g},\lambda _{g})\), first note that
We can show that
where \(\mathbf {r}_{g}=\boldsymbol {\mu }_{g}-\boldsymbol {\alpha }_{g}\mathbf {a}_{\lambda _{g}}\) and \(\mathbf {k}_{g}=\boldsymbol {\Lambda }^{\prime }_{g}\boldsymbol {\Omega }_{g}^{-1}(\mathbf {x}_{i}-\boldsymbol {\mu }_{g})\). Therefore,
Let
then \(\zeta _{ig}\geq 1\) and \(W_{ig}\mid \mathbf {x}_{i}, \mathbf {v}_{ig},z_{ig}=1\sim \text {GIG}(\omega _{g}, \omega _{g} \zeta _{ig}^{2}, \tau )\). Consequently,
The reader is directed to the supplementary material in Murray et al. (2017a) for details on a method for estimating this expectation via a series expansion.
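The GIG expectations used in A.1 and A.2 rest on standard closed forms. As a minimal sketch, assume the GIG density is parameterized as \(h(w\mid \psi ,\chi ,\lambda )\propto w^{\lambda -1}\exp \{-(\psi w+\chi /w)/2\}\), which matches the argument order \(\text {GIG}(\psi ,\chi ,\lambda )\) used above but is an assumption about the exact convention. Under that assumption, \(\mathbb {E}[W^{a}]=(\chi /\psi )^{a/2}K_{\lambda +a}(\sqrt {\psi \chi })/K_{\lambda }(\sqrt {\psi \chi })\) and \(\mathbb {E}[\log W]=\log \sqrt {\chi /\psi }+\partial _{\lambda }\log K_{\lambda }(\sqrt {\psi \chi })\), where \(K_{\lambda }\) is the modified Bessel function of the third kind. The function names below (`bessel_k`, `gig_moment`, `gig_log_moment`) are hypothetical helpers, and the finite-difference derivative is a simple stand-in for illustration, not the series expansion of Murray et al. (2017a).

```python
import math


def _log_cosh(a):
    # Overflow-safe log(cosh(a)).
    a = abs(a)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def bessel_k(nu, x, steps=2000):
    """K_nu(x) for x > 0 via K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt.

    The integral is truncated at a point T where the integrand is negligible
    relative to its value at t = 0, then evaluated with the trapezoid rule.
    Intended for moderate |nu| and x only.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    T = 1.0
    while x * math.cosh(T) - abs(nu) * T < x + 40.0:
        T += 0.5
    h = T / steps

    def f(t):
        return math.exp(-x * math.cosh(t) + _log_cosh(nu * t))

    total = 0.5 * (f(0.0) + f(T))
    for k in range(1, steps):
        total += f(k * h)
    return h * total


def gig_moment(psi, chi, lam, power):
    """E[W^power] for W ~ GIG(psi, chi, lam) under the assumed parameterization."""
    eta = math.sqrt(psi * chi)
    ratio = bessel_k(lam + power, eta) / bessel_k(lam, eta)
    return (chi / psi) ** (power / 2.0) * ratio


def gig_log_moment(psi, chi, lam, eps=1e-5):
    """E[log W], with d/d(lam) log K_lam approximated by a central difference."""
    eta = math.sqrt(psi * chi)
    upper = math.log(bessel_k(lam + eps, eta))
    lower = math.log(bessel_k(lam - eps, eta))
    return 0.5 * math.log(chi / psi) + (upper - lower) / (2.0 * eps)


if __name__ == "__main__":
    psi, chi, lam = 2.0, 8.0, 0.5  # illustrative values only
    print(gig_moment(psi, chi, lam, 1))    # E[W]
    print(gig_moment(psi, chi, lam, -1))   # E[1/W]
    print(gig_log_moment(psi, chi, lam))   # E[log W]
```

In the setting of A.2, one would call these helpers with the conditional parameters, for example `gig_log_moment(omega, omega * zeta**2, tau)` for \(\text {GIG}(\omega _{g},\omega _{g}\zeta _{ig}^{2},\tau )\), before averaging over \(\mathbf {V}_{ig}\). The pure-Python quadrature keeps the sketch dependency-free; a library Bessel routine would normally be preferred in practice.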
A.3 \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mathbf {V}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\)
Recall that \(\mathbf {V}_{ig}\mid w_{ig},z_{ig}=1\sim \text {HN}_{r}(w_{ig}\mathbf {I}_{r})\). We can show that
where the support of \(\mathbf {V}_{ig}\) is \(\mathbb {R}_{+}^{r}\), i.e., the positive orthant of \(\mathbb {R}^{r}\), and
It follows that
Here, \(\text {TH}_{r}(\boldsymbol {\mu },\mathbf {\Sigma }, \lambda ,\psi ,\chi ;\mathbb {R}_{+}^{r})\) denotes the r-dimensional symmetric truncated hyperbolic distribution with density
and \(\mathbb {I}_{\mathbb {R}_{+}^{r}}(\mathbf {u})=1\) when \(\mathbf {u}\in \mathbb {R}_{+}^{r}\) and 0 otherwise. In this way, the symmetric hyperbolic distribution is truncated to exist only within the region \(\mathbb {R}_{+}^{r}\). To update \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\mathbf {V}_{ig}\mathbf {V}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\), we can make use of the fact that
and
where
The expectations \(\mathbb {E}[\mathbf {Y}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[\mathbf {Y}_{ig}\mathbf {Y}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\) can easily be estimated using the moments of the truncated symmetric hyperbolic distribution defined in Murray et al. (2017a).
A.4 \(\mathbb {E}[(1/W_{ig})\tilde {\mathbf {U}}_{ig}\mid \mathbf {x}_{i},z_{ig}=1]\) and \(\mathbb {E}[(1/W_{ig})\tilde {\mathbf {U}}_{ig}\tilde {\mathbf {U}}_{ig}^{\prime }\mid \mathbf {x}_{i},z_{ig}=1]\)
Note that \(\tilde {\mathbf {U}}_{ig}\mid \mathbf {x}_{i},\mathbf {v}_{ig},w_{ig},z_{ig}=1\sim \mathcal {N}_{q}(\mathbf {q},w_{ig}\mathbf {C})\) where \(\mathbf {q}=\mathbf {C}[\mathbf {d}+\mathbf {{\Lambda }}_{g}(\mathbf {V}_{ig}-\mathbf {a}_{\lambda _{g}})]\), \(\mathbf {d}=\tilde {\mathbf {B}}_{g}^{\prime }\mathbf {D}_{g}^{-1}(\mathbf {X}_{i}-\boldsymbol {\mu }_{g})\), and \(\mathbf {C}=(\mathbf {I}_{q}+\tilde {\mathbf {B}}_{g}^{\prime }\mathbf {D}_{g}^{-1}\tilde {\mathbf {B}}_{g})^{-1}\). We can show
Therefore, it follows that
and
Cite this article
Murray, P.M., Browne, R.P. & McNicholas, P.D. Mixtures of Hidden Truncation Hyperbolic Factor Analyzers. J Classif 37, 366–379 (2020). https://doi.org/10.1007/s00357-019-9309-y
DOI: https://doi.org/10.1007/s00357-019-9309-y