A mixture of generalized hyperbolic factor analyzers

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

The mixture of factor analyzers model, which has been used successfully for the model-based clustering of high-dimensional data, is extended to generalized hyperbolic mixtures. The development of a mixture of generalized hyperbolic factor analyzers is outlined, drawing upon the relationship with the generalized inverse Gaussian distribution. An alternating expectation-conditional maximization algorithm is used for parameter estimation, and the Bayesian information criterion is used to select the number of factors as well as the number of components. The performance of our generalized hyperbolic factor analyzers model is illustrated on real and simulated data, where it performs favourably compared to its Gaussian analogue and other approaches.
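The model-selection step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `bic` and `select_model` are hypothetical, the "larger is better" sign convention for the BIC is an assumption, and the log-likelihood and free-parameter count of each fitted model are taken as given inputs rather than computed.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC under the assumed convention 2*loglik - n_params*log(n_obs) (larger is better)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def select_model(fits, n_obs):
    """Pick the (G, q) pair with the largest BIC.

    fits maps (G, q) -> (maximized log-likelihood, number of free parameters),
    where G is the number of components and q the number of factors.
    """
    scores = {key: bic(ll, m, n_obs) for key, (ll, m) in fits.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical fitted values, for illustration only.
example_fits = {(1, 1): (-100.0, 10), (2, 1): (-90.0, 20), (2, 2): (-89.5, 25)}
best_pair, all_scores = select_model(example_fits, n_obs=100)
```

In this toy grid the gain in log-likelihood from the larger models does not offset the penalty, so the smallest model is retained.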

Acknowledgments

The authors are grateful to an associate editor and anonymous reviewers for their very helpful comments and suggestions, the cumulative effect of which has been a stronger manuscript.

Author information

Corresponding author

Correspondence to Cristina Tortora.

Additional information

This work was supported by a grant-in-aid from Compusense Inc. as well as a Collaborative Research and Development grant from the Natural Sciences and Engineering Research Council of Canada.

Appendix: Updates for component covariance parameters

At the second stage of our AECM algorithm, the (conditional) expected value of the complete-data log-likelihood is given by

$$\begin{aligned} {Q}_2= & {} C-\frac{1}{2}\sum _{i=1}^n \sum _{g=1}^G\hat{z}_{ig}\log |\varvec{\varPsi }_g|-\frac{1}{2}\sum _{i=1}^n \sum _{g=1}^G\hat{z}_{ig} \\&\times \left[ b_{ig}\,\text{ tr }\left\{ (\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)'\varvec{\varPsi }_g^{-1}\right\} -2\,\text{ tr }\left\{ (\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)\hat{\varvec{\alpha }}_g'\varvec{\varPsi }_g^{-1}\right\} \right. \\&+a_{ig}\,\text{ tr }\left\{ \hat{\varvec{\alpha }}_g\hat{\varvec{\alpha }}_g'\varvec{\varPsi }_g^{-1}\right\} -2\,\text{ tr }\left\{ (\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)'\varvec{\varPsi }_g^{-1}\varvec{\varLambda }_g\varvec{E}_{2ig}\right\} \\&\left. +2\,\text{ tr }\left\{ \hat{\varvec{\alpha }}_g'\varvec{\varPsi }_g^{-1}\varvec{\varLambda }_g\varvec{E}_{1ig}\right\} +\,\text{ tr }\left\{ \varvec{\varLambda }_g\varvec{E}_{3ig}\varvec{\varLambda }_g'\varvec{\varPsi }_g^{-1}\right\} \right] , \end{aligned}$$

where \(C\) is constant with respect to \(\varvec{\varLambda }_g\) and \(\varvec{\varPsi }_g\). Differentiating \({Q}_2\) with respect to \(\varvec{\varLambda }_g\) gives

$$\begin{aligned} S_1(\varvec{\varLambda }_g,\varvec{\varPsi }_g)= & {} \frac{\partial {Q}_2}{\partial \varvec{\varLambda }_g}=-\frac{1}{2}\sum _{i=1}^n\hat{z}_{ig}\left[ -2\varvec{\varPsi }_g^{-1}(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)\varvec{E}_{2ig}'+2\varvec{\varPsi }_g^{-1}\hat{\varvec{\alpha }}_g\varvec{E}_{1ig}' \right. \\&\left. +\varvec{\varPsi }_g^{-1}\varvec{\varLambda }_g\left( \varvec{E}_{3ig}'+\varvec{E}_{3ig}\right) \right] \end{aligned}$$

Note that \(\varvec{E}_{3ig}\) is a symmetric matrix. Now, solving \(S_1(\hat{\varvec{\varLambda }}_g,\varvec{\varPsi }_g)=\varvec{0}\) gives the update:

$$\begin{aligned} \hat{\varvec{\varLambda }}_g = \left\{ \sum _{i=1}^n\hat{z}_{ig}\left[ (\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)\varvec{E}_{2ig}'-\hat{\varvec{\alpha }}_g\varvec{E}_{1ig}'\right] \right\} \left\{ \sum _{i=1}^n\hat{z}_{ig}\varvec{E}_{3ig}\right\} ^{-1}. \end{aligned}$$
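As a numerical illustration, the loading update for a single component \(g\) could be coded along the following lines. This is a sketch under stated assumptions: the function name `update_loadings` and the array layout are hypothetical, and the expected values \(\varvec{E}_{1ig}\), \(\varvec{E}_{2ig}\), \(\varvec{E}_{3ig}\) and the weights \(\hat{z}_{ig}\) are assumed to have been computed already in the E-step.

```python
import numpy as np


def update_loadings(x, z, mu, alpha, E1, E2, E3):
    """Loading-matrix update for one component (hypothetical helper).

    x: (n, p) data; z: (n,) weights z_ig; mu, alpha: (p,) vectors;
    E1, E2: (n, q) arrays whose rows are E_1ig and E_2ig;
    E3: (n, q, q) array of the matrices E_3ig.
    """
    r = x - mu  # residuals x_i - mu_g, shape (n, p)
    # sum_i z_i [ (x_i - mu) E_2i' - alpha E_1i' ], shape (p, q)
    num = np.einsum("i,ip,iq->pq", z, r, E2) - np.outer(alpha, z @ E1)
    # sum_i z_i E_3i, shape (q, q)
    den = np.einsum("i,iqr->qr", z, E3)
    # num @ inv(den), computed with a linear solve instead of an explicit inverse
    return np.linalg.solve(den.T, num.T).T
```

Using `np.linalg.solve` rather than forming the inverse is a design choice made here for numerical stability; it returns the same matrix as the closed-form expression whenever the weighted sum of the \(\varvec{E}_{3ig}\) is invertible.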

Differentiating \({Q}_2\) with respect to \(\varvec{\varPsi }_g^{-1}\) gives

$$\begin{aligned}&S_2(\varvec{\varLambda }_g,\varvec{\varPsi }_g) =\frac{\partial {Q}_2}{\partial \varvec{\varPsi }_g^{-1}}\!=\!\frac{1}{2}\sum _{i=1}^n\hat{z}_{ig}\varvec{\varPsi }_g \!-\!\frac{1}{2}\sum _{i=1}^n\hat{z}_{ig}\left[ b_{ig}(\mathbf {x}_i\!-\!\hat{{\varvec{\mu }}}_g)(\mathbf {x}_i\!-\!\hat{{\varvec{\mu }}}_g)' \right. \\&\quad \left. -2\hat{\varvec{\alpha }}_g(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)'+a_{ig}\hat{\varvec{\alpha }}_g\hat{\varvec{\alpha }}_g' -2(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)\varvec{E}_{2ig}'\varvec{\varLambda }_g' +2\hat{\varvec{\alpha }}_g\varvec{E}_{1ig}'\varvec{\varLambda }_g'+\varvec{\varLambda }_g\varvec{E}_{3ig}\varvec{\varLambda }_g'\right] \!. \end{aligned}$$
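The derivative above can be checked term by term with two standard matrix-calculus identities, restated here as a reminder rather than taken from the original derivation. For an unstructured matrix argument \(\mathbf{X}\) and a conformable constant matrix \(\mathbf{A}\),

$$\begin{aligned} \frac{\partial \log |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})', \qquad \frac{\partial \,\text{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}'. \end{aligned}$$

Because \(-\frac{1}{2}\log |\varvec{\varPsi }_g| = \frac{1}{2}\log |\varvec{\varPsi }_g^{-1}|\), the first identity with \(\mathbf{X}=\varvec{\varPsi }_g^{-1}\) produces the leading term \(\frac{1}{2}\sum _{i=1}^n\hat{z}_{ig}\varvec{\varPsi }_g\). The second identity, applied after cyclically permuting each trace in \({Q}_2\) so that \(\varvec{\varPsi }_g^{-1}\) appears last, produces the bracketed terms.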

Now, solving \(\text {diag}\{S_2(\hat{\varvec{\varLambda }}_g,\hat{\varvec{\varPsi }}_g)\}=\varvec{0}\) gives the update, where \(n_g=\sum _{i=1}^n\hat{z}_{ig}\):

$$\begin{aligned} \hat{\varvec{\varPsi }}_g= & {} \frac{1}{n_g}\text {diag}\left\{ \sum _{i=1}^n\hat{z}_{ig}\left[ b_{ig}(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)'-2\hat{\varvec{\alpha }}_g(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)'+a_{ig}\hat{\varvec{\alpha }}_g\hat{\varvec{\alpha }}_g' \right. \right. \\&\left. \left. -2(\mathbf {x}_i-\hat{{\varvec{\mu }}}_g)\varvec{E}_{2ig}'\hat{\varvec{\varLambda }}_g'+2\hat{\varvec{\alpha }}_g\varvec{E}_{1ig}'\hat{\varvec{\varLambda }}_g'+\hat{\varvec{\varLambda }}_g\varvec{E}_{3ig}\hat{\varvec{\varLambda }}_g'\right] \right\} . \end{aligned}$$
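Only the diagonal of the bracketed matrix is needed, so the update can be evaluated without forming any \(p\times p\) matrix. The following sketch shows one way to do this for a single component; the function name `update_psi_diag` and the array layout are hypothetical, \(n_g\) is taken to be \(\sum _{i}\hat{z}_{ig}\) as implied by setting the diagonal of \(S_2\) to zero, and \(a_{ig}\), \(b_{ig}\) and the \(\varvec{E}\) quantities are assumed to come from the E-step.

```python
import numpy as np


def update_psi_diag(x, z, a, b, mu, alpha, lam, E1, E2, E3):
    """Diagonal of the Psi update for one component (hypothetical helper).

    x: (n, p); z, a, b: (n,); mu, alpha: (p,); lam: (p, q) updated loadings;
    E1, E2: (n, q); E3: (n, q, q). Returns a (p,) vector of diagonal entries.
    """
    r = x - mu              # residuals x_i - mu_g, shape (n, p)
    n_g = z.sum()           # assumed definition: n_g = sum_i z_ig
    lam_E1 = E1 @ lam.T     # row i holds Lambda E_1i, shape (n, p)
    lam_E2 = E2 @ lam.T     # row i holds Lambda E_2i, shape (n, p)
    # row i holds diag(Lambda E_3i Lambda'), shape (n, p)
    quad = np.einsum("jq,iqr,jr->ij", lam, E3, lam)
    per_obs = (
        b[:, None] * r**2
        - 2.0 * alpha * r
        + a[:, None] * alpha**2
        - 2.0 * r * lam_E2
        + 2.0 * alpha * lam_E1
        + quad
    )
    return (z @ per_obs) / n_g
```

Working element-wise keeps the cost linear in \(p\) per observation, which matters in the high-dimensional settings the model targets.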

Cite this article

Tortora, C., McNicholas, P.D. & Browne, R.P. A mixture of generalized hyperbolic factor analyzers. Adv Data Anal Classif 10, 423–440 (2016). https://doi.org/10.1007/s11634-015-0204-z

  • DOI: https://doi.org/10.1007/s11634-015-0204-z
