Abstract
In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the ‘best’ one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging component membership probabilities and averaging models. In both cases, Occam’s window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
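The weighting scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated in the abstract: the BIC is written in the lower-is-better form (-2 log-likelihood plus a penalty), posterior model probabilities are approximated by exp(-BIC/2) under equal prior model probabilities, the Occam's window ratio is 20, and all retained models already have the same number of components with matching labels (that is, any merging has been done beforehand). The function names `occams_window_weights` and `average_memberships` are hypothetical.

```python
import numpy as np


def occams_window_weights(bic, ratio=20.0):
    """Approximate BMA weights for models inside Occam's window.

    Assumes bic = -2 * loglik + n_params * log(n), so lower is better,
    and equal prior probabilities for all candidate models.
    """
    bic = np.asarray(bic, dtype=float)
    # Approximate posterior probability of each model relative to the best one.
    rel = np.exp(-0.5 * (bic - bic.min()))
    # Occam's window: drop models that are more than `ratio` times
    # less probable than the best model.
    keep = rel >= 1.0 / ratio
    weights = np.where(keep, rel, 0.0)
    return weights / weights.sum()


def average_memberships(z_list, weights):
    """Weighted average of component membership probabilities.

    z_list holds one (n_obs, n_components) array per model; the component
    labels are assumed to be aligned across models already.
    """
    z = np.stack(z_list)                      # shape (n_models, n_obs, n_components)
    return np.tensordot(weights, z, axes=1)   # shape (n_obs, n_components)


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    bic = [100.0, 102.0, 120.0]
    z_list = [
        np.array([[0.9, 0.1], [0.2, 0.8]]),
        np.array([[0.7, 0.3], [0.4, 0.6]]),
        np.array([[0.5, 0.5], [0.5, 0.5]]),
    ]
    w = occams_window_weights(bic)
    z_bar = average_memberships(z_list, w)
    print("weights:", w)
    print("averaged memberships:", z_bar)
    print("hard labels:", z_bar.argmax(axis=1))
```

In this toy input the third model falls outside the window and receives zero weight, so the averaged memberships depend only on the first two models. Hard cluster labels can then be taken as the row-wise maximum of the averaged probabilities.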
Acknowledgments
The authors gratefully acknowledge the very helpful comments and suggestions of an associate editor and three anonymous reviewers. The authors are grateful to Professor Adrian Raftery and other members of the University of Washington Working Group on Model-Based Clustering for their comments and suggestions on an earlier version of this work.
Additional information
This work was supported by an Ontario Graduate Scholarship, an Early Researcher Award from the Ontario Ministry of Research and Innovation, a grant-in-aid from Compusense Inc., and a Collaborative Research and Development Grant from the Natural Sciences and Engineering Research Council of Canada.
About this article
Cite this article
Wei, Y., McNicholas, P.D. Mixture model averaging for clustering. Adv Data Anal Classif 9, 197–217 (2015). https://doi.org/10.1007/s11634-014-0182-6
DOI: https://doi.org/10.1007/s11634-014-0182-6