Abstract
For decades, Gaussian mixture models have been the most popular mixture models in the literature. However, the adequacy of the fit provided by Gaussian components is often in question. Various distributions capable of modeling skewness or heavy tails have recently been considered in this context. In this paper, we propose a novel contaminated transformation mixture model that is built on the idea of transformation to symmetry and can account for skewness and heavy tails while automatically assigning scatter to secondary components.
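To make the construction in the abstract concrete, here is a minimal univariate sketch of what one component of such a model could look like. It rests on assumptions the abstract does not state. The transformation to symmetry is taken to be the Manly (1976) exponential transformation M(x; λ) = (exp(λx) − 1)/λ, which is the identity at λ = 0. Heavy tails and scatter are modeled by a contaminated normal on the transformed scale: a fraction 1 − α of points shares the mean but has its variance inflated by η > 1. All function names and the parameterization are hypothetical, and the paper's actual (multivariate) model may differ. Because the image of M is a half-line when λ ≠ 0, the resulting density is not exactly normalized; the sketch does not correct for this.

```python
import math


def manly(x: float, lam: float) -> float:
    """Manly (1976) exponential transformation; the identity when lam == 0."""
    if lam == 0.0:
        return x
    return math.expm1(lam * x) / lam  # (exp(lam * x) - 1) / lam


def log_normal_pdf(y: float, mu: float, var: float) -> float:
    """Log-density of a univariate normal N(mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)


def component_pdf(x: float, mu: float, var: float, lam: float,
                  alpha: float, eta: float) -> float:
    """Hypothetical contaminated transformation density of one component.

    alpha in (0, 1) weights the primary ("good") part; eta > 1 inflates the
    variance of the secondary ("scatter") part. Both parts are assumed to share
    one transformation, so the Jacobian dM/dx = exp(lam * x) multiplies the
    whole contaminated normal density.
    """
    y = manly(x, lam)
    good = alpha * math.exp(log_normal_pdf(y, mu, var))
    bad = (1.0 - alpha) * math.exp(log_normal_pdf(y, mu, eta * var))
    return (good + bad) * math.exp(lam * x)


def mixture_pdf(x: float, weights, components) -> float:
    """Finite mixture density: sum over k of tau_k * f_k(x)."""
    return sum(w * component_pdf(x, **c) for w, c in zip(weights, components))


def prob_good(x: float, mu: float, var: float, lam: float,
              alpha: float, eta: float) -> float:
    """Posterior probability that x comes from the primary part of its component.

    The Jacobian is common to both parts and cancels; computed in log space.
    """
    y = manly(x, lam)
    d = ((math.log1p(-alpha) + log_normal_pdf(y, mu, eta * var))
         - (math.log(alpha) + log_normal_pdf(y, mu, var)))
    if d > 0:  # numerically stable evaluation of 1 / (1 + exp(d))
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


if __name__ == "__main__":
    comps = [dict(mu=0.0, var=1.0, lam=0.5, alpha=0.9, eta=10.0),
             dict(mu=6.0, var=1.0, lam=-0.3, alpha=0.95, eta=10.0)]
    print(mixture_pdf(1.0, [0.6, 0.4], comps))
    print(prob_good(0.0, **comps[0]), prob_good(6.0, **comps[0]))
```

The posterior `prob_good` illustrates how scatter could be assigned automatically: a point lying far from the bulk of its component on the transformed scale (x = 6 for the first component in the usage example) receives a probability near zero of belonging to the primary part, while a central point receives a probability close to one.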
Cite this article
Melnykov, Y., Zhu, X. & Melnykov, V. Transformation mixture modeling for skewed data groups with heavy tails and scatter. Comput Stat 36, 61–78 (2021). https://doi.org/10.1007/s00180-020-01009-8