Latent class model with conditional dependency per modes to cluster categorical data

Marbac, Matthieu; Biernacki, Christophe; Vandewalle, Vincent

doi:10.1007/s11634-016-0250-1

Latent class model with conditional dependency per modes to cluster categorical data

Regular Article
Published: 13 May 2016

Volume 10, pages 183–207, (2016)
Cite this article

Advances in Data Analysis and Classification Aims and scope Submit manuscript

Matthieu Marbac¹,
Christophe Biernacki² &
Vincent Vandewalle³

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the conditional independence assumption. Under this new mixture model, named conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution where the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is uniformly spread over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, summarizing thus each class with relatively few characteristic levels. The model selection is carried out by an hybrid MCMC algorithm that does not require preliminary parameter estimation. Then, the maximum likelihood estimation is performed via an EM algorithm only for the best model. The model properties are illustrated on simulated data and on three real data sets by using the associated R package CoModes. The results show that this model allows to reduce biases involved by the conditional independence assumption while providing meaningful parameters.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Clustering bivariate mixed-type data via the cluster-weighted model

Article 04 July 2015

Penalized Latent Class Model for Clustering with Application to Variable Selection

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

Article 14 January 2022

Notes

Note that the repartition of the variables into blocks is identical between classes. This choice is motivated by reasons of identifiability and interpretation that we will detail later.

References

Agresti A (2002) Categorical data analysis, vol 359. Wiley, New York
Book MATH Google Scholar
Allman E, Matias C, Rhodes J (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Stat 37(6A):3099–3132
Article MathSciNet MATH Google Scholar
Bartholomew D, Knott M, Moustaki I (2011) Latent variable models and factor analysis: a unified approach, vol 899. Wiley, New York
Book MATH Google Scholar
Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Inference 140(11):2991–3002
Article MathSciNet MATH Google Scholar
Bretagnolle V (2007) Personal communication. source: Museum
Celeux G, Govaert G (1991) Clustering criteria for discrete data and latent class models. J Classif 8(2):157–176
Article MATH Google Scholar
Chavent M, Kuentz V, Saracco J (2010) A partitioning method for the clustering of categorical variables. Classification as a tool for research. Springer, Berlin Heidelberg, pp 91–99
Chapter Google Scholar
Choirat C, Seri R (2012) Estimation in discrete parameter models. Stat Sci 27(2):278–293
Article MathSciNet MATH Google Scholar
Czerniak J, Zarzycki H (2003) Application of rough sets in the presumptive diagnosis of urinary system diseases. Artifical intelligence and security in computing systems, ACS’2002 9th International Conference Proceedings, pp 41–51
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc. Ser B (Method) 39(1):1–38
MathSciNet MATH Google Scholar
Espeland M, Handelman S (1989) Using Latent class models to characterize and assess relative error in discrete measurements. Biometrics 45(2):587–599
Article MATH Google Scholar
Gollini I, Murphy T (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comput 24(4):569–588
Article MathSciNet MATH Google Scholar
Goodman L (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2):215–231
Article MathSciNet MATH Google Scholar
Govaert G (2010) Data analysis, vol 136. Wiley, New york
MATH Google Scholar
Hagenaars J (1988) Latent structure models with direct effects between indicators local dependence models. Sociol Methods & Res 16(3):379–405
Article Google Scholar
Hand D, Yu K (2001) Idiot’s bayes not so stupid after all? Int Stat Rev 69(3):385–398
MATH Google Scholar
Huang J, Ng M, Rong H, Li Z (2005) Automated variable weighting in k-means type clustering. Pattern Anal Mach Intell, IEEE Trans On 27(5):657–668
Article Google Scholar
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Article MATH Google Scholar
Jajuga K, Sokołowski A, Bock H (2002) Classification, clustering and data analysis: recent advances and applications. Springer, Berlin
Book MATH Google Scholar
Jorgensen M, Hunt L (1996) Mixture model clustering of data sets with categorical and continuous variables. In Proc Conf ISIS 96:375–384
Google Scholar
Kruskal J (1976) More factors than subjects, tests and treatments: An indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika 41(3):281–293
Article MathSciNet MATH Google Scholar
Kruskal J (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl 18(2):95–138
Article MathSciNet MATH Google Scholar
Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2014) Rmixmod: the R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. J Stat Softw (in press)
McLachlan G, Krishnan T (1997) The EM algorithm. Applied probability and statistics., Probability and statisticsWiley, New York
MATH Google Scholar
McLachlan G, Peel D (2000) Finite mixutre models. Applied probability and statistics., Probability and statisticsWiley, New York
Book MATH Google Scholar
Moran M, Walsh C, Lynch A, Coen R, Coakley D, Lawlor B (2004) Syndromes of behavioural and psychological symptoms in mild alzheimer’s disease. Int J Geriatr Psychiatry 19(4):359–364
Article Google Scholar
Qu Y, Tan M, Kutner M (1996) Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 52(3):797–810
Article MathSciNet MATH Google Scholar
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Article MathSciNet MATH Google Scholar
Van Hattum P, Hoijtink H (2009) Market segmentation using brand strategy research: bayesian inference with respect to mixtures of log-linear models. J Classif 26(3):297–328
Article MathSciNet MATH Google Scholar
Vermunt J (2003) Multilevel latent class models. Sociol Method 33(1):213–239
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics and Statistics, McMaster University, Hamilton, Canada
Matthieu Marbac
University Lille 1 and CNRS and Inria Lille, Villeneuve-d’Ascq, France
Christophe Biernacki
EA 2694 University Lille 2 and Inria Lille, Lille, France
Vincent Vandewalle

Authors

Matthieu Marbac
View author publications
You can also search for this author in PubMed Google Scholar
Christophe Biernacki
View author publications
You can also search for this author in PubMed Google Scholar
Vincent Vandewalle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Matthieu Marbac.

Additional information

Downloadable at https://r-forge.r-project.org/R/?group_id=1809.

Appendices

Appendix: Generic identifiability of CMM

When $d\ge 3$, CMM is generically identifiable (i.e., the parameter space where the model is not identifiable has a Lebesgue measure equal to zero). The demonstration is based on results of Allman et al. (2009) which use the Kruskal theorem (Kruskal 1977, 1976). The demonstration is cut into three steps: reminder of the Kruskal results for the three-way tables, demonstration of the model generic identifiability when $d=3$, then its extension when model has more than three blocks.

Kruskal results For a matrix M, the Kruskal rank of M, denoted by $\text {rank}_K\; M$ is the largest number I such that every set of I rows of M are linearly independent.

Theorem 1

(Kruskal 1977, 1976). Let $I_j=\text {rank}_K\; M_j$. If

$$\begin{aligned} I_1 + I_2 +I_3 \ge 2g+2, \end{aligned}$$

then the tensor $[M_1, M_2, M_3]$ uniquely determines the $M_j$, up to simultaneous permutation and rescaling rows.

Generic identifiability of CMM with three blocks Let $k_0=\underset{k}{\text {argmin }} u_{kj}$ and the matrix $M_j$ where

$$\begin{aligned} M_j(k,h)=\alpha _{kjh}. \end{aligned}$$

(22)

By denoting by $\xi _j=\underset{k}{\min }\; u_{kj} +1$, generically, we have

$$\begin{aligned} \text {rank}_K \; M_j=\min (g,\xi _j). \end{aligned}$$

Corollary 1

The parameters of CMM with three blocs are generically identifiable, up to label swapping, provided:

$$\begin{aligned} \min (g,\xi _1) + \min (g,\xi _2) + \min (g,\xi _3) \ge 2g+2. \end{aligned}$$

Generic identifiability of CMM with more than three blocks In the same way that Allman et al. (2009), we generalize the result with d blocks by observing that d blocks of categorical variables can be combined into three categorical variables. Thus, we can apply the Kruskal theorem.

Corollary 2

We consider a CMM with d blocks where $d\ge 3$. If there exists a tri-partition of the set $\{1,\ldots ,d\}$ into three disjoint non empty subsets $S_1$, $S_2$ and $S_3$, such that $\gamma _i=\prod _{j\in S_i}\xi _j$ with

$$\begin{aligned} \min (g,\gamma _1) + \min (g,\gamma _2) + \min (g,\gamma _3) \ge 2g+2 , \end{aligned}$$

(23)

then the model parameters are generically identifiable up to label swapping.

Approximation of $I(u_{kj})$

First, we define a new parametrization of the block distribution facilitating the integrated complete-data likelihood computation and the prior distribution related to this new block parametrization. Second, we underline the relationship between the embedded models. We conclude by the integrated complete-data likelihood computation, which is the target result.

1.1 New parametrization of the block distribution

Without loss of generality, we assume that the elements of $\varvec{\delta }_{kj}$ are ordered by decreasing values of the probability mass associated to them and we introduce the new parametrization of $\varvec{\alpha }_{kj}$ denoted by $\varvec{\varepsilon }_{kj}=(\varepsilon _{kjh};h=1,\ldots ,u_{kj})$ where $\varvec{\varepsilon }_{kj} \in \mathcal {E}_{kj}=\left[ \frac{1}{m_j};1\right] \times \ldots \times \left[ \frac{1}{\text {m}_j - u_{kj}};1\right] $ and where $\varepsilon _{kjh}$ is defined by

$$\begin{aligned} \varepsilon _{kjh} =&\left\{ \begin{array}{ll} \alpha _{kj1} &{}\quad \text {if } h=1\\ \frac{\alpha _{kjh}}{\prod _{h'=1}^{h-1} (1-\varepsilon _{kjh'})} &{}\quad \text {otherwise}. \end{array} \right. \end{aligned}$$

Each $\varepsilon _{kjh}$ follows a truncated beta distribution on the interval $\left[ \frac{1}{\text {m}_j-h+1},1\right] $, so

$$\begin{aligned} p(\varvec{\varepsilon }_{kj}|\varvec{\delta }_{kj},\varvec{m})=\frac{\text {m}_j}{\text {m}_j - u_{kj}}. \end{aligned}$$

(24)

The conditional probability of $\mathbf x ^j=(\mathbf x _i^j;i=1,\ldots ,n)$ is

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj})= \prod _{h=1}^{u_{kj}} (\varepsilon _{kjh})^{n^{\star }_{kj(h)}} (1-\varepsilon _{kjh})^{\bar{n}_{kjh}^{\star }}. \end{aligned}$$

(25)

1.2 Relation between embedded models

Let the model with $u_{kj}^{\ominus }$ modes and the parameters $(\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj}^{\ominus })$ and the model with $u_{kj}$ modes and the parameters $(\varvec{\delta }^{\star }_{kj},\varvec{\varepsilon }_{kj})$ such as $u_{kj}^{\ominus }=u_{kj}-1$ and such as the $u_{kj}^{\ominus }$ modes having the largest probabilities have the same locations ($\forall h\in \varvec{\delta }_{kj}^{\star \ominus },\; h\in \varvec{\delta }_{kj}^{\star }$) and the same probability masses $(\varepsilon _{kjh}^{\ominus }=\varepsilon _{kjh},\; h<u_{kj})$. These embedded models follow this relation

$$\begin{aligned} \frac{p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj})}{p(\mathbf x ^j|\mathbf z ,u_{kj}^{\ominus },\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj}^{\ominus })} =\frac{ (\text {m}_j - u_{kj} +1)^{\bar{n}_{kju_{kj}-1}^{\star } -1} }{ (\text {m}_j - u_{kj})^{\bar{n}_{kju_{kj}}^{\star }} } (\varepsilon _{u_{kj}})^{n_{kj(u_{kj})}^{\star }} (1-\varepsilon _{u_{kj}})^{\bar{n}_{kju_{kj}}^{\star }} . \end{aligned}$$

(26)

1.3 Integrated complete-data likelihood

The integrated complete-data likelihood is finally approximated, by neglecting the sum over the discrete parameters of the modes locations and by performing the exact computation on the continuous parameters, by

$$\begin{aligned} I_{kj}(u_{kj}) \approx \left( \frac{1}{m_j-u_{kj}} \right) ^{\bar{n}_{kj}^{\star }} \prod _{h=1}^{u_{kj}} \frac{Bi\left( \frac{1}{\text {m}_j-h+1};n_{kj(h)}^{\star }+1;\bar{n}_{kjh}^{\star }+1\right) }{\text {m}_j- h}, \end{aligned}$$

(27)

Proof

If, for the model with $u_{kj}-1$ modes, the best modes locations are known and given by $\varvec{\delta }_{kj}^{\star \ominus }$ then the conditional probability of $\mathbf x ^j$ for a model with $u_{kj}$ modes is

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj})= \frac{1}{\text {m}_j - u_{kj} +1 } \sum _{\varvec{\delta }_{kj} \in \Delta _{kj}} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj},\varvec{\varepsilon }_{kj}), \end{aligned}$$

(28)

where $\Delta _{kj}=\{\varvec{\delta }_{kj}: \varvec{\delta }_{kj}^{\star \ominus } \subset \varvec{\delta }_{kj} \text { and } \text {card}(\varvec{\delta }_{kj})= u_{kj} \}$. Thus, by approximating this sum by its maximum element, we obtain that

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj})= \frac{1}{\text {m}_j - u_{kj} +1 } p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj}), \end{aligned}$$

(29)

As $p(\mathbf x ^{j}|\mathbf z ,u_{kj}=0)=(m_j)^{-n_k}$, by applying recursively (26), we obtain that

$$\begin{aligned} p(\mathbf x ^{j}|\mathbf z ,u_{kj},\varvec{\varepsilon }_{kj})\approx \left( \frac{1}{\text {m}_j-u_{kj}} \right) ^{\bar{n}_{kju_{kj}}^{\star }} \prod _{h=1}^{u_{kj}} \frac{(\varepsilon _{kjh})^{n_{kj(h)}^{\star }} ( 1 - \varepsilon _{kjh})^{\bar{n}_{kjh}^{\star }}}{\text {m}_j - h +1} . \end{aligned}$$

(30)

$\square $

Details on the model sampling

The sampling of $(\varvec{s}^{[s+1]},\varvec{u}^{[s+1]})$ from (10) is performed in two steps. Firstly, a new repartition of the variables into blocks and the mode number of the modified blocks, respectively denoted by $ \varvec{s}^{[s+1]}$ and $\varvec{u}^{[s+1/2]}$, are sampled by one iteration of a Metropolis-Hastings algorithm. Secondly, the mode number of each block is sampled by one MCMC iteration. Thus, the sampling of $\varvec{m}^{[s+1]}$ is decomposed into the two following steps

$$\begin{aligned} (\varvec{s}^{[s+1]},\varvec{u}^{[s+1/2]})&\sim p(\varvec{s},\varvec{u} |g,\varvec{s}^{[s]},\varvec{u}^{[s]},\mathbf x ,\mathbf z ^{[s+1]}) \end{aligned}$$

(31)

$$\begin{aligned} \varvec{u}^{[s+1]}&\sim p(\varvec{u} |\varvec{s}^{[s+1]},\varvec{u}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]}). \end{aligned}$$

(32)

Thus, this chain has $p(\varvec{s},\varvec{u}|g,\mathbf x ,\mathbf z ^{[s+1]})$ as stationary distribution.

1.1 Metropolis-Hastings algorithm to sample from (31)

This sampling is performed by one iteration of the Metropolis-Hastings algorithm divided into two steps. Firstly, the proposal distribution $q(.;\varvec{m}^{[s]})$ generates a candidate $\varvec{m}^{\star }=(g,\varvec{s}^{\star },\varvec{u}^{\star })$. Secondly $\varvec{m}^{[s+1]}$ is sampled according to the acceptance probability $\mu ^{[s]}$ defined by

$$\begin{aligned} \mu ^{[s]}= 1 \wedge \frac{ p(\mathbf x ,\mathbf z ^{[s+1]}|\varvec{m}^{\star }) }{ p(\mathbf x ,\mathbf z ^{[s+1]}|\varvec{m}^{[s]})} \frac{ q(\varvec{m}^{[s]};\varvec{m}^{\star } ) }{ q(\varvec{m}^{\star };\varvec{m}^{[s]} ) }. \end{aligned}$$

(33)

Note that the computation of $\mu ^{[s]}$ involves to compute the integrated complete-data likelihood defined by (17). The sampling of $\varvec{m}^{[s+1/2]}$ is written as

$$\begin{aligned} \varvec{m}^{\star }&\sim q(.;\varvec{m}^{[s]}) \\ \varvec{m}^{[s+1/2]}&=\left\{ \begin{array}{rl} \varvec{m}^{\star }&{} \text { with a probability } \mu ^{[s]}\\ \varvec{m}^{[s]}&{} \text { otherwise.} \end{array} \right. \nonumber \end{aligned}$$

(34)

The proposal distribution $ q(.;\varvec{m}^{[s]})$ samples $\varvec{m}^{\star }$ in two steps. The first step changes the block affectation of one variable. In practice, $\varvec{s}^{\star }$ is uniformly sampled in $V(\varvec{s}^{[s]})=\{\varvec{s}: \exists ! b \text { as } b\in \varvec{s}_j^{[s]} \text { and } b \notin \varvec{s}_j\}$. The second step uniformly samples the mode numbers among all its possible values for the modified blocks while $u^{\star }_{kj}=u^{[s]}_{kj}$ for non-modified blocks (i.e., j as $\varvec{s}_j^{[s]}=\varvec{s}_j^{\star }$).

1.2 MCMC algorithm to sample $\varvec{u}^{[s+1]}$

This step allows to increase or decrease the mode number of each block by one at each iteration. So, $u_{kj}^{[s+1]}$ is sampled according to $p(u_{kj} | \varvec{m}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]})$ defined by

$$\begin{aligned} p(u_{kj} |g,\varvec{m}^{[s+1]}, u_{kj}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]}) \propto \left\{ \begin{array}{ll} p(\mathbf x ^{j}|\mathbf z ^{[s+1]},u_{kj}) &{} \text { if } |u_{kj}-u_{kj}^{[s+1/2]}|<2\\ {} &{} \text { and } u_{kj} \notin \{0,\text {m}_j\}. \\ 0 &{} \text { otherwise.} \end{array} \right. \end{aligned}$$

(35)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Marbac, M., Biernacki, C. & Vandewalle, V. Latent class model with conditional dependency per modes to cluster categorical data. Adv Data Anal Classif 10, 183–207 (2016). https://doi.org/10.1007/s11634-016-0250-1

Download citation

Received: 22 May 2015
Revised: 02 May 2016
Accepted: 03 May 2016
Published: 13 May 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11634-016-0250-1

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Institutional subscriptions

Latent class model with conditional dependency per modes to cluster categorical data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Clustering bivariate mixed-type data via the cluster-weighted model

Penalized Latent Class Model for Clustering with Application to Variable Selection

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix: Generic identifiability of CMM

Theorem 1

Corollary 1

Corollary 2

Approximation of \(I(u_{kj})\)

1.1 New parametrization of the block distribution

1.2 Relation between embedded models

1.3 Integrated complete-data likelihood

Proof

Details on the model sampling

1.1 Metropolis-Hastings algorithm to sample from (31)

1.2 MCMC algorithm to sample \(\varvec{u}^{[s+1]}\)

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Navigation

Latent class model with conditional dependency per modes to cluster categorical data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Clustering bivariate mixed-type data via the cluster-weighted model

Penalized Latent Class Model for Clustering with Application to Variable Selection

Model-based clustering via new parsimonious mixtures of heavy-tailed distributions

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix: Generic identifiability of CMM

Theorem 1

Corollary 1

Corollary 2

Approximation of \(I(u_{kj})\)

1.1 New parametrization of the block distribution

1.2 Relation between embedded models

1.3 Integrated complete-data likelihood

Proof

Details on the model sampling

1.1 Metropolis-Hastings algorithm to sample from (31)

1.2 MCMC algorithm to sample \(\varvec{u}^{[s+1]}\)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Subscribe and save

Buy Now

Search

Navigation