
Latent class model with conditional dependency per modes to cluster categorical data

  • Regular Article
  • Published:

Advances in Data Analysis and Classification

Abstract

We propose a parsimonious extension of the classical latent class model for clustering categorical data by relaxing the conditional independence assumption. Under this new mixture model, named conditional modes model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution whose few free parameters model the probabilities of the most likely levels, while the remaining probability mass is uniformly spread over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, thus summarizing each class with relatively few characteristic levels. Model selection is carried out by a hybrid MCMC algorithm that does not require preliminary parameter estimation; maximum likelihood estimation is then performed via an EM algorithm for the best model only. The model properties are illustrated on simulated data and on three real data sets using the associated R package CoModes. The results show that this model reduces the biases induced by the conditional independence assumption while providing meaningful parameters.
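As an illustration of the block distribution just described, here is a minimal sketch (Python; not the CoModes implementation, and all names are ours) of a block probability mass function with u modal levels and the remaining mass spread uniformly:

```python
# Illustrative sketch of the parsimonious multinomial used for one block:
# u "modes" carry free probabilities alpha_1 >= ... >= alpha_u, and the
# remaining mass is spread uniformly over the m - u other levels.

def block_pmf(alpha, mode_levels, m):
    """Probability of each of the m levels of a block.

    alpha       : probabilities of the u modal levels (sum < 1)
    mode_levels : 0-based indices of the u modal levels
    m           : total number of levels of the block
    """
    u = len(alpha)
    rest = (1.0 - sum(alpha)) / (m - u)   # uniform share of non-modal levels
    pmf = [rest] * m
    for a, lvl in zip(alpha, mode_levels):
        pmf[lvl] = a                      # modal levels keep their own mass
    return pmf

# Example: m = 6 levels, u = 2 modes on levels 0 and 3.
print(block_pmf([0.5, 0.3], [0, 3], 6))   # [0.5, 0.05, 0.05, 0.3, 0.05, 0.05]
```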


Notes

  1. Note that the partition of the variables into blocks is identical across classes. This choice is motivated by identifiability and interpretation reasons that we detail later.

References

  • Agresti A (2002) Categorical data analysis, vol 359. Wiley, New York

  • Allman E, Matias C, Rhodes J (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Stat 37(6A):3099–3132

  • Bartholomew D, Knott M, Moustaki I (2011) Latent variable models and factor analysis: a unified approach, vol 899. Wiley, New York

  • Biernacki C, Celeux G, Govaert G (2010) Exact and Monte Carlo calculations of integrated likelihoods for the latent class model. J Stat Plan Inference 140(11):2991–3002

  • Bretagnolle V (2007) Personal communication. Source: Museum

  • Celeux G, Govaert G (1991) Clustering criteria for discrete data and latent class models. J Classif 8(2):157–176

  • Chavent M, Kuentz V, Saracco J (2010) A partitioning method for the clustering of categorical variables. In: Classification as a tool for research. Springer, Berlin Heidelberg, pp 91–99

  • Choirat C, Seri R (2012) Estimation in discrete parameter models. Stat Sci 27(2):278–293

  • Czerniak J, Zarzycki H (2003) Application of rough sets in the presumptive diagnosis of urinary system diseases. In: Artificial intelligence and security in computing systems, ACS'2002 9th international conference proceedings, pp 41–51

  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  • Espeland M, Handelman S (1989) Using latent class models to characterize and assess relative error in discrete measurements. Biometrics 45(2):587–599

  • Gollini I, Murphy T (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comput 24(4):569–588

  • Goodman L (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2):215–231

  • Govaert G (2010) Data analysis, vol 136. Wiley, New York

  • Hagenaars J (1988) Latent structure models with direct effects between indicators: local dependence models. Sociol Methods Res 16(3):379–405

  • Hand D, Yu K (2001) Idiot's Bayes: not so stupid after all? Int Stat Rev 69(3):385–398

  • Huang J, Ng M, Rong H, Li Z (2005) Automated variable weighting in k-means type clustering. IEEE Trans Pattern Anal Mach Intell 27(5):657–668

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

  • Jajuga K, Sokołowski A, Bock H (2002) Classification, clustering and data analysis: recent advances and applications. Springer, Berlin

  • Jorgensen M, Hunt L (1996) Mixture model clustering of data sets with categorical and continuous variables. In: Proc Conf ISIS 96:375–384

  • Kruskal J (1976) More factors than subjects, tests and treatments: an indeterminacy theorem for canonical decomposition and individual differences scaling. Psychometrika 41(3):281–293

  • Kruskal J (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl 18(2):95–138

  • Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2014) Rmixmod: the R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. J Stat Softw (in press)

  • McLachlan G, Krishnan T (1997) The EM algorithm. Wiley series in probability and statistics. Wiley, New York

  • McLachlan G, Peel D (2000) Finite mixture models. Wiley series in probability and statistics. Wiley, New York

  • Moran M, Walsh C, Lynch A, Coen R, Coakley D, Lawlor B (2004) Syndromes of behavioural and psychological symptoms in mild Alzheimer's disease. Int J Geriatr Psychiatry 19(4):359–364

  • Qu Y, Tan M, Kutner M (1996) Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics 52(3):797–810

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Van Hattum P, Hoijtink H (2009) Market segmentation using brand strategy research: Bayesian inference with respect to mixtures of log-linear models. J Classif 26(3):297–328

  • Vermunt J (2003) Multilevel latent class models. Sociol Methodol 33(1):213–239


Author information


Corresponding author

Correspondence to Matthieu Marbac.

Additional information

The R package CoModes is downloadable at https://r-forge.r-project.org/R/?group_id=1809.

Appendices

Appendix: Generic identifiability of CMM

When \(d\ge 3\), CMM is generically identifiable (i.e., the subset of the parameter space where the model is not identifiable has Lebesgue measure zero). The proof is based on results of Allman et al. (2009), which rely on Kruskal's theorem (Kruskal 1976, 1977). It proceeds in three steps: a reminder of Kruskal's results for three-way tables, a proof of the model's generic identifiability when \(d=3\), and its extension to models with more than three blocks.

Kruskal results For a matrix M, the Kruskal rank of M, denoted by \(\text {rank}_K\; M\), is the largest number I such that every set of I rows of M is linearly independent.
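As an illustration, here is a small Python sketch of this definition (a brute-force check, exponential in the number of rows; the function name is ours):

```python
import itertools
import numpy as np

def kruskal_rank(M, tol=1e-10):
    """Largest I such that *every* set of I rows of M is linearly
    independent (direct transcription of the definition)."""
    M = np.asarray(M, dtype=float)
    n_rows = M.shape[0]
    best = 0
    for i in range(1, n_rows + 1):
        # Every i-row submatrix must have full row rank i.
        if all(np.linalg.matrix_rank(M[list(rows)], tol=tol) == i
               for rows in itertools.combinations(range(n_rows), i)):
            best = i
        else:
            break
    return best

# Rows 0 and 2 are equal, so no independent pair contains both:
print(kruskal_rank([[1, 0], [0, 1], [1, 0]]))  # 1
```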

Theorem 1

(Kruskal 1976, 1977). Let \(I_j=\text {rank}_K\; M_j\). If

$$\begin{aligned} I_1 + I_2 +I_3 \ge 2g+2, \end{aligned}$$

then the tensor \([M_1, M_2, M_3]\) uniquely determines the matrices \(M_j\), up to simultaneous permutation and rescaling of their rows.

Generic identifiability of CMM with three blocks Let \(k_0=\underset{k}{\text {argmin }} u_{kj}\) and define the matrix \(M_j\) by

$$\begin{aligned} M_j(k,h)=\alpha _{kjh}. \end{aligned}$$
(22)

Setting \(\xi _j=\underset{k}{\min }\; u_{kj} +1\), we generically have

$$\begin{aligned} \text {rank}_K \; M_j=\min (g,\xi _j). \end{aligned}$$

Corollary 1

The parameters of CMM with three blocks are generically identifiable, up to label swapping, provided that:

$$\begin{aligned} \min (g,\xi _1) + \min (g,\xi _2) + \min (g,\xi _3) \ge 2g+2. \end{aligned}$$

Generic identifiability of CMM with more than three blocks As in Allman et al. (2009), we generalize the result to d blocks by observing that d blocks of categorical variables can be combined into three composite categorical variables, to which Kruskal's theorem applies.

Corollary 2

We consider a CMM with d blocks, where \(d\ge 3\). If there exists a partition of the set \(\{1,\ldots ,d\}\) into three disjoint non-empty subsets \(S_1\), \(S_2\) and \(S_3\) such that, with \(\gamma _i=\prod _{j\in S_i}\xi _j\),

$$\begin{aligned} \min (g,\gamma _1) + \min (g,\gamma _2) + \min (g,\gamma _3) \ge 2g+2 , \end{aligned}$$
(23)

then the model parameters are generically identifiable up to label swapping.
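Corollaries 1 and 2 reduce to a purely arithmetic check. The following Python sketch (the function name is ours) searches for a tri-partition satisfying (23); with d = 3, the only admissible tri-partitions are singletons, so it reduces to the condition of Corollary 1:

```python
import itertools
from math import prod

def cmm_generically_identifiable(g, xi):
    """Sufficient condition of Corollaries 1-2 for a CMM with
    d = len(xi) >= 3 blocks, where xi[j] = min_k u_kj + 1: look for a
    tri-partition S1, S2, S3 with sum_i min(g, prod_{j in S_i} xi_j) >= 2g+2."""
    d = len(xi)
    # Assign each block to one of the three parts in every possible way.
    for labels in itertools.product((0, 1, 2), repeat=d):
        parts = [[xi[j] for j in range(d) if labels[j] == i] for i in (0, 1, 2)]
        if any(not p for p in parts):        # all three parts must be non-empty
            continue
        if sum(min(g, prod(p)) for p in parts) >= 2 * g + 2:
            return True
    return False

# Three blocks with xi = (3, 3, 3): 3 + 3 + 3 >= 2g + 2 holds iff g <= 3.
print(cmm_generically_identifiable(3, (3, 3, 3)))  # True
print(cmm_generically_identifiable(4, (3, 3, 3)))  # False
```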

Approximation of \(I_{kj}(u_{kj})\)

First, we define a new parametrization of the block distribution that facilitates the computation of the integrated complete-data likelihood, together with the prior distribution related to this new block parametrization. Second, we underline the relationship between embedded models. We conclude with the computation of the integrated complete-data likelihood, which is the target result.

1.1 New parametrization of the block distribution

Without loss of generality, we assume that the elements of \(\varvec{\delta }_{kj}\) are ordered by decreasing values of the probability mass associated with them, and we introduce the new parametrization of \(\varvec{\alpha }_{kj}\), denoted by \(\varvec{\varepsilon }_{kj}=(\varepsilon _{kjh};h=1,\ldots ,u_{kj})\), where \(\varvec{\varepsilon }_{kj} \in \mathcal {E}_{kj}=\left[ \frac{1}{m_j};1\right] \times \ldots \times \left[ \frac{1}{m_j - u_{kj}};1\right] \) and where \(\varepsilon _{kjh}\) is defined by

$$\begin{aligned} \varepsilon _{kjh} = \left\{ \begin{array}{ll} \alpha _{kj1} &{}\quad \text {if } h=1,\\ \dfrac{\alpha _{kjh}}{\prod _{h'=1}^{h-1} (1-\varepsilon _{kjh'})} &{}\quad \text {otherwise}. \end{array} \right. \end{aligned}$$

Each \(\varepsilon _{kjh}\) follows a truncated beta distribution on the interval \(\left[ \frac{1}{m_j-h+1},1\right] \), with constant density \(\frac{m_j-h+1}{m_j-h}\); these densities telescope over \(h=1,\ldots ,u_{kj}\), so

$$\begin{aligned} p(\varvec{\varepsilon }_{kj}|\varvec{\delta }_{kj},\varvec{m})=\frac{m_j}{m_j - u_{kj}}. \end{aligned}$$
(24)

The conditional probability of \(\mathbf x ^j=(\mathbf x _i^j;i=1,\ldots ,n)\) is

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj})= \prod _{h=1}^{u_{kj}} (\varepsilon _{kjh})^{n^{\star }_{kj(h)}} (1-\varepsilon _{kjh})^{\bar{n}_{kjh}^{\star }}. \end{aligned}$$
(25)
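For concreteness, here is a minimal Python sketch of this stick-breaking reparametrization and of (25); the count conventions in the comments are our reading of the notation, not code from the paper:

```python
import math

# eps_kjh is the mass of mode h renormalized by the mass not yet assigned to
# modes 1..h-1. Assumed conventions: n[h-1] stands for n*_{kj(h)} (count of
# observations at mode h) and nbar[h-1] for nbar*_{kjh} (count of observations
# beyond modes 1..h).

def alpha_to_eps(alpha):
    """Map ordered mode probabilities alpha_1 >= ... >= alpha_u to epsilon."""
    eps, stick = [], 1.0
    for a in alpha:
        eps.append(a / stick)    # eps_h = alpha_h / prod_{h'<h}(1 - eps_h')
        stick *= 1.0 - eps[-1]   # remaining mass after modes 1..h
    return eps

def log_block_likelihood(eps, n, nbar):
    """log of Eq. (25): sum_h n_h log(eps_h) + nbar_h log(1 - eps_h)."""
    return sum(nh * math.log(e) + nb * math.log(1.0 - e)
               for e, nh, nb in zip(eps, n, nbar))

print(alpha_to_eps([0.5, 0.3]))  # [0.5, 0.6], since 0.3 / (1 - 0.5) = 0.6
```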

1.2 Relation between embedded models

Consider the model with \(u_{kj}^{\ominus }\) modes and parameters \((\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj}^{\ominus })\) and the model with \(u_{kj}\) modes and parameters \((\varvec{\delta }^{\star }_{kj},\varvec{\varepsilon }_{kj})\), such that \(u_{kj}^{\ominus }=u_{kj}-1\) and such that the \(u_{kj}^{\ominus }\) modes having the largest probabilities have the same locations (\(\forall h\in \varvec{\delta }_{kj}^{\star \ominus },\; h\in \varvec{\delta }_{kj}^{\star }\)) and the same probability masses (\(\varepsilon _{kjh}^{\ominus }=\varepsilon _{kjh},\; h<u_{kj}\)). These embedded models satisfy the relation

$$\begin{aligned} \frac{p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj})}{p(\mathbf x ^j|\mathbf z ,u_{kj}^{\ominus },\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj}^{\ominus })} =\frac{ (m_j - u_{kj} +1)^{\bar{n}_{kju_{kj}-1}^{\star } -1} }{ (m_j - u_{kj})^{\bar{n}_{kju_{kj}}^{\star }} } (\varepsilon _{kju_{kj}})^{n_{kj(u_{kj})}^{\star }} (1-\varepsilon _{kju_{kj}})^{\bar{n}_{kju_{kj}}^{\star }} . \end{aligned}$$
(26)

1.3 Integrated complete-data likelihood

The integrated complete-data likelihood is finally approximated, by neglecting the sum over the discrete mode-location parameters and by performing the exact computation over the continuous parameters, as

$$\begin{aligned} I_{kj}(u_{kj}) \approx \left( \frac{1}{m_j-u_{kj}} \right) ^{\bar{n}_{kju_{kj}}^{\star }} \prod _{h=1}^{u_{kj}} \frac{Bi\left( \frac{1}{m_j-h+1};n_{kj(h)}^{\star }+1;\bar{n}_{kjh}^{\star }+1\right) }{m_j- h}. \end{aligned}$$
(27)

Proof

If, for the model with \(u_{kj}-1\) modes, the best mode locations are known and given by \(\varvec{\delta }_{kj}^{\star \ominus }\), then the conditional probability of \(\mathbf x ^j\) for a model with \(u_{kj}\) modes is

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj})= \frac{1}{m_j - u_{kj} +1 } \sum _{\varvec{\delta }_{kj} \in \Delta _{kj}} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj},\varvec{\varepsilon }_{kj}), \end{aligned}$$
(28)

where \(\Delta _{kj}=\{\varvec{\delta }_{kj}: \varvec{\delta }_{kj}^{\star \ominus } \subset \varvec{\delta }_{kj} \text { and } \text {card}(\varvec{\delta }_{kj})= u_{kj} \}\). Thus, approximating this sum by its maximum element, we obtain

$$\begin{aligned} p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star \ominus },\varvec{\varepsilon }_{kj})\approx \frac{1}{m_j - u_{kj} +1 } p(\mathbf x ^j|\mathbf z ,u_{kj},\varvec{\delta }_{kj}^{\star },\varvec{\varepsilon }_{kj}). \end{aligned}$$
(29)

As \(p(\mathbf x ^{j}|\mathbf z ,u_{kj}=0)=(m_j)^{-n_k}\), by applying (26) recursively, we obtain

$$\begin{aligned} p(\mathbf x ^{j}|\mathbf z ,u_{kj},\varvec{\varepsilon }_{kj})\approx \left( \frac{1}{m_j-u_{kj}} \right) ^{\bar{n}_{kju_{kj}}^{\star }} \prod _{h=1}^{u_{kj}} \frac{(\varepsilon _{kjh})^{n_{kj(h)}^{\star }} ( 1 - \varepsilon _{kjh})^{\bar{n}_{kjh}^{\star }}}{m_j - h +1} . \end{aligned}$$
(30)

Integrating (30) over \(\varvec{\varepsilon }_{kj}\) with respect to the prior (24) then yields (27). \(\square \)
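As a numerical illustration of (27), the following sketch assumes that \(Bi(x;a;b)\) denotes the incomplete beta integral \(\int _x^1 t^{a-1}(1-t)^{b-1}\,dt\), i.e. the integral of \(\varepsilon ^{n}(1-\varepsilon )^{\bar{n}}\) over the truncation interval of \(\varepsilon _{kjh}\) (this reading, and the use of SciPy, are our assumptions):

```python
from math import log
from scipy.special import beta, betainc

def Bi(x, a, b):
    # scipy.special.betainc is the *regularized* incomplete beta from 0 to x,
    # so the integral over [x, 1] is beta(a, b) * (1 - betainc(a, b, x)).
    return beta(a, b) * (1.0 - betainc(a, b, x))

def log_I_kj(u, m, n, nbar):
    """Approximate log I_kj(u) following (27). Assumed conventions: n[h-1]
    holds n*_{kj(h)}, nbar[h-1] holds nbar*_{kjh}, and nbar[u-1] is the
    exponent of the leading 1/(m_j - u_kj) factor."""
    val = -nbar[u - 1] * log(m - u)
    for h in range(1, u + 1):
        val += log(Bi(1.0 / (m - h + 1), n[h - 1] + 1, nbar[h - 1] + 1))
        val -= log(m - h)
    return val
```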

Details on the model sampling

The sampling of \((\varvec{s}^{[s+1]},\varvec{u}^{[s+1]})\) from (10) is performed in two steps. First, a new partition of the variables into blocks and the mode numbers of the modified blocks, denoted by \( \varvec{s}^{[s+1]}\) and \(\varvec{u}^{[s+1/2]}\) respectively, are sampled by one iteration of a Metropolis-Hastings algorithm. Second, the mode number of each block is sampled by one MCMC iteration. Thus, the sampling of \(\varvec{m}^{[s+1]}\) is decomposed into the following two steps

$$\begin{aligned} (\varvec{s}^{[s+1]},\varvec{u}^{[s+1/2]})&\sim p(\varvec{s},\varvec{u} |g,\varvec{s}^{[s]},\varvec{u}^{[s]},\mathbf x ,\mathbf z ^{[s+1]}) \end{aligned}$$
(31)
$$\begin{aligned} \varvec{u}^{[s+1]}&\sim p(\varvec{u} |\varvec{s}^{[s+1]},\varvec{u}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]}). \end{aligned}$$
(32)

Thus, this chain admits \(p(\varvec{s},\varvec{u}|g,\mathbf x ,\mathbf z ^{[s+1]})\) as stationary distribution.
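Structurally, one sweep of this two-step scheme can be sketched as follows (the two samplers, detailed in the next two subsections, are passed in as callables; this is a sketch, not the CoModes implementation):

```python
# One sweep of the hybrid algorithm (31)-(32).

def sample_model_one_sweep(s, u, sample_blocks_mh, sample_mode_numbers):
    s_new, u_half = sample_blocks_mh(s, u)      # (31): MH move on the partition
    u_new = sample_mode_numbers(s_new, u_half)  # (32): per-block mode numbers
    return s_new, u_new
```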

1.1 Metropolis-Hastings algorithm to sample from (31)

This sampling is performed by one iteration of the Metropolis-Hastings algorithm, divided into two steps. First, the proposal distribution \(q(.;\varvec{m}^{[s]})\) generates a candidate \(\varvec{m}^{\star }=(g,\varvec{s}^{\star },\varvec{u}^{\star })\). Second, \(\varvec{m}^{[s+1/2]}\) is sampled according to the acceptance probability \(\mu ^{[s]}\) defined by

$$\begin{aligned} \mu ^{[s]}= 1 \wedge \frac{ p(\mathbf x ,\mathbf z ^{[s+1]}|\varvec{m}^{\star }) }{ p(\mathbf x ,\mathbf z ^{[s+1]}|\varvec{m}^{[s]})} \frac{ q(\varvec{m}^{[s]};\varvec{m}^{\star } ) }{ q(\varvec{m}^{\star };\varvec{m}^{[s]} ) }. \end{aligned}$$
(33)

Note that the computation of \(\mu ^{[s]}\) requires computing the integrated complete-data likelihood defined by (17). The sampling of \(\varvec{m}^{[s+1/2]}\) is written as

$$\begin{aligned} \varvec{m}^{\star }&\sim q(.;\varvec{m}^{[s]}) \\ \varvec{m}^{[s+1/2]}&=\left\{ \begin{array}{rl} \varvec{m}^{\star }&{} \text { with a probability } \mu ^{[s]}\\ \varvec{m}^{[s]}&{} \text { otherwise.} \end{array} \right. \nonumber \end{aligned}$$
(34)

The proposal distribution \( q(.;\varvec{m}^{[s]})\) samples \(\varvec{m}^{\star }\) in two steps. The first step changes the block assignment of one variable. In practice, \(\varvec{s}^{\star }\) is uniformly sampled in \(V(\varvec{s}^{[s]})=\{\varvec{s}: \exists !\, b \text { such that } b\in \varvec{s}_j^{[s]} \text { and } b \notin \varvec{s}_j\}\). The second step uniformly samples the mode numbers of the modified blocks among all their possible values, while \(u^{\star }_{kj}=u^{[s]}_{kj}\) for the non-modified blocks (i.e., the blocks j such that \(\varvec{s}_j^{[s]}=\varvec{s}_j^{\star }\)), as sketched below.
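In log scale, the acceptance step (33)-(34) can be sketched as follows; log_evidence stands for the log integrated complete-data likelihood (17), and propose and log_q for the proposal just described (all three are assumed callables supplied by the caller, not CoModes API):

```python
import math
import random

def mh_step(m_cur, log_evidence, propose, log_q, rng=random):
    m_star = propose(m_cur)                    # candidate from q(.; m^[s])
    log_mu = (log_evidence(m_star) - log_evidence(m_cur)
              + log_q(m_cur, m_star) - log_q(m_star, m_cur))
    # Accept m_star with probability mu = min(1, exp(log_mu)), Eq. (33).
    if rng.random() < math.exp(min(0.0, log_mu)):
        return m_star
    return m_cur
```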

1.2 MCMC algorithm to sample \(\varvec{u}^{[s+1]}\)

This step increases or decreases the mode number of each block by at most one per iteration. Specifically, \(u_{kj}^{[s+1]}\) is sampled according to \(p(u_{kj} | \varvec{m}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]})\), defined by

$$\begin{aligned} p(u_{kj} |g,\varvec{s}^{[s+1]}, u_{kj}^{[s+1/2]},\mathbf x ,\mathbf z ^{[s+1]}) \propto \left\{ \begin{array}{ll} p(\mathbf x ^{j}|\mathbf z ^{[s+1]},u_{kj}) &{} \text { if } |u_{kj}-u_{kj}^{[s+1/2]}|<2 \text { and } u_{kj} \notin \{0,m_j\}, \\ 0 &{} \text { otherwise.} \end{array} \right. \end{aligned}$$
(35)
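A single draw from (35) can be sketched as follows; block_evidence is an assumed callable standing for \(p(\mathbf x ^{j}|\mathbf z ,u_{kj})\), e.g. the approximation (27):

```python
import random

# Candidate mode numbers are confined to {u-1, u, u+1}, kept strictly between
# 0 and m_j, and weighted by the integrated likelihood of the block. (In
# practice one would work with log-evidences and normalize before sampling.)

def sample_mode_number(u_cur, m_j, block_evidence, rng=random):
    candidates = [u for u in (u_cur - 1, u_cur, u_cur + 1) if 0 < u < m_j]
    weights = [block_evidence(u) for u in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```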


About this article


Cite this article

Marbac, M., Biernacki, C. & Vandewalle, V. Latent class model with conditional dependency per modes to cluster categorical data. Adv Data Anal Classif 10, 183–207 (2016). https://doi.org/10.1007/s11634-016-0250-1

  • DOI: https://doi.org/10.1007/s11634-016-0250-1
