Abstract
An inductive probabilistic classification rule must generally obey the principles of Bayesian predictive inference, such that all observed and unobserved stochastic quantities are jointly modeled and parameter uncertainty is fully acknowledged through the posterior predictive distribution. Several such rules have recently been considered, and their asymptotic behavior has been characterized under the assumption that the observed features or variables used for building a classifier are conditionally independent given a simultaneous labeling of both the training samples and those from an unknown origin. Here we extend the theoretical results to predictive classifiers that acknowledge feature dependencies, either through graphical models or through sparser alternatives defined as stratified graphical models. We show through experiments with both synthetic and real data that predictive classifiers encoding dependencies have the potential to substantially improve classification accuracy compared with both standard discriminative classifiers and predictive classifiers based solely on conditionally independent features. In most of our experiments stratified graphical models show an advantage over ordinary graphical models.
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their constructive comments and suggestions on the original version of this paper. H.N. and J.P. were supported by the Foundation of Åbo Akademi University, as part of the grant for the Center of Excellence in Optimization and Systems Engineering. J.P. was also supported by the Magnus Ehrnrooth Foundation. J.X. and J.C. were supported by the ERC Grant No. 239784 and Academy of Finland Grant No. 251170. J.X. was also supported by the FDPSS graduate school.
Appendices
Appendix A: Proof of Theorem 1
To prove Theorem 1 it suffices to consider a single class \(k\) and a single maximal clique in \(G_L^k\). If the scores for the marginal and simultaneous classifiers are asymptotically equivalent for an arbitrary maximal clique and class, it automatically follows that the scores for the whole system are asymptotically equivalent. We start by considering the simultaneous classifier. The training data \({\mathbf {X}}^R\) and test data \({\mathbf {X}}^T\) are now assumed to cover only one maximal clique of an SG in one class. Expanding \(\log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) using (2) we get
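As a generic, hedged point of reference (the hyperparameters \(\alpha _{jil}\) and the exact arrangement of terms are illustrative and need not coincide with (2)), Dirichlet–multinomial marginal likelihoods of the kind underlying these predictive scores consist of Gamma-function ratios such as
\[
\prod _{j}\prod _{l}\frac{\Gamma \bigl (\sum _{i}\alpha _{jil}\bigr )}{\Gamma \bigl (\sum _{i}\alpha _{jil}+n(\pi _{j}^{l})\bigr )}\prod _{i}\frac{\Gamma \bigl (\alpha _{jil}+n(x_{j}^{i}\mid \pi _{j}^{l})\bigr )}{\Gamma \bigl (\alpha _{jil}\bigr )},
\]
evaluated at the relevant training or combined training and test counts; it is Gamma-function ratios of this type that the Stirling approximation below is applied to.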
Using Stirling’s approximation, \(\log \varGamma (x) \approx (x - 0.5) \log (x) - x\), this equals
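For instance (the symbols \(a\) and \(n\) are generic and introduced here only for illustration), applying the approximation to a single Gamma-function ratio gives
\[
\log \frac{\Gamma (a+n)}{\Gamma (a)}\;\approx \;\Bigl (a+n-\tfrac{1}{2}\Bigr )\log (a+n)-\Bigl (a-\tfrac{1}{2}\Bigr )\log a-n ,
\]
which is the elementary step applied term by term throughout the proof.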
When looking at the marginal classifier we need to sum over the single observations \({\mathbf {X}}_h^T\). We use \(h(\pi _{j}^{l})\) to indicate whether the outcome of the parents of variable \(X_j\) belongs to group \(l\), and \(h(x_{j}^{i} \mid \pi _{j}^{l})\) to indicate whether the outcome of \(X_j\) is \(i\) given that the observed outcome of the parents belongs to \(l\). Observing that \(h(\pi _{j}^{l})\) and \(h(x_{j}^{i} \mid \pi _{j}^{l})\) are either 0 or 1, we get the result
Considering the difference \(\log S_{\text {sim}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {mar}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) results in
We now make the assumption that all limits of relative frequencies of feature values are strictly positive under an infinitely exchangeable sampling process of the training data, so that all hyperparameters \(\beta _{jil} \rightarrow \infty \) as the size of the training data \(m \rightarrow \infty \). Using the standard limit \(\lim _{y \rightarrow \infty } (1+x/y)^y = e^x\) results in
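As a sketch of how this limit enters (the symbols \(\beta \), \(n\) and \(c\) are generic and introduced here only for illustration), a typical term arising from the Stirling expansion behaves as
\[
(\beta +n)\log \frac{\beta +n+c}{\beta +n}=\log \Bigl (1+\frac{c}{\beta +n}\Bigr )^{\beta +n}\longrightarrow c\quad \text {as } \beta \rightarrow \infty ,
\]
so that terms of this type converge to finite constants which cancel in the difference of the simultaneous and marginal scores.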
Appendix B: Proof of Theorem 2
This proof follows largely the same structure as the proof of Theorem 1 and covers the simultaneous score. It is assumed that the underlying graph of the SGM coincides with the GM; this is a fair assumption, since as the size of the training data goes to infinity this property holds for the SGM and the GM maximizing the marginal likelihood. Again we consider only a single class \(k\) and a single maximal clique in \(G_L^k\), using the same reasoning as in the proof above. Additionally, it suffices to consider the score for the last variable \(X_d\) in the ordering, the variable corresponding to the node associated with all of the stratified edges, and a specific parent configuration \(l\) of the parents \(\Pi _d\) of \(X_d\). The equation for calculating the score for variables \(X_1, \ldots , X_{d-1}\) is identical using either the GM or the SGM. If the asymptotic equivalence holds for an arbitrary parent configuration, it automatically holds for all parent configurations. Under this setting we start by looking at the score for the SGM
which, using Stirling’s approximation and the same techniques as in the previous proof, equals
When studying the GM score we need to consider each outcome in the parent configuration \(l\) separately. Let \(h\) denote such an outcome in \(l\), and let \(q_l\) denote the total number of outcomes in \(l\). We then get the following score for the GM,
which, using calculations identical to those before, equals
Considering the difference \(\log S_{\text {SGM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {GM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R)\) we get
Under the assumption that \(\beta _{jil} \rightarrow \infty \) as \(m \rightarrow \infty \), the terms on the first and fourth rows sum to 0 in the limit. The remaining terms can be written
Noting that \(n(\pi _j^l) = \sum _{h=1}^{q_l} n(\pi _j^h)\) and \(n(x_j^i \mid \pi _j^l) = \sum _{h=1}^{q_l} n(x_j^i \mid \pi _j^h)\) we get
By investigating the definition of the \(\beta \) parameters in (3), in combination with the fact that the probability of observing the value \(i\) for variable \(X_j\) is identical for any parent outcome \(h\) comprising the group \(l\), we get the limits
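A hedged sketch of why such limits exist (the symbol \(p_{ji}\) is introduced here only for illustration and denotes the common conditional probability shared by all outcomes \(h\) in the group \(l\)): under the exchangeable sampling process of the training data, the relative frequencies obey
\[
\frac{n(x_{j}^{i}\mid \pi _{j}^{h})}{n(\pi _{j}^{h})}\longrightarrow p_{ji}
\quad \text {and}\quad
\frac{n(x_{j}^{i}\mid \pi _{j}^{l})}{n(\pi _{j}^{l})}=\frac{\sum _{h=1}^{q_l}n(x_{j}^{i}\mid \pi _{j}^{h})}{\sum _{h=1}^{q_l}n(\pi _{j}^{h})}\longrightarrow p_{ji},
\]
so the grouped and ungrouped relative frequencies share the same limit.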
Subsequently, as \(m \rightarrow \infty \), the difference \(\log S_{\text {SGM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) - \log S_{\text {GM}}({\mathbf {X}}^T \mid {\mathbf {X}}^R) \rightarrow 0\), which completes the proof.
Appendix C: List of political parties in the Finnish parliament
Table 4 lists the political parties elected to the Finnish parliament in the 2011 parliamentary election.
Cite this article
Nyman, H., Xiong, J., Pensar, J. et al. Marginal and simultaneous predictive classification using stratified graphical models. Adv Data Anal Classif 10, 305–326 (2016). https://doi.org/10.1007/s11634-015-0199-5