Abstract
Mixtures of product components assume conditional independence of the variables given the index of the component. They can be efficiently estimated from data by means of the EM algorithm and have a number of other useful properties. On the other hand, mixtures of dependence trees describe the statistical relationships between pairs of variables explicitly at the level of individual components, and the approximation power of the resulting mixture may therefore increase substantially. However, we have found in an application to the classification of numerals that both models perform comparably and that the contribution of the dependence-tree structures to the log-likelihood criterion decreases in the course of the EM iterations. Thus the optimal estimate of a dependence-tree mixture tends to reduce to a simple product mixture model.
References
Borůvka, O.: On a minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech) No. 3 (1926)
Bouguila, N., Ziou, D., Vaillancourt, J.: Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Trans. Image Process. 13(11), 1533–1543 (2004)
Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Info. Theory IT-14(3), 462–467 (1968)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1–38 (1977)
Day, N.E.: Estimating the components of a mixture of normal distributions. Biometrika 56, 463–474 (1969)
Grim, J.: On numerical evaluation of maximum—likelihood estimates for finite mixtures of distributions. Kybernetika 18(3), 173–190 (1982). http://dml.cz/dmlcz/124132
Grim, J.: On structural approximating multivariate discrete probability distributions. Kybernetika 20(1), 1–17 (1984). http://dml.cz/dmlcz/125676
Grim, J.: Multivariate statistical pattern recognition with nonreduced dimensionality. Kybernetika 22(2), 142–157 (1986). http://dml.cz/dmlcz/125022
Grim, J.: Design of multilayer neural networks by information preserving transforms. In: Pessa, E., Penna, M.P., Montesanto, A. (eds.) Third European Congress on Systems Science. Edizioni Kappa, Roma, pp. 977–982 (1996)
Grim, J.: Information approach to structural optimization of probabilistic neural networks. In: Ferrer, L. et al. (eds.) Proceedings of the 4th System Science European Congress. Valencia, Soc. Espanola de Sistemas Generales, pp. 527–540 (1999)
Grim, J.: A sequential modification of EM algorithm. In: Gaul, W., Locarek-Junge, H. (eds.) Studies in Classification, Data Analysis and Knowledge Organization, pp. 163–170. Springer (1999)
Grim, J.: EM cluster analysis for categorical data. In: Yeung, D.Y., Kwok, J.T., Fred, A. (eds.) Structural, Syntactic and Statistical Pattern Recognition, LNCS 4109, pp. 640–648. Springer, Berlin (2006)
Grim, J.: Neuromorphic features of probabilistic neural networks. Kybernetika 43(5), 697–712 (2007). http://dml.cz/dmlcz/135807
Grim, J.: Sequential pattern recognition by maximum conditional informativity. Pattern Recogn. Lett. 45C, 39–45 (2014). doi:10.1016/j.patrec.2014.02.024
Grim, J.: Approximating probability densities by mixtures of Gaussian dependence trees. In: Hobza, T. (ed.) Proceedings of the SPMS 2014 Stochastic and Physical Monitoring Systems, pp. 43–56. Czech Technical University Prague (2014)
Grim, J., Hora, J.: Iterative principles of recognition in probabilistic neural networks. Neural Netw. 21(6), 838–846 (2008)
Grim, J., Hora, J.: Computational properties of probabilistic neural networks. In: Artificial Neural Networks—ICANN 2010 Part II, LNCS 5164, pp. 52–61. Springer, Berlin (2010)
Grim, J., Hora, J., Boček P., Somol, P., Pudil, P.: Statistical model of the 2001 Czech census for interactive presentation. J. Official Stat. 26(4), 673–694 (2010). http://ro.utia.cas.cz/dem.html
Grim, J., Kittler, J., Pudil, P., Somol, P.: Multiple classifier fusion in probabilistic neural networks. Pattern Anal. Appl. 5(7), 221–233 (2002)
Grim, J., Pudil, P.: Pattern recognition by probabilistic neural networks—mixtures of product components versus mixtures of dependence trees. In: Proceedings of the International Conference on Neural Computation Theory and Applications NCTA2014, pp. 65–75. SCITEPRESS, Rome (2014)
Grim, J., Pudil, P., Somol, P.: Recognition of handwritten numerals by structural probabilistic neural networks. In: Bothe, H., Rojas, R. (eds.) Proceedings of the Second ICSC Symposium on Neural Computation, pp. 528–534. Berlin ICSC, Wetaskiwin (2000)
Grim, J., Pudil, P., Somol, P.: Boosting in probabilistic neural networks. In: Kasturi, R., Laurendeau, D., Suen, C. (eds.) Proceedings of the 16th International Conference on Pattern Recognition, pp. 136–139. IEEE Computer Society, Los Alamitos (2002b)
Grim, J., Somol, P., Haindl, M., Daneš, J.: Computer-aided evaluation of screening mammograms based on local texture models. IEEE Trans. Image Process. 18(4), 765–773 (2009)
Hasselblad, V.: Estimation of parameters for a mixture of normal distributions. Technometrics 8, 431–444 (1966)
Hasselblad, V.: Estimation of finite mixtures of distributions from the exponential family. J. Am. Stat. Assoc. 64, 1459–1471 (1969)
Hosmer Jr, D.W.: A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three different types of sample. Biometrics 761–770 (1973)
Jarník, V.: On a certain minimal problem. Trans. Moravian Soc. Nat. Sci. (in Czech) No. 6, 57–63 (1930)
Kruskal, J.B.: On the shortest spanning sub-tree of a graph. Proc. Am. Math. Soc. 7, 48–50 (1956)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Lowd, D., Domingos, P.: Naive Bayes models for probability estimation. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 529–536. ACM (2005)
Markley, S.C., Miller, D.J.: Joint parsimonious modeling and model order selection for multivariate Gaussian mixtures. IEEE J. Sel. Top. Sign. Process. 4(3), 548–559 (2010)
Meila, M., Jordan, M.I.: Estimating dependency structure as a hidden variable. In: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, vol. 10, pp. 584–590 (1998)
Meila, M., Jaakkola, T.: Tractable Bayesian learning of tree belief networks. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 380–388 (2000)
Meila, M., Jordan, M.I.: Learning with mixtures of trees. J. Mach. Learn. Res. 1(9), 1–48 (2001)
Prim, R.C.: Shortest connection networks and some generalizations. Bell Syst. Tech. J. 36, 1389–1401 (1957)
Schlesinger, M.I.: Relation between learning and self learning in pattern recognition (in Russian). Kibernetika (Kiev) No. 2, 81–88 (1968)
Vajda, I.: Theory of Statistical Inference and Information. Kluwer Academic Publishers, Dordrecht and Boston (1989)
Wolfe, J.H.: Pattern clustering by multivariate mixture analysis. Multivar. Behav. Res. 5, 329–350 (1970)
Acknowledgments
This work was supported by the Czech Science Foundation Projects No. 14-02652S and P403/12/1557.
Appendix: Maximum-Weight Spanning Tree
Kruskal's algorithm (cf. [3, 28]) first sorts all \(N(N-1)/2\) edge weights in descending order. The maximum-weight spanning tree is then constructed sequentially, starting with the first two (heaviest) edges. Subsequent edges are added in descending order of weight whenever they do not form a cycle with the previously chosen edges. If several edge weights are equal, multiple solutions are possible, but they can be ignored since all of them achieve the same maximum total weight. Obviously, in the case of dependence-tree mixtures with many components, Kruskal's algorithm may become prohibitive in high-dimensional spaces because the edge weights have to be sorted repeatedly.
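As a minimal illustration of this procedure (not the original implementation), the following C sketch detects cycles with a simple union-find forest; the edge representation, the array size MAXN and the function names are illustrative assumptions only:

    #include <stdlib.h>

    #define MAXN 64                                   /* maximum number of nodes */

    typedef struct { int i, j; double w; } edge_t;    /* weighted edge (i, j) */

    static int parent[MAXN];                          /* union-find forest */

    static int find_root(int x)                       /* root of the component of x */
    {
        while (parent[x] != x)
            x = parent[x] = parent[parent[x]];        /* path halving */
        return x;
    }

    static int cmp_desc(const void *a, const void *b) /* heavier edges first */
    {
        double d = ((const edge_t *)b)->w - ((const edge_t *)a)->w;
        return (d > 0) - (d < 0);
    }

    /* Kruskal: choose n-1 edges of maximum total weight that form no cycle.
       edges[0..m-1] holds all N(N-1)/2 candidate edges; tree[] receives the result. */
    int kruskal(edge_t *edges, int m, int n, edge_t *tree)
    {
        int taken = 0;
        for (int v = 0; v < n; v++)
            parent[v] = v;                            /* each node starts as its own component */
        qsort(edges, m, sizeof(edge_t), cmp_desc);    /* sort by descending weight */
        for (int e = 0; e < m && taken < n - 1; e++) {
            int ri = find_root(edges[e].i), rj = find_root(edges[e].j);
            if (ri != rj) {                           /* no cycle: accept the edge */
                parent[ri] = rj;
                tree[taken++] = edges[e];
            }
        }
        return taken;                                 /* number of accepted edges */
    }

The dominant cost of this construction is the initial sorting of the \(N(N-1)/2\) candidate edges, which is exactly the step that becomes expensive when it must be repeated for every component of the mixture.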
Prim's algorithm [35] does not need any ordering of the edge weights. Starting from an arbitrary variable, we choose its neighbor with the maximum edge weight. This first edge of the maximum-weight spanning tree is then extended sequentially by repeatedly adding the maximum-weight neighbor of the currently chosen subtree. Again, any ties may be resolved arbitrarily since we are not interested in multiple solutions.
Both Kruskal and Prim refer to an "obscure Czech paper" by Otakar Borůvka [1] from 1926, which gives an alternative construction of the minimum-weight spanning tree together with a proof of its uniqueness. Moreover, Prim's algorithm had already been described in 1930 by the Czech mathematician Vojtěch Jarník (cf. [27], in Czech). Prim's algorithm can be summarized as follows (in C code):
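The listing below is a minimal sketch of such a routine on a dense matrix of edge weights (e.g. the pairwise weights of the Chow–Liu construction [3]); the function and variable names, the fixed dimension N and the example weight matrix are illustrative assumptions, not the original code:

    #include <stdio.h>

    #define N 4                         /* number of variables (nodes) */

    /* Prim-Jarnik construction of a maximum-weight spanning tree.
       w[i][j] is a symmetric matrix of edge weights; on return, pred[i]
       (i = 1, ..., N-1) is the node through which i was attached, so the
       tree consists of the edges (i, pred[i]). */
    void max_spanning_tree(const double w[N][N], int pred[N])
    {
        int    in_tree[N];              /* 1 if the node already belongs to the subtree */
        double best_w[N];               /* best weight connecting a node to the subtree */

        for (int i = 0; i < N; i++) {   /* start the subtree from node 0 */
            in_tree[i] = (i == 0);
            pred[i]    = 0;
            best_w[i]  = w[0][i];
        }
        for (int k = 1; k < N; k++) {   /* attach the remaining N-1 nodes one by one */
            int best = -1;
            for (int j = 0; j < N; j++) /* maximum-weight neighbor of the subtree */
                if (!in_tree[j] && (best < 0 || best_w[j] > best_w[best]))
                    best = j;
            in_tree[best] = 1;          /* attach it via the stored predecessor */
            for (int j = 0; j < N; j++) /* refresh the best connections to the subtree */
                if (!in_tree[j] && w[best][j] > best_w[j]) {
                    best_w[j] = w[best][j];
                    pred[j]   = best;
                }
        }
    }

    int main(void)
    {
        /* illustrative symmetric weight matrix (e.g. pairwise mutual information) */
        const double w[N][N] = { {0.0, 0.9, 0.2, 0.4},
                                 {0.9, 0.0, 0.6, 0.1},
                                 {0.2, 0.6, 0.0, 0.7},
                                 {0.4, 0.1, 0.7, 0.0} };
        int pred[N];

        max_spanning_tree(w, pred);
        for (int i = 1; i < N; i++)
            printf("edge (%d, %d), weight %.2f\n", i, pred[i], w[i][pred[i]]);
        return 0;
    }

In this form the construction needs only O(N^2) weight comparisons and, in contrast to Kruskal's algorithm, no sorting of the edge weights, which is why it is the preferable choice for dependence-tree mixtures with many components.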