Abstract
In this paper, we address Bayesian classification with incomplete data. The common approaches in the literature are to discard samples with missing values or to impute the missing values before classification. These approaches are ineffective when a large portion of the data has missing values and acquiring samples is expensive. Motivated by these limitations, we propose two methods that avoid listwise deletion (LD) and mean imputation (MI) when solving classification tasks with incomplete data: an expectation maximization algorithm for learning a multivariate Gaussian mixture model, and a multiple kernel density estimator based on propensity scores. We illustrate the effectiveness of the proposed algorithms on artificial and benchmark UCI data sets by comparing them with LD and MI. We also apply the algorithms to two practical classification tasks: lithology identification of hydrothermal minerals and license plate character recognition. In these experiments, the proposed algorithms achieve high classification accuracy.
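For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of listwise deletion and mean imputation on a toy data set. It is not the authors' code: the function names, the use of `None` to mark a missing value, the fallback for a never-observed feature, and the sample values are all assumptions made for illustration.

```python
from statistics import fmean


def listwise_deletion(rows):
    """Keep only the samples in which every feature is observed (no None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values of that feature."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption for this sketch: use 0.0 if a feature is never observed.
        means.append(fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical toy data: four samples, two features, None marks a missing value.
samples = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
    [5.0, 6.0],
]

complete = listwise_deletion(samples)  # two of the four samples survive
imputed = mean_imputation(samples)     # all four samples kept, gaps filled with column means
```

In this toy case listwise deletion discards half of the samples, which is the situation the abstract describes as costly when samples are expensive to acquire. Mean imputation keeps every sample but fills the gaps with a single value per feature.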
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant 61273233, the Project of China Ocean Association under Grant DYXM-125-25-02, and Tsinghua University Initiative Scientific Research Program under Grants 2010THZ07002 and 2011THZ07132.
Cite this article
Zhang, X., Song, S. & Wu, C. Robust Bayesian Classification with Incomplete Data. Cogn Comput 5, 170–187 (2013). https://doi.org/10.1007/s12559-012-9188-6