Robust Bayesian Classification with Incomplete Data | Cognitive Computation

Robust Bayesian Classification with Incomplete Data

Abstract

In this paper, we address Bayesian classification with incomplete data. The common approaches in the literature are to discard samples with missing values or to impute the missing values before classification. However, these methods are ineffective when a large portion of the data has missing values and acquiring samples is expensive. Motivated by these limitations, we propose two methods that avoid listwise deletion (LD) and mean imputation (MI) when solving classification tasks with incomplete data: an expectation maximization algorithm for learning a multivariate Gaussian mixture model, and a multiple kernel density estimator based on propensity scores. We illustrate the effectiveness of the proposed algorithms on artificial and benchmark UCI data sets by comparing them with the LD and MI methods. We also apply these algorithms to two practical classification tasks: lithology identification of hydrothermal minerals and license plate character recognition. The experimental results demonstrate good performance with high classification accuracy.
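For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of listwise deletion and mean imputation on a small toy data set. It is not the authors' code and does not implement the proposed EM or kernel density methods. The function names, the toy data, and the use of NaN to mark missing entries are all assumptions made for illustration.

```python
import math


def listwise_deletion(rows):
    """Keep only the rows that have no missing (NaN) entries."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def mean_imputation(rows):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: a column with no observed values is imputed as 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: four samples, two features, NaN marks a missing value.
data = [
    [1.0, 2.0],
    [math.nan, 4.0],
    [3.0, math.nan],
    [5.0, 6.0],
]

ld = listwise_deletion(data)  # drops the two incomplete samples
mi = mean_imputation(data)    # keeps all four samples, fills gaps with column means
```

In this toy case, listwise deletion discards half of the samples, which illustrates the cost the abstract refers to when many samples are incomplete and new samples are expensive to acquire.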

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant 61273233, the Project of China Ocean Association under Grant DYXM-125-25-02, and the Tsinghua University Initiative Scientific Research Program under Grants 2010THZ07002 and 2011THZ07132.

Author information

Corresponding author

Correspondence to Shiji Song.

About this article

Cite this article

Zhang, X., Song, S. & Wu, C. Robust Bayesian Classification with Incomplete Data. Cogn Comput 5, 170–187 (2013). https://doi.org/10.1007/s12559-012-9188-6

  • DOI: https://doi.org/10.1007/s12559-012-9188-6
