Abstract
Semi-supervised classification methods exploit both labeled and unlabeled examples to train a predictive model. Most of these approaches make assumptions about the distribution of classes. This article first proposes a new semi-supervised discretization method that adopts a very weakly informative prior on the data. The method discretizes the numerical domain of a continuous input variable while preserving the information relevant to class prediction. An in-depth comparison of this semi-supervised method with the original supervised MODL approach is then presented. We demonstrate that the semi-supervised approach is asymptotically equivalent to the supervised approach, improved with a post-optimization of the location of the interval bounds.
Cite this article
Bondu, A., Boullé, M. & Lemaire, V. A non-parametric semi-supervised discretization method. Knowl Inf Syst 24, 35–57 (2010). https://doi.org/10.1007/s10115-009-0230-2
DOI: https://doi.org/10.1007/s10115-009-0230-2