Abstract
Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Thus, a large number of techniques have been developed based on Artificial Intelligence (logic-based and perceptron-based techniques) and Statistics (Bayesian networks, instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances where the values of the predictor features are known but the value of the class label is unknown. This paper describes various classification algorithms and a recent approach to improving classification accuracy: ensembles of classifiers.
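The workflow summarized above, learning a model from labeled training data and then assigning labels to held-out test instances, either with a single classifier or with an ensemble, can be illustrated with a minimal sketch. The example below assumes scikit-learn and synthetic data; the paper itself is library-agnostic, and bagged decision trees stand in for ensembles generally.

```python
# A minimal sketch of the supervised-classification workflow described above,
# assuming scikit-learn; the choice of data and estimators is illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for the predictor features X and class labels y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Single base classifier: a concise model of class labels learned from training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Ensemble of classifiers (bagged trees), the accuracy-improving approach the paper reviews.
ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                             random_state=0).fit(X_train, y_train)

# Assign class labels to test instances whose labels are withheld, then compare accuracy.
print("single tree     :", accuracy_score(y_test, tree.predict(X_test)))
print("bagged ensemble :", accuracy_score(y_test, ensemble.predict(X_test)))
```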