Abstract
This paper proposes, describes, and evaluates T3C, a classification algorithm that builds decision trees of depth at most three, achieving high accuracy whilst keeping tree size reasonably small. T3C improves on the T3 algorithm in how it performs splits on continuous attributes. When run against publicly available data sets, T3C achieved lower generalisation error than both T3 and the popular C4.5, and results competitive with Random Forest and Rotation Forest.
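The abstract describes T3C only at a high level; as a concrete illustration, the sketch below shows one way a depth-capped tree with multi-way interval splits on continuous attributes could look in Python. It is not the authors' implementation: the quantile-based cut points, the misclassification-count split criterion, the n_intervals parameter, and the function names (build, predict, majority) are all illustrative assumptions, since the abstract does not specify T3C's actual split heuristic.

# A minimal sketch (not the authors' implementation) of a depth-capped
# decision tree with multi-way interval splits on continuous attributes.
# The interval boundaries here are plain quantile cut points -- a
# hypothetical stand-in for T3C's actual split-selection heuristic.
from collections import Counter
import numpy as np

MAX_DEPTH = 3  # T3C builds trees of depth at most three


def majority(y):
    """Most frequent class label in y."""
    return Counter(y).most_common(1)[0][0]


def build(X, y, depth=0, n_intervals=3):
    """Recursively build a depth-capped interval-split tree."""
    if depth == MAX_DEPTH or len(set(y)) == 1:
        return {"leaf": majority(y)}
    best = None
    for a in range(X.shape[1]):
        # Cut each attribute into n_intervals bins at its quantiles.
        cuts = np.quantile(X[:, a], np.linspace(0, 1, n_intervals + 1)[1:-1])
        bins = np.digitize(X[:, a], cuts)
        # Greedy criterion (for illustration only): count the examples
        # misclassified if every interval predicts its majority class.
        err = sum((bins == b).sum() - Counter(y[bins == b]).most_common(1)[0][1]
                  for b in range(n_intervals) if (bins == b).any())
        if best is None or err < best[0]:
            best = (err, a, cuts, bins)
    _, a, cuts, bins = best
    children = {b: build(X[bins == b], y[bins == b], depth + 1, n_intervals)
                for b in range(n_intervals) if (bins == b).any()}
    return {"attr": a, "cuts": cuts, "children": children,
            "default": majority(y)}


def predict(node, x):
    """Route a single example down the tree to a leaf label."""
    while "leaf" not in node:
        b = int(np.digitize(x[node["attr"]], node["cuts"]))
        # Fall back to the node's majority class for an empty interval.
        node = node["children"].get(b) or {"leaf": node["default"]}
    return node["leaf"]

Given numpy arrays X (samples by attributes) and y (class labels), tree = build(X, y) fits the sketch and predict(tree, x) labels a single example; the MAX_DEPTH cap mirrors the depth-three constraint the abstract describes.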
References
Aha DW, Breslow LA (1998) Comparing simplification procedures for decision trees on an economics classification. Technical report, DTIC Document
Auer P, Holte RC, Maass W (1995) Theory and applications of agnostic PAC-learning with small decision trees. In: Proceedings of the twelfth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 21–29
Berry MJ, Linoff GS (2004) Data mining techniques: for marketing, sales, and customer relationship management. Wiley, New York
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational learning theory. Springer, Berlin, pp 23–37
Gehrke J, Ramakrishnan R, Ganti V (1998) RainForest: a framework for fast decision tree construction of large datasets. In: VLDB, vol 98, pp 416–427
Han J, Kamber M, Pei J (2010) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Hubert M, Van der Veeken S (2010) Robust classification for skewed data. Adv Data Anal Classif 4(4):239–254
Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml
Mozharovskyi P, Mosler K, Lange T (2015) Classifying real-world data with the DDα-procedure. Adv Data Anal Classif 9(3):287–314
Murthy S, Salzberg S (1995) Decision tree induction: how effective is the greedy heuristic? In: Proceedings of the first international conference on knowledge discovery and data mining. Morgan Kaufmann, San Francisco, pp 222–227
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Quinlan JR (1993) C4.5: programs for machine learning, vol 1. Morgan Kaufmann, San Francisco
Quinlan JR (1996) Improved use of continuous attributes in C4.5. J Artif Intell Res 4:77–90
Rodriguez J, Kuncheva L, Alonso C (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630
RuleQuest (2013) http://www.rulequest.com. Accessed April 2016
Tatsis VA, Tjortjis C, Tzirakis P (2013) Evaluating data mining algorithms using molecular dynamics trajectories. Int J Data Min Bioinform 8(2):169–187
Tjortjis C, Keane JA (2002) T3: an improved classification algorithm for data mining. Lect Notes Comput Sci 2412:50–55
Tjortjis C, Saraee M, Theodoulidis B, Keane JA (2007) Using T3, an improved decision tree classifier, for mining stroke-related medical data. Methods Inf Med 46(5):523–529
Witten I, Frank E, Hall M (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, San Francisco
Cite this article
Tzirakis, P., Tjortjis, C. T3C: improving a decision tree classification algorithm’s interval splits on continuous attributes. Adv Data Anal Classif 11, 353–370 (2017). https://doi.org/10.1007/s11634-016-0246-x