Abstract
Several algorithms for inducing decision trees have been developed to handle large datasets; however, some of them suffer from memory and/or runtime problems because they use the whole training sample to build the tree, while others do not take the whole training set into account. In this paper, we introduce IIMDT, a new algorithm for inducing decision trees for large numerical datasets. IIMDT builds the tree incrementally, so it does not need to keep the whole training set in main memory. We also present a comparison between IIMDT and ICE, an algorithm for inducing decision trees for large datasets.
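The abstract does not describe IIMDT's split test or its node-expansion rule, so the following is only a minimal sketch of the general incremental principle it names: the training set is read in fixed-size batches and each node keeps per-class running sums and counts, so memory grows with the number of classes and attributes rather than with the number of training samples. The nearest-centroid test used as the multivariate split is an assumed illustration, not necessarily the test IIMDT uses, and the names `RunningClassStats`, `batches`, and `nearest_class` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# A training sample: (numerical attribute vector, class label).
Sample = Tuple[Sequence[float], str]


class RunningClassStats:
    """Hypothetical per-node statistics: running sums and counts per class.

    Only these aggregates are stored, never the samples themselves.
    """

    def __init__(self, n_attributes: int) -> None:
        self.n_attributes = n_attributes
        self.sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def update(self, batch: Iterable[Sample]) -> None:
        # Fold one batch into the running aggregates; the batch can then be discarded.
        for x, label in batch:
            if len(x) != self.n_attributes:
                raise ValueError("unexpected number of attributes")
            s = self.sums.setdefault(label, [0.0] * self.n_attributes)
            for j, v in enumerate(x):
                s[j] += v
            self.counts[label] = self.counts.get(label, 0) + 1

    def centroids(self) -> Dict[str, List[float]]:
        # Class mean vectors, computed from the aggregates alone.
        return {c: [v / self.counts[c] for v in s] for c, s in self.sums.items()}


def batches(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Yield consecutive batches of at most `size` samples from a stream."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Sample] = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def nearest_class(x: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assumed multivariate test: route x to the class with the closest centroid."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


if __name__ == "__main__":
    # Stand-in for a training file that would be streamed from disk.
    data = [([1.0, 2.0], "a"), ([3.0, 4.0], "a"), ([10.0, 10.0], "b"), ([12.0, 8.0], "b")]
    stats = RunningClassStats(n_attributes=2)
    for batch in batches(iter(data), size=2):
        stats.update(batch)
    cents = stats.centroids()  # {"a": [2.0, 3.0], "b": [11.0, 9.0]}
    print(nearest_class([2.5, 2.5], cents))  # → a
```

Because the statistics are additive, batches can be processed in any order and in any size, which is what allows a tree of this kind to be grown without loading the full training set at once.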
References
Dunham, M.: Data Mining, Introductory and Advanced Topics. Prentice Hall, New Jersey (2003)
Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining. Addison Wesley, Boston (2006)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Pao, H.-K., Chang, S.-C., Lee, Y.-J.: Model trees for classification of hybrid data types. In: Gallagher, M., Hogan, J.P., Maire, F. (eds.) IDEAL 2005. LNCS, vol. 3578, pp. 32–39. Springer, Heidelberg (2005)
Pérez, J., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.: Combining multiple class distribution modified subsamples in a single tree. Pattern Recognition Letters 28(4), 414–422 (2007)
Utgoff, P.E.: An improved algorithm for incremental induction of decision trees. In: Proc. 11th International Conference on Machine Learning, pp. 318–325 (1994)
Pedrycz, W., Sosnowski: C-fuzzy decision trees. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 35(4), 498–511 (2005)
Agrawal, R., Imielinski, T., Swami, A.: Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering 5(6), 914–925 (1993)
Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A fast scalable classifier for data mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 18–32. Springer, Heidelberg (1996)
Shafer, J.C., Agrawal, R., Mehta, M.: SPRINT: A scalable parallel classifier for data mining. In: Proc. 22nd International Conference on Very Large Data Bases, pp. 544–555 (1996)
Alsabti, K., Ranka, S., Singh, V.: CLOUDS: A decision tree classifier for large datasets. In: Proc. Conference on Knowledge Discovery and Data Mining (KDD 1998), pp. 2–8 (1998)
Gehrke, J., Ramakrishnan, R., Ganti, V.: RainForest - a framework for fast decision tree classification of large datasets. In: Proc. of VLDB Conference, New York, pp. 416–427 (1998)
Gehrke, J., Ganti, V., Ramakrishnan, R., Loh, W.: BOAT - optimistic decision tree construction. In: Proc. of the ACM SIGMOD Conference on Management of Data, pp. 169–180 (1999)
Yoon, H., Alsabti, K., Ranka, S.: Tree-based incremental classification for large datasets. Technical Report TR-99-013, CISE Department, University of Florida, Gainesville, FL. 32611 (1999)
UCI machine learning repository, University of California (2007), http://www.ics.uci.edu/mlearn/MLRepository.html
Adelman-McCarthy, J., Agueros, M.A., Allam, S.S.: Data Release 6, ApJS, 175 (in press, 2008)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Franco-Arcega, A., Carrasco-Ochoa, J.A., Sánchez-Díaz, G., Martínez-Trinidad, J.F. (2008). A New Incremental Algorithm for Induction of Multivariate Decision Trees for Large Datasets. In: Fyfe, C., Kim, D., Lee, SY., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2008. IDEAL 2008. Lecture Notes in Computer Science, vol 5326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88906-9_36
DOI: https://doi.org/10.1007/978-3-540-88906-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88905-2
Online ISBN: 978-3-540-88906-9
eBook Packages: Computer Science, Computer Science (R0)