Abstract
For interactive data mining of very large databases, a method is proposed that works with relatively small training data extracted from the target database by sampling. The motivation is twofold: generating decision trees for very large databases that contain many continuous data values takes a very long time, and the size of a decision tree tends to depend on the size of the training data. The method uses confidence-based samples of a proper size as the training data, both to generate comprehensible trees and to save time. For medium or small databases, the original data may be used directly with some harsh pruning, because such pruning generates trees of similar size with smaller error rates.
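The abstract does not say how the sample size is derived from a confidence level, so the following is only a minimal sketch under stated assumptions. It assumes Cochran's sample-size formula with a finite population correction and simple random sampling; the function names `sample_size` and `draw_training_sample`, and the parameters `margin` and `p`, are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Rows to sample for a given confidence level and margin of error.

    Assumption: Cochran's formula with finite population correction.
    The paper may use a different rule.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Shrink the estimate for a finite population of known size.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_training_sample(rows, confidence=0.95, margin=0.05, seed=0):
    """Draw a simple random sample (without replacement) of the computed size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    n = sample_size(len(rows), confidence, margin)
    return rng.sample(rows, n)


if __name__ == "__main__":
    database = list(range(1_000_000))  # stand-in for the rows of a large table
    training = draw_training_sample(database)
    print(len(training), "rows selected for tree induction")
```

Under these assumptions the sample size levels off as the database grows, which is consistent with the abstract's point that a small sample can keep both training time and tree size manageable. The sampled rows would then be passed to any decision tree learner.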
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sug, H. (2005). A Comprehensively Sized Decision Tree Generation Method for Interactive Data Mining of Very Large Databases. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_17
DOI: https://doi.org/10.1007/11527503_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer Science, Computer Science (R0)