A Comprehensively Sized Decision Tree Generation Method for Interactive Data Mining of Very Large Databases | SpringerLink

  • Conference paper
Advanced Data Mining and Applications (ADMA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3584)

Abstract

For interactive data mining of very large databases, a method is proposed that works with relatively small training data extracted from the target databases by sampling. Generating decision trees for very large databases that contain many continuous data values takes a very long time, and the size of the resulting decision trees tends to depend on the size of the training data. The method therefore uses samples of a proper size for a given level of confidence as the training data, both to generate comprehensible trees and to save time. For medium-sized or small databases, the original data may be used directly with some harsh pruning, because such pruning generates trees of similar size with smaller error rates.
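The abstract does not state how a sample "of proper size for a given level of confidence" is computed. As a hedged illustration only, the sketch below assumes Cochran's sample-size formula for estimating a proportion, with a finite population correction, followed by simple random sampling without replacement. The function names `sample_size` and `draw_training_sample` and the defaults (95% confidence, 5% margin of error, p = 0.5) are hypothetical choices, not taken from the paper.

```python
import math
import random
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Rows to sample from a table of `population` rows.

    ASSUMPTION: uses Cochran's formula for estimating a proportion with a
    finite population correction; the paper's own sizing rule may differ.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Finite population correction: smaller tables need fewer rows.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_training_sample(records, confidence=0.95, margin=0.05, seed=None):
    """Simple random sample without replacement, sized by sample_size().

    `records` must be a sequence (e.g. a list of rows); the result is the
    hypothetical training set handed to the decision tree learner.
    """
    n = sample_size(len(records), confidence, margin)
    rng = random.Random(seed)
    return rng.sample(records, n)
```

Under these assumptions, `sample_size(1_000_000)` returns 384 and a table ten times larger needs only 385 rows, so the training set stays small as the database grows; by the abstract's argument, this also keeps the generated tree small.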

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sug, H. (2005). A Comprehensively Sized Decision Tree Generation Method for Interactive Data Mining of Very Large Databases. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_17

  • DOI: https://doi.org/10.1007/11527503_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27894-8

  • Online ISBN: 978-3-540-31877-4

  • eBook Packages: Computer Science, Computer Science (R0)