Abstract
Classification is a function that assigns a new object to one of a set of predefined classes. Document classification is characterized by the large number of attributes that describe each object (document). The traditional method of building a single classifier to do all the classification work incurs a high overhead. Hierarchical classification is more efficient: instead of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, a misclassification at a high-level class may lead to a final class that is far from the correct one. An existing approach to this problem requires the terms to be arranged hierarchically as well. In this paper, instead of overhauling the classifier itself, we propose mechanisms that detect misclassification and take appropriate action. We then discuss an alternative that masks the misclassification using a well-known software fault tolerance technique. Our experiments show that our algorithms represent a good trade-off between speed and accuracy in most applications.
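The top-down routing described in the abstract, with one classifier per internal node of the taxonomy, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy taxonomy, the keyword sets, and the overlap-counting rule in `classify_at_node` stand in for trained classifiers and are not the method used in the paper. The sketch covers only the plain hierarchical scheme, not the detection or masking mechanisms the paper proposes.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: each internal node maps to its child classes.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}

# Hypothetical keyword sets standing in for trained per-class models.
KEYWORDS: Dict[str, Set[str]] = {
    "science": {"experiment", "theory", "cell", "quantum", "energy"},
    "sports": {"match", "player", "score", "goal", "racket"},
    "physics": {"quantum", "energy", "particle"},
    "biology": {"cell", "gene", "protein"},
    "football": {"goal", "striker", "pitch"},
    "tennis": {"racket", "serve", "set"},
}


def classify_at_node(node: str, words: Set[str]) -> str:
    """Local classifier for one internal node: pick the child whose keyword
    set overlaps the document most (ties go to the first child listed)."""
    return max(TAXONOMY[node], key=lambda child: len(KEYWORDS[child] & words))


def classify(document: str) -> List[str]:
    """Route a document from the root to a leaf, one local decision per level."""
    words = set(document.lower().split())
    path = ["root"]
    while path[-1] in TAXONOMY:  # stop once the current node is a leaf
        path.append(classify_at_node(path[-1], words))
    return path


if __name__ == "__main__":
    # Routed correctly: root -> science -> physics.
    print(classify("the quantum particle experiment"))
    # Wrong turn at the top level: the biology classifier is never consulted.
    print(classify("gene protein player score match"))
```

In the second call the root-level decision goes to `sports`, so the document ends at a leaf under `sports` even though its remaining words match `biology`. This is the error-propagation problem the abstract describes: each document visits only one small classifier per level, but a mistake near the root cannot be corrected further down.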
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheng, Ch., Tang, J., Wai-chee Fu, A., King, I. (2001). Hierarchical Classification of Documents with Error Control. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_46
DOI: https://doi.org/10.1007/3-540-45357-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41910-5
Online ISBN: 978-3-540-45357-4
eBook Packages: Springer Book Archive