Abstract
In this paper we propose a new method for training classifiers for multi-class problems in which classes are not necessarily mutually exclusive and may be related through a probabilistic tree structure. The method is based on a Bayesian model relating network parameters, feature vectors and categories. Learning is stated as maximum likelihood estimation of the classifier parameters. The proposed algorithm is especially suited to situations where each training sample is labeled with respect to only one, or only some, of the categories in the tree. Our experiments on information retrieval scenarios show the advantages of the proposed method.
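As a rough illustration of what maximum likelihood training with partial labels on a category tree can look like, the sketch below scores samples that are each labeled for a single category only. It is not the paper's formulation: the three-category tree, the logistic form of each conditional membership probability, and the names `PARENT`, `marginal_prob` and `partial_nll` are all assumptions made for the example.

```python
import math

# Hypothetical category tree: PARENT[c] is the parent of category c,
# or None for a top-level category.
PARENT = {"sports": None, "football": "sports", "tennis": "sports"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def marginal_prob(cat, x, weights):
    """P(sample belongs to cat | x): product of the conditional membership
    probabilities (assumed logistic) along the path from cat up to the top."""
    p = 1.0
    while cat is not None:
        score = sum(wi * xi for wi, xi in zip(weights[cat], x))
        p *= sigmoid(score)
        cat = PARENT[cat]
    return p


def partial_nll(samples, weights):
    """Negative log-likelihood when each sample (x, cat, y) carries a 0/1
    label for one category only; all other categories are unlabeled."""
    nll = 0.0
    for x, cat, y in samples:
        p = marginal_prob(cat, x, weights)
        nll -= math.log(p) if y == 1 else math.log(1.0 - p)
    return nll


# Usage: with all-zero weights every conditional probability is 0.5.
weights = {c: [0.0, 0.0] for c in PARENT}
samples = [([1.0, 2.0], "football", 1), ([0.5, -1.0], "sports", 0)]
loss = partial_nll(samples, weights)
```

Minimizing `partial_nll` over `weights` with any gradient-based optimizer would give a maximum likelihood estimate under these assumptions. Samples labeled for several categories at once would need the joint probability of those labels rather than a product of marginals.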
Notes
Our analysis in the following is based on the implicit assumption that the class observation process is independent of the class and of the value of the class label. The analysis of more complex observation processes is beyond the scope of this paper.
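The practical consequence of this assumption can be seen in a short derivation. The notation is introduced here for illustration and is not taken from the paper: x is the feature vector, w the classifier parameters, y_obs and y_mis the observed and unobserved labels, and o the mask indicating which labels were observed. The sketch further assumes that the distribution of o does not depend on w.

```latex
% Assumed notation (not from the paper): x features, w parameters,
% y_obs / y_mis observed / unobserved labels, o observation mask.
\begin{align}
P(y_{\mathrm{obs}}, o \mid x, w)
  &= \sum_{y_{\mathrm{mis}}}
     P(o \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, x)\,
     P(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid x, w) \\
  &= P(o \mid x) \sum_{y_{\mathrm{mis}}}
     P(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid x, w) \\
  &= P(o \mid x)\, P(y_{\mathrm{obs}} \mid x, w), \\
\hat{w}
  &= \arg\max_{w} \sum_{n}
     \log P\bigl(y^{(n)}_{\mathrm{obs}} \mid x^{(n)}, w\bigr).
\end{align}
```

Because the factor P(o | x) does not involve w, it drops out of the maximization, so under this assumption the unobserved labels can simply be marginalized out of the likelihood.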
Additional information
This paper has been partially supported by Spanish MEC grants ref. TIC 2002-03713 and TEC 2005-06766-C03-02/TCM, by Madrid Chamber grant ref. S-0505/TIC/0223 and UC3M-TEC-05-027.
Cite this article
Ortega-Moral, M., Gutiérrez-González, D., De-Pablo, M.L. et al. Training Classifiers for Tree-structured Categories with Partially Labeled Data. J VLSI Sign Process Syst Sign Im 48, 53–65 (2007). https://doi.org/10.1007/s11265-006-0008-7