Abstract
It is well known that text document classifiers may benefit from including hypernyms of the terms in a document. However, this inclusion can be a mixed blessing, because it may blur the boundaries between document classes [5, 6, 10].
We have developed a new type of document classifier, the so-called semantic classifier, which is trained not on the original data but on the categories assigned to the document by our semantic categorizer [1, 4]. Such classifiers require a significantly smaller corpus of training data and outperform traditional classifiers used in the domain.
In this research we want to clarify the advantages and disadvantages, for classification quality, of using supercategories of the assigned categories (an analogue of hypernyms). In particular, we conclude that supercategories should be added with restricted weight, since otherwise they may degrade classification performance. We also found that our technique of aggregating the categories counteracts the blurring of class boundaries.
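To make the idea of adding supercategories "with restricted weight" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a document is represented as a mapping from assigned categories to weights, and that each ancestor category receives a geometrically damped share of that weight. The toy hierarchy `PARENT`, the function name `add_supercategories`, and the damping factor `alpha` are all hypothetical choices made for illustration; the abstract does not specify the weighting scheme.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (category -> direct supercategory).
# Not taken from the paper; used only to illustrate the idea.
PARENT = {
    "Football": "Team sports",
    "Basketball": "Team sports",
    "Team sports": "Sports",
}


def add_supercategories(weights, parent, alpha=0.5):
    """Return a new feature dict with ancestor categories added.

    An ancestor k levels above an assigned category receives
    alpha**k times that category's weight. Choosing alpha < 1
    restricts the influence of supercategories (an assumed scheme).
    """
    out = defaultdict(float)
    for cat, w in weights.items():
        out[cat] += w
        seen = {cat}          # guards against cycles in the hierarchy
        node = parent.get(cat)
        level = 1
        while node is not None and node not in seen:
            out[node] += w * alpha ** level
            seen.add(node)
            node = parent.get(node)
            level += 1
    return dict(out)


# Categories assigned to one document, with their weights (made-up values).
doc = {"Football": 0.6, "Basketball": 0.4}
features = add_supercategories(doc, PARENT, alpha=0.5)
```

With `alpha=1.0` every ancestor would receive the full weight of its descendants, so broad supercategories shared by many classes could dominate the feature vector; a smaller `alpha` keeps the originally assigned categories as the main signal.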
References
Borkowski, P.: Metody semantycznej kategoryzacji w zadaniach analizy dokumentów tekstowych. Ph.D. thesis, Institute of Computer Science of Polish Academy of Sciences (2019)
Borkowski, P., Ciesielski, K., Klopotek, M.A.: Unsupervised aggregation of categories for document labelling. In: Foundations of Intelligent Systems - 21st International Symposium. ISMIS 2014, Roskilde, Denmark, 25–27 June 2014. Proceedings, pp. 335–344 (2014)
Chemudugunta, C., Holloway, A., Smyth, P., Steyvers, M.: Modeling documents by combining semantic concepts with unsupervised statistical learning. In: Sheth, A., et al. (eds.) The Semantic Web - ISWC 2008. LNCS, vol. 5318, pp. 229–244. Springer, Berlin (2008)
Ciesielski, K., Borkowski, P., Klopotek, M.A., Trojanowski, K., Wysocki, K.: Wikipedia-based document categorization. In: SIIS 2011, pp. 265–278 (2011)
Huang, Z., Thint, M., Qin, Z.: Question classification using head words and their hypernyms. In: EMNLP 2008: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 927–936, October 2008
Li, X., Roth, D.: Learning question classifiers. In: The 19th International Conference on Computational Linguistics, vol. 1, pp. 1–7 (2002)
Nguyen, C.T.: Bridging semantic gaps in information retrieval: context-based approaches. ACM VLDB 10 (2010)
Rafi, M., Hassan, S., Shaikh, M.S.: Content-based text categorization using wikitology. CoRR abs/1208.3623 (2012)
Ramakrishna Murty, M., Murthy, J., Prasad Reddy, P., Satapathy, S.: A survey of cross-domain text categorization techniques. In: RAIT 2012, pp. 499–504. IEEE (2012)
Scott, S., Matwin, S.: Text classification using wordnet hypernyms. In: Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference, pp. 38–44, 45–52. Association for Computational Linguistics (1998)
Wang, P., Domeniconi, C., Hu, J.: Using Wikipedia for co-clustering based cross-domain text classification. In: ICDM 2008, pp. 1085–1090. IEEE (2008)
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Borkowski, P., Ciesielski, K., Kłopotek, M.A. (2021). The Impact of Supercategory Inclusion on Semantic Classifier Performance. In: Stettinger, M., Leitner, G., Felfernig, A., Ras, Z.W. (eds) Intelligent Systems in Industrial Applications. ISMIS 2020. Studies in Computational Intelligence, vol 949. Springer, Cham. https://doi.org/10.1007/978-3-030-67148-8_6
DOI: https://doi.org/10.1007/978-3-030-67148-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67147-1
Online ISBN: 978-3-030-67148-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)