Abstract
Textual case-based reasoning (CBR) systems solve problems by reusing experiences that are in textual form. Knowledge-rich comparison of textual cases remains an important challenge for these systems. However, mapping text data into a structured case representation requires a significant knowledge engineering effort. In this paper we look at automated acquisition of the case indexing vocabulary as a two-step process involving feature selection followed by feature generalisation. Boosted decision stumps are employed as a means to select features that are predictive and relatively orthogonal. Association rule induction is employed to capture feature co-occurrence patterns. Generalised features are constructed by applying these rules. Essentially, rules preserve implicit semantic relationships between features, and applying them has the desired effect of bringing together cases that would otherwise have been overlooked during case retrieval. Experiments with four textual data sets show significant improvement in retrieval accuracy whenever generalised features are used. The results further suggest that boosted decision stumps with generalised features are a promising combination.
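The two-step process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy cases, the pairwise form of the rules, and the support and confidence thresholds are all assumptions made for illustration. Cases are modelled as sets of terms with binary class labels; an AdaBoost-style loop over one-term decision stumps picks the terms, and simple co-occurrence rules then add implied terms to a case.

```python
import math
from itertools import permutations


def stump_predict(case, term, polarity):
    # A decision stump tests one term: predict `polarity` if present, else the other class.
    return polarity if term in case else 1 - polarity


def select_features(cases, labels, vocabulary, rounds):
    """AdaBoost-style loop over one-term stumps; returns the terms the stumps test."""
    weights = [1.0 / len(cases)] * len(cases)
    selected = []
    for _ in range(rounds):
        best = None  # (weighted error, term, polarity)
        for term in vocabulary:
            if term in selected:  # simplification: each term is picked at most once
                continue
            for polarity in (1, 0):
                err = sum(w for w, c, y in zip(weights, cases, labels)
                          if stump_predict(c, term, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, term, polarity)
        if best is None or best[0] >= 0.5:
            break  # no remaining stump beats chance
        err, term, polarity = best
        selected.append(term)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        # Up-weight misclassified cases so the next stump focuses on them.
        weights = [w * math.exp(alpha if stump_predict(c, term, polarity) != y else -alpha)
                   for w, c, y in zip(weights, cases, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return selected


def mine_rules(cases, terms, min_support, min_confidence):
    """Pairwise rules (a -> b) between terms, filtered by support and confidence."""
    rules = set()
    for a, b in permutations(terms, 2):
        n_a = sum(1 for c in cases if a in c)
        n_ab = sum(1 for c in cases if a in c and b in c)
        if n_a and n_ab / len(cases) >= min_support and n_ab / n_a >= min_confidence:
            rules.add((a, b))
    return rules


def generalise(case, rules):
    """Add the head of every rule whose body term occurs in the case (single pass)."""
    return set(case) | {b for a, b in rules if a in case}


# Hypothetical toy case base: 1 = printer problem, 0 = display problem.
CASES = [{"printer", "paper", "jam"}, {"printer", "toner"}, {"paper", "jam", "tray"},
         {"screen", "flicker"}, {"screen", "cable"}, {"cable", "flicker", "monitor"}]
LABELS = [1, 1, 1, 0, 0, 0]
VOCABULARY = sorted(set().union(*CASES))

if __name__ == "__main__":
    print("selected terms:", select_features(CASES, LABELS, VOCABULARY, rounds=3))
    rules = mine_rules(CASES, ["paper", "jam", "printer"], 0.3, 0.9)
    print("rules:", sorted(rules))
    print("generalised case:", sorted(generalise({"jam", "tray"}, rules)))
```

In this toy setting, a case that mentions only "jam" gains the term "paper" after generalisation, so it can match a case that mentions only "paper". This is the sense in which applying rules brings together cases that plain term overlap would miss.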
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wiratunga, N., Koychev, I., Massie, S. (2004). Feature Selection and Generalisation for Retrieval of Textual Cases. In: Funk, P., González Calero, P.A. (eds) Advances in Case-Based Reasoning. ECCBR 2004. Lecture Notes in Computer Science, vol 3155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28631-8_58
DOI: https://doi.org/10.1007/978-3-540-28631-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22882-0
Online ISBN: 978-3-540-28631-8
eBook Packages: Springer Book Archive