Abstract
Automatic concept discovery is a fundamental task in knowledge capture and ontology engineering, and a key step in many information retrieval applications. Pattern-based and statistics-based approaches are both widely used for this task, and the former have proven more precise. However, the effective patterns in such approaches are usually defined manually, which is time-consuming and labor-intensive and covers only a limited set of patterns. In our research, we obtain patterns automatically through frequent sequence mining. We then present a voting approach that determines whether a sentence contains a concept and accurately identifies it. Our algorithm consists of three steps: pattern mining, pattern refining, and concept discovery. In our experimental study, we evaluate the approach with three traditional measures: precision, recall, and F1 value. The experimental results not only verify the validity of the approach but also illustrate the relationship between its performance and the parameters of the algorithm.
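As a rough illustration of two ideas named in the abstract, the sketch below mines frequent token sequences from tokenized sentences and computes precision, recall, and F1 for a set of discovered concepts. It is a simplified stand-in, not the paper's algorithm: it counts only contiguous sequences, whereas general frequent sequence mining also allows gaps, and it omits the pattern refining and voting steps, whose details the abstract does not give. The names `mine_patterns`, `precision_recall_f1`, `min_support`, and `max_len` are hypothetical.

```python
from collections import Counter


def mine_patterns(sentences, min_support=2, max_len=3):
    """Return contiguous token sequences found in at least min_support sentences.

    Assumption: contiguous n-grams stand in for general frequent sequences.
    """
    support = Counter()
    for tokens in sentences:
        seen = set()  # count each sequence at most once per sentence
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                seen.add(tuple(tokens[i:i + n]))
        support.update(seen)
    return {seq: count for seq, count in support.items() if count >= min_support}


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two sets of concepts."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    sentences = [
        ["a", "concept", "is", "defined", "as"],
        ["the", "term", "is", "defined", "as"],
        ["we", "call", "it"],
    ]
    print(mine_patterns(sentences))
    print(precision_recall_f1({"ontology", "web"}, {"ontology", "pattern"}))
```

Raising `min_support` keeps fewer but more reliable patterns, which is one example of the kind of parameter whose effect on performance the experiments examine.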
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, J., Zhang, Z., Li, Q., Li, X. (2005). A Pattern-Based Voting Approach for Concept Discovery on the Web. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_12
DOI: https://doi.org/10.1007/978-3-540-31849-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25207-8
Online ISBN: 978-3-540-31849-1
eBook Packages: Computer Science (R0)