Abstract
We propose a tree-similarity-based unsupervised learning method to extract relations between Named Entities from a large raw corpus. Our method treats relation extraction as a clustering problem on shallow parse trees. First, we modify previous tree kernels for relation extraction to estimate the similarity between parse trees more efficiently. Then, this similarity is used in a hierarchical clustering algorithm to group entity pairs into clusters. Finally, each cluster is labeled with an indicative word, and unreliable clusters are pruned. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work by 5 in F-measure. It also shows that our method performs well on both high-frequency and less frequent entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric in relation clustering.
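The three steps named in the abstract (tree similarity, hierarchical clustering, cluster labeling with pruning) can be illustrated with a small sketch. It is not the method of the paper: the abstract does not specify the modified kernel, the linkage criterion, or the labeling rule, so the sketch assumes a toy position-aligned fragment count as the tree kernel, average-link agglomerative clustering with a fixed threshold, and labeling by the most frequent non-entity word. All function names, the tree encoding, the threshold, and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Assumed encoding: a tree is (label, [children]); a leaf is (word, []).

def nodes(tree):
    """Yield every node of the tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def common(n1, n2):
    """Toy kernel term: size of the label-matching overlap rooted at n1 and n2."""
    if n1[0] != n2[0]:
        return 0
    # Children are aligned position by position (a simplification).
    return 1 + sum(common(c1, c2) for c1, c2 in zip(n1[1], n2[1]))

def kernel(t1, t2):
    """Sum the overlap over all node pairs, in the style of convolution kernels."""
    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

def similarity(t1, t2):
    """Kernel normalized by the self-similarities of both trees."""
    return kernel(t1, t2) / sqrt(kernel(t1, t1) * kernel(t2, t2))

def cluster(trees, threshold):
    """Average-link agglomerative clustering over tree indices."""
    sim = {(i, j): similarity(trees[i], trees[j])
           for i, j in combinations(range(len(trees)), 2)}

    def link(a, b):
        total = sum(sim[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(len(trees))]
    while len(clusters) > 1:
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[x], clusters[y]) < threshold:
            break  # no remaining pair of clusters is similar enough
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # x < y, so index x is unaffected
    return clusters

def label_and_prune(trees, clusters, min_size=2, entity_types=("PER", "ORG", "LOC")):
    """Label each cluster by its most frequent non-entity word; drop small clusters."""
    labeled = []
    for members in clusters:
        if len(members) < min_size:
            continue  # treated as unreliable in this sketch
        words = Counter(n[0] for i in members for n in nodes(trees[i])
                        if not n[1] and n[0] not in entity_types)
        if words:
            labeled.append((words.most_common(1)[0][0], members))
    return labeled

def shallow_tree(e1, verb, e2):
    """Hypothetical shallow parse of 'E1 verb E2' with entity-type placeholders."""
    return ("S", [("NP", [(e1, [])]), ("VP", [(verb, [])]), ("NP", [(e2, [])])])

trees = [
    shallow_tree("ORG", "acquired", "ORG"),
    shallow_tree("ORG", "acquired", "ORG"),
    shallow_tree("PER", "joined", "ORG"),
    shallow_tree("PER", "joined", "ORG"),
    shallow_tree("PER", "visited", "LOC"),
]
result = label_and_prune(trees, cluster(trees, threshold=0.9))
print(result)
```

In this toy setting the two repeated patterns form their own clusters, labeled by their verbs, while the single "visited" instance stays a singleton and is pruned. A real system would replace the position-aligned match with the kernel defined in the paper and tune the threshold on held-out data.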
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, M., Su, J., Wang, D., Zhou, G., Tan, C.L. (2005). Discovering Relations Between Named Entities from a Large Raw Corpus Using Tree Similarity-Based Clustering. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_34
DOI: https://doi.org/10.1007/11562214_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)