Abstract
Name search is an important search function in digital library systems and in various types of information retrieval systems, such as directory search systems, electronic phonebooks and yellow pages. The paper discusses two main approaches to fuzzy name matching, the natural language processing (NLP) approach and the information retrieval (IR) approach, and proposes a hybrid approach. Person names can be considered a (sub-)language, in which case a name search system is developed using NLP apparatus including dictionaries, thesauri and grammatical schemas. On the other hand, if names are perceived as (free) text, then an entirely different system may be built, incorporating indexing, retrieval, relevance ranking and other IR techniques. These two schools of thought, NLP and IR, have somewhat different sets of techniques originating from different theoretical concerns and research traditions. A selective combination of their complementary features is likely to be more effective for fuzzy name matching. Two principles, position attribute identity (PAI) and position transition likelihood (PTL), are proposed to incorporate aspects of both approaches. The two principles have been implemented in a hybrid NLP and IR model system called Friendly Name Search (FNS) for real-world applications in multilingual directory searches on the Singapore Yellowpages website.
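To make the IR view of names as free text concrete, the following is a minimal sketch that indexes names by character bigrams and ranks candidates by Dice similarity. It is not the FNS system and does not implement PAI or PTL, which the abstract does not specify; the function names, the 0.5 threshold and the sample names are all assumptions chosen for illustration.

```python
from collections import defaultdict

def bigrams(name):
    """Return the set of character bigrams of a name, padded with spaces."""
    padded = f" {name.lower().strip()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def build_index(names):
    """Map each bigram to the set of names containing it (an inverted index)."""
    index = defaultdict(set)
    for name in names:
        for gram in bigrams(name):
            index[gram].add(name)
    return index

def search(query, index, threshold=0.5):
    """Rank indexed names by Dice similarity of their bigram sets with the query."""
    q = bigrams(query)
    # Retrieval step: collect every name sharing at least one bigram with the query.
    candidates = set()
    for gram in q:
        candidates |= index.get(gram, set())
    # Ranking step: score candidates and keep those above the (assumed) threshold.
    scored = []
    for name in candidates:
        n = bigrams(name)
        dice = 2 * len(q & n) / (len(q) + len(n))
        if dice >= threshold:
            scored.append((dice, name))
    # Highest score first; ties broken alphabetically for a stable order.
    return [name for dice, name in sorted(scored, key=lambda t: (-t[0], t[1]))]

# Hypothetical directory entries, for illustration only.
index = build_index(["Tan Ah Kow", "Tan Ah Kau", "Lim Siew Ling"])
print(search("Tan Ah Kow", index))
```

A purely string-based ranker like this knows nothing about name structure such as surname position or transliteration variants, which is the kind of knowledge the NLP approach contributes and which motivates the hybrid design described in the abstract.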
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, P.HJ., Na, JC., Khoo, C.S.G. (2004). NLP Versus IR Approaches to Fuzzy Name Searching in Digital Libraries. In: Heery, R., Lyon, L. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2004. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30230-8_14
DOI: https://doi.org/10.1007/978-3-540-30230-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23013-7
Online ISBN: 978-3-540-30230-8
eBook Packages: Springer Book Archive