Years and Authors of Summarized Original Work
-
1993; Manber, Myers
Problem Definition
Text or string data naturally arises in many contexts including document processing, information retrieval, natural and computer language processing, and describing molecular sequences. In broad terms, the goal of text indexing is to design methodologies to store text data so as to significantly improve the speed and performance of answering queries. While text indexing has been studied for a long time, it shot into prominence during the last decade due to the ubiquity of web-based textual data and search engines to explore it, design of digital libraries for archiving human knowledge, and application of string techniques to further understanding of modern biology. Text indexing differs from the typical indexing of keys drawn from an underlying total order – text data can have varying lengths, and queries are often more complex and involve substrings, partial matches, or approximate matches.
Queries on...
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Recommended Reading
Abouelhoda M, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discret Algorithms 2:53–86
Aluru S (ed) (2005) Handbook of computational molecular biology, Computer and Information Science Series. Chapman and Hall/CRC, Boca Raton
Amir A, Kopelowitz T, Lewenstein M, Lewenstein N (2005) Towards real-time suffix tree construction. In: Proceedings of the string processing and information retrieval symposium (SPIRE), pp 67–78
Ciriani V, Ferragina P, Luccio F, Muthukrishnan S (2007) A data structure for a sequence of string accesses in external memory. ACM Trans Algorithms 3
Crescenzi P, Grossi R, Italiano G (2003) Search data structures for skewed strings. In: International workshop on experimental and efficient algorithms (WEA). Lecture notes in computer science, vol 2. Springer, Berlin, pp 81–96
Crochemore M, Rytter W (2002) Jewels of stringology. World Scientific Publishing Company, Singapore
Ferragina P, Grossi R (1998) Optimal on-line search and sublinear time update in string matching. SIAM J Comput 3:713–736
Franceschini G, Grossi R (2004) A general technique for managing strings in comparison-driven data structures. In: Annual international colloquium on automata, languages and programming (ICALP)
Grossi R, Italiano G (1999) Efficient techniques for maintaining multidimensional keys in linked data structures. In: Annual international colloquium on automata, languages and programming (ICALP), pp 372–381
Gusfield D (1997) Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, New York
Karkkainen J, Sanders P, Burkhardt S (2006) Linear work suffix arrays construction. J ACM 53:918–936
Kasai T, Lee G, Arimura H et al (2001) Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Proceedings of the 12th annual symposium, combinatorial pattern matching (CPM), pp 181–192
Ko P, Aluru S (2005) Space efficient linear time construction of suffix arrays. J Discret Algorithms 3:143–156
Ko P, Aluru S (2007) Optimal self-adjustring tree for dynamic string data in secondary storage. In: Proceedings of the string processing and information retrieval symposium (SPIRE), Santiago. Lecture notes in computer science, vol 4726, pp 184–194
Manber U, Myers G (1993) Suffix arrays: a new method for on-line search. SIAM J Comput 22:935–948
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33:31–88
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer Science+Business Media New York
About this entry
Cite this entry
Aluru, S. (2016). Text Indexing. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2864-4_422
Download citation
DOI: https://doi.org/10.1007/978-1-4939-2864-4_422
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2863-7
Online ISBN: 978-1-4939-2864-4
eBook Packages: Computer ScienceReference Module Computer Science and Engineering