Abstract
We consider skewed distributions of strings, in which any two such strings share a common prefix much longer than that expected in uniformly distributed (random) strings. For instance, this is the case of URL addresses, IP addresses, or XML path strings, all representing paths in some hierarchical order. As strings sharing a portion of the path have a quite long common prefix, we need to avoid the time-consuming repeated examination of these common prefixes while handling the linked data structures storing them. For this purpose, we show how to implement search data structures that can operate on strings with long prefixes in common. Despite the simplicity and the generality of the method, our experimental study shows that it is quite competitive with several optimized and tuned implementations currently available in the literature.
Partially supported by the Italian MIUR Project ALINWEB.
Partially supported by the EU Programme ALCOM-FT (n. IST-1999-14186) and by the Italian MIUR Project ALINWEB.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
A. Acharya, H. Zhu, K. Shen. Adaptive algorithms for cache-efficient trie search. Proc. ALENEX 99. Source code in http://www.cs.rochester.edu/~kshen, 1999.
G. M. Adel’son-Vel’skii and E. M. Landis. An algorithm for the organization of information. Soviet Mathematics Doklady, 3 (1962), 1259–1263.
Arianna Search Engine, http://arianna.libero.it.
R. Bayer. Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica 1 (1972), 290–306.
J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. SODA 1997, 360–369 (http://www.cs.princeton.edu/~rs/strings).
S. W. Bent, D. D. Sleator and R. E. Tarjan. Biased search trees. SIAM Journal on Computing 14 (1985), 545–568.
F. Bonchi, F. Giannotti, C. Gozzi, G. Manco, M. Nanni, D. Pedreschi, C. Renso and S. Ruggieri, Web log data warehousing and mining for intelligent web caching. Data and Knowledge Engineering, to appear.
H. A. Clampett. Randomized binary searching with the tree structures. Communications of the ACM 7 (1964), 163–165.
J. Clément, Ph. Flajolet and B. Vallée. The analysis of hybrid trie structures. SODA 1999.
J. Feigenbaum and R. E. Tarjan. Two new kinds of biased search trees. Bell Systems Technical Journal 62 (1983), 3139–3158.
T. F. Gonzalez. The on-line d-dimensional dictionary problem. SODA 1992, 376–385.
R. Grossi, G. F. Italiano. Efficient techniques for maintaining multidimensional keys in linked data structures. ICALP 1999, 372–381.
R. H. Gueting and H.-P. Kriegel. Multidimensional B-tree: An efficient dynamic file structure for exact match queries. 10th GI Annual Conference, 375–388.
L. J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. FOCS 1978, 8–21.
D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997.
S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica 17 (1982), 157–184.
D. E. Knuth. The Art of computer programming, Vol. 3: Sorting and Searching. Second edition, Addison-Wesley, 1973, 1998.
R. Jenkins, http://burtleburtle.net/bob/hash/.
K. Mehlhorn. Dynamic binary search. SIAM J. on Computing 8 (1979), 175–198.
K. Mehlhorn. Data structures and algorithms:Searching and sorting, Springer 1984.
D. R. Morrison. PATRICIA — Practical Algorithm To Retrieve Information Coded In Alphanumeric. J. ACM, 15, 514–534, 1968.
J. Nievergelt and E. M. Reingold. Binary search trees of bounded balance. SIAM Journal on Computing 2 (1973), 33–43.
J. M. Ockerbloom. The on-line books page. http://digital.library.upenn.edu/books/
W. Pugh. Skip Lists: A probabilistic alternative to balanced trees. Communications of the ACM 33 (1990), 668–676.
W. Pugh. Skip Lists ftp directory. ftp://ftp.cs.umd.edu/pub/skipLists/.
R. Sedgewick. Algorithms in C, Addison-Wesley, 1998.
R. Seidel and C. R. Aragon. Randomized search trees. Algorithmica 16 (1996), 464–497.
C. Silverstein. Library call data set. http://theory.stanford.edu/~csilvers/libdata.
C. Silverstein. A practical perfect hash algorithm. Manuscript, 1998.
D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. Journal of the ACM 32 (1985), 652–686.
W. Szpankowski, Average Case Analysis of Algorithms on Sequences, Wiley, 2001.
R. E. Tarjan. Data structures and network algorithms, SIAM (1983).
V. K. Vaishnavi. Multidimensional height-balanced trees. IEEE Transactions on Computers C-33 (1984), 334–343.
V. K. Vaishnavi. Multidimensional balanced binary trees. IEEE Transactions on Computers C-38 (1989), 968–985.
V. K. Vaishnavi. On k-dimensional balanced binary trees. Journal of Computer and System Sciences 52 (1996), 328–348.
Mark A. Weiss. Data structures and algorithm analysis in C, Addison Wesley, (1997). Source code in http://www.cs.fiu.edu/~weiss/dsaa_c2e/files.html.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crescenzi, P., Grossi, R., Italiano, G.F. (2003). Search Data Structures for Skewed Strings. In: Jansen, K., Margraf, M., Mastrolilli, M., Rolim, J.D.P. (eds) Experimental and Efficient Algorithms. WEA 2003. Lecture Notes in Computer Science, vol 2647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44867-5_7
Download citation
DOI: https://doi.org/10.1007/3-540-44867-5_7
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40205-3
Online ISBN: 978-3-540-44867-9
eBook Packages: Springer Book Archive