Fully-Compressed Suffix Trees | SpringerLink
Skip to main content

Fully-Compressed Suffix Trees

  • Conference paper
LATIN 2008: Theoretical Informatics (LATIN 2008)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4957))

Included in the following conference series:

  • 1799 Accesses

Abstract

Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require O(n logn) bits of space, for a string of size n. This is considerably more than the n log2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but still unsatisfactory when σ is small as in DNA sequences.

In this paper we introduce the first compressed suffix tree representation that breaks this linear-space barrier. Our representation requires sublinear extra space and supports a large set of navigational operations in logarithmic time. An essential ingredient of our representation is the lowest common ancestor (LCA) query. We reveal important connections between LCA queries and suffix tree navigation.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Apostolico, A.: Combinatorial Algorithms on Words. In: The myriad virtues of subword trees. NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)

    Google Scholar 

  2. Gusfield, D.: Algorithms on Strings, Trees and Sequences. Cambridge University Press, Cambridge (1997)

    MATH  Google Scholar 

  3. Giegerich, R., Kurtz, S., Stoye, J.: Efficient implementation of lazy suffix trees. Softw., Pract. Exper. 33(11), 1035–1049 (2003)

    Article  Google Scholar 

  4. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  5. Sadakane, K.: Compressed Suffix Trees with Full Functionality. Theo. Comp. Sys. (2007)

    Google Scholar 

  6. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1) (2007) (article 2)

    Google Scholar 

  7. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algor. 3(2) (2007) (article 20)

    Google Scholar 

  8. Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)

    Article  MathSciNet  Google Scholar 

  9. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. of Algorithms 48(2), 294–313 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  10. Foschini, L., Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Trans. Algor. 2(4), 611–639 (2006)

    Article  MathSciNet  Google Scholar 

  11. Weiner, P.: Linear pattern matching algorithms. In: IEEE Symp. on Switching and Automata Theory, pp. 1–11 (1973)

    Google Scholar 

  12. Lee, S., Park, K.: Dynamic rank-select structures with applications to run-length encoded texts. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 96–106. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  13. Bender, M., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  14. Fischer, J., Heun, V.: A new succinct representation of RMQ-information and improvements in the enhanced suffix array. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, pp. 459–470. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  15. Bender, M., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comp. Sci. 321(1), 5–12 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  16. Geary, R., Raman, R., Raman, V.: Succinct ordinal trees with level-ancestor queries. In: Munro, J.I. (ed.) SODA, pp. 1–10. SIAM, Philadelphia (2004)

    Google Scholar 

  17. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Eduardo Sany Laber Claudson Bornstein Loana Tito Nogueira Luerbio Faria

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Russo, L.M.S., Navarro, G., Oliveira, A.L. (2008). Fully-Compressed Suffix Trees. In: Laber, E.S., Bornstein, C., Nogueira, L.T., Faria, L. (eds) LATIN 2008: Theoretical Informatics. LATIN 2008. Lecture Notes in Computer Science, vol 4957. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78773-0_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-78773-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78772-3

  • Online ISBN: 978-3-540-78773-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics