Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays | SpringerLink
Skip to main content

Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays

  • Conference paper
Experimental Algorithms (SEA 2009)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5526))

Included in the following conference series:

  • 1064 Accesses

Abstract

The suffix array of a string s of length n over the alphabet Σ is the permutation that gives us the lexicographic order of all suffixes of s. This popular index can be used to solve many problems in sequence analysis. In practice, one limitation of this data structure is its size of n logn bits, while the size of the text is n log|Σ| bits. For this reason compressed suffix arrays (CSAs) were introduced. The size of these CSAs is asymptotically less than or equal to the text size if the text is compressible, while maintaining O(logε n) access time to the elements (0 < ε ≤ 1). The goal of a good CSA implementation is to provide fast access time to the elements while using minimal space for the CSA. Both access time and space depend on the choice of a self-delimiting code for compression. We show that the Fibonacci code is superior to the Elias δ code for strings that are low compressible. Our second contribution are two new broadword methods that support the decoding of Fibonacci encoded numbers on 64 bit architectures. Furthermore, our experiments show that the use of known broadword methods speed up the decoding of Elias δ code for strings that are high compressible, like XML. Finally, we provide a new efficient C++ library for succinct data structures which includes a generic CSA for further experiments.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–83 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  2. Elias, P.: Universal code word sets and representations of the integers. IEEE Transactions on Information Theory 21(2), 194–203 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  3. Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice! arXiv:0712.3360v1 [cs.DS] (2007)

    Google Scholar 

  4. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)

    Google Scholar 

  5. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  6. Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554. IEEE, Los Alamitos (1989)

    Google Scholar 

  7. Knuth, D.E.: The Art of Computer Programming. Pre-Fascicle 1a. A Draft of Section 7.1.3: Bitwise Tricks and Techniques (2008)

    Google Scholar 

  8. Lakos, J.: Large-Scale C++ Software Design. Eddison-Wesley (1996)

    Google Scholar 

  9. Mäkinen, V., Navarro, G.: Compressed compact suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 420–433. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  10. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  11. Ian Munro, J.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  12. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)

    Google Scholar 

  13. Rahman, N., Raman, R.: Rank and select operations on binary strings. In: Encyclopedia of Algorithms. Springer, Heidelberg (2008)

    Google Scholar 

  14. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Algorithms 48(2), 294–313 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  15. Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  16. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1999)

    MATH  Google Scholar 

  17. Zeckendorf, E.: Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas. Bull. Soc. R. Sci. Liège 41, 179–182 (1972)

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gog, S. (2009). Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays. In: Vahrenhold, J. (eds) Experimental Algorithms. SEA 2009. Lecture Notes in Computer Science, vol 5526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02011-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-02011-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02010-0

  • Online ISBN: 978-3-642-02011-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics