Abstract
The suffix array of a string s of length n over the alphabet Σ is the permutation that gives us the lexicographic order of all suffixes of s. This popular index can be used to solve many problems in sequence analysis. In practice, one limitation of this data structure is its size of n logn bits, while the size of the text is n log|Σ| bits. For this reason compressed suffix arrays (CSAs) were introduced. The size of these CSAs is asymptotically less than or equal to the text size if the text is compressible, while maintaining O(logε n) access time to the elements (0 < ε ≤ 1). The goal of a good CSA implementation is to provide fast access time to the elements while using minimal space for the CSA. Both access time and space depend on the choice of a self-delimiting code for compression. We show that the Fibonacci code is superior to the Elias δ code for strings that are low compressible. Our second contribution are two new broadword methods that support the decoding of Fibonacci encoded numbers on 64 bit architectures. Furthermore, our experiments show that the use of known broadword methods speed up the decoding of Elias δ code for strings that are high compressible, like XML. Finally, we provide a new efficient C++ library for succinct data structures which includes a generic CSA for further experiments.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–83 (2004)
Elias, P.: Universal code word sets and representations of the integers. IEEE Transactions on Information Theory 21(2), 194–203 (1975)
Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice! arXiv:0712.3360v1 [cs.DS] (2007)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554. IEEE, Los Alamitos (1989)
Knuth, D.E.: The Art of Computer Programming. Pre-Fascicle 1a. A Draft of Section 7.1.3: Bitwise Tricks and Techniques (2008)
Lakos, J.: Large-Scale C++ Software Design. Eddison-Wesley (1996)
Mäkinen, V., Navarro, G.: Compressed compact suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 420–433. Springer, Heidelberg (2004)
Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
Ian Munro, J.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)
Rahman, N., Raman, R.: Rank and select operations on binary strings. In: Encyclopedia of Algorithms. Springer, Heidelberg (2008)
Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Algorithms 48(2), 294–313 (2003)
Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)
Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1999)
Zeckendorf, E.: Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas. Bull. Soc. R. Sci. Liège 41, 179–182 (1972)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gog, S. (2009). Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays. In: Vahrenhold, J. (eds) Experimental Algorithms. SEA 2009. Lecture Notes in Computer Science, vol 5526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02011-7_16
Download citation
DOI: https://doi.org/10.1007/978-3-642-02011-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02010-0
Online ISBN: 978-3-642-02011-7
eBook Packages: Computer ScienceComputer Science (R0)