Abstract
Suffix tree is a widely studied data structure with a range of applications. Although originally developed for string matching algorithms, it has proved to be useful for implementation of various lossless data compression methods. This interesting fact inspired us to explore the possibilities of creating new data compression methods, based entirely on properties of suffix trees. Resulting from a thorough study of existing suffix tree construction algorithms and their modifications, we propose and evaluate several variants of suffix tree based compression.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Research Report 124, Digital Systems Research Center, Palo Alto, California, USA (1994)
Cleary, J.G., Witten, I.H.: Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications COM-32, 396–402 (1984)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23, 337–343 (1977)
McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of the Association for Computing Machinery 23, 262–272 (1976)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14, 249–260 (1995)
Farach, M.: Optimal suffix tree construction with large alphabets. In: Proceedings of the 38th Annual IEEE Symposium on Foundations of Computer Science, pp. 137–143 (1997)
Fiala, E.R., Greene, D.H.: Data compression with finite windows. Communications of the Association for Computing Machinery 32, 490–505 (1989)
Larsson, N.J.: Structures of String Matching and Data Compression. PhD thesis, Department of Computer Science, Lund University, Sweden (1999)
Harary, F.: Graph Theory. Addison-Wesley, New York (1969)
Balkenhol, B., Kurtz, S., Shtarkov, Y.M.: Modifications of the Burrows and Wheeler data compression algorithm. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference, pp. 188–197 (1999)
Hoang, D.T., Long, P.M., Vitter, J.S.: Dictionary selection using partial matching. Information Sciences 119, 57–72 (1999)
Bloom, C.R.: LZP: A new data compression algorithm. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference, p. 425 (1996), http://www.cbloom.com/src/index_lz.html
Stuiver, L., Moffat, A.: Piecewise integer mapping for arithmetic coding. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference, pp. 3–12 (1998)
Schindler, M.: Byte oriented arithmetic coding. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference (1998) (Poster presentation)
Moffat, A., Neal, R.M., Witten, I.H.: Arithmetic coding revisited. ACM Transactions on Information Systems 16, 256–294 (1998)
Bell, T., Witten, I.H., Cleary, J.G.: Modelling for text compression. ACM Computing Surveys 21, 557–591 (1989)
Senft, M.: Lossless data compression using suffix trees. Master’s thesis, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic (2003) (in czech)
Seward, J.: BZip2, http://sources.redhat.com/bzip2/
Willems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context-tree weighting method: Basic properties. IEEE Transactions on Information Theory IT-41, 653–664 (1995)
Franken, E., Peeters, M., Tjalkens, T.J.: CTW 0.1b2, http://www.ele.tue.nl/ctw
Gailly, J.L.: GZip, http://www.gzip.org/
Arnold, R., Bell, T.: A corpus for the evaluation of lossless compression algorithms. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference, pp. 201–210 (1997), http://corpus.canterbury.ac.cz/
Torvalds, L., et al.: Linux, ftp://ftp.cz.kernel.org/pub/linux/
Shkarin, D.: PPM: One step to practicality. In: Storer, J., Cohn, M. (eds.) IEEE Data Compression Conference, pp. 202–211 (2002)
Inenaga, S., et al.: On-line construction of compact directed acyclic word graph. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 169–180. Springer, Heidelberg (2001)
Inenaga, S., Shinohara, A., Takeda, M., Arikawa, S.: Compact directed acyclic word graphs for a sliding window. Journal of Discrete Algorithms 2, 25–51 (2004) (special issue for SPIRE 2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Senft, M. (2005). Suffix Tree Based Data Compression. In: Vojtáš, P., Bieliková, M., Charron-Bost, B., Sýkora, O. (eds) SOFSEM 2005: Theory and Practice of Computer Science. SOFSEM 2005. Lecture Notes in Computer Science, vol 3381. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30577-4_38
Download citation
DOI: https://doi.org/10.1007/978-3-540-30577-4_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24302-1
Online ISBN: 978-3-540-30577-4
eBook Packages: Computer ScienceComputer Science (R0)