Abstract
The compressed pattern matching problem is to find all occurrences of a given pattern in a compressed text. In this paper an efficient grammar-based compression algorithm is presented for the compressed pattern matching. The algorithm achieves the worst-case approximation ratio O(g *logg *logn) for the optimum grammar size g * with an input text of length n. This upper bound improves the complexity of the compressed pattern matching problem to \(O(g_*\log g_*\log m + \frac{n}{m} + m^2 + r)\) time and O(g *logg *logm + m 2) space for any pattern shorter than m and the number r of pattern occurrences.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Amir, A., Benson, G.: Efficient two-dimensional compressed matching. In: Proc. Data Compression Conference, p. 279 (1992)
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in Z-compressed files. J. Comput. System Sci. 52, 299–307 (1996)
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (1999)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the Smallest Grammar: Kolmogorov Complexity in Natural Models. In: Proc. 29th Ann. Sympo. on Theory of Computing, pp. 792–801 (2002)
Farach, M., Thorup, M.: String-matching in Lempel-Ziv compressed strings. In: 27th ACM STOC, pp. 703–713 (1995)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences, Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Kida, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage System: a Unifying Framework for Compressed Pattern Matching. Theoret. Comput. Sci. 298, 253–272
Kida, T., Takeda, M., Shinohara, A., Miyazaki, M., Arikawa, S.: Multiple pattern matching in LZW compressed text. J. Discrete Algorithms 1(1), 133–158 (2000)
Kieffer, J.C., Yang, E.-H.: Grammar-Based Codes: a New Class of Universal Lossless Source Codes. IEEE Trans. on Inform. Theory 46(3), 737–754 (2000)
Kieffer, J.C., Yang, E.-H., Nelson, G., Cosman, P.: Universal Lossless Compression via Multilevel Pattern Matching. IEEE Trans. Inform. Theory IT-46(4), 1227–1245 (2000)
Knuth, D.: Seminumerical Algorithms, pp. 441–462. Addison-Wesley, Reading (1981)
Larsson, N.J., Moffat, A.: Offline Dictionary-Based Compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
Lehman, E.: Approximation Algorithms for Grammar-Based Compression, PhD thesis, MIT (2002)
Lehman, E., Shelat, A.: Approximation Algorithms for Grammar-Based Compression. In: Proc. 20th Ann. ACM-SIAM Sympo. on Discrete Algorithms, pp. 205–212 (2002)
Lothaire, M.: Combinatorics on Words. Encyclopedia of Mathematics and Its Applications, vol. 17. Addison-Wesley, Reading (1983)
Navarro, G., Raffinot, M.: A general practical approach to pattern matching over Ziv-Lempel compressed text. In: Crochemore, M., Paterson, M. (eds.) CPM 1999. LNCS, vol. 1645, pp. 14–36. Springer, Heidelberg (1999)
Nevill-Manning, C., Witten, I.: Compression and Explanation Using Hierarchical Grammars. Computer Journal 40(2/3), 103–116 (1997)
Nevill-Manning, C., Witten, I.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artificial Intelligence Research 7, 67–82 (1997)
Rytter, W.: Application of Lempel-Ziv Factorization to the Approximation of Grammar-Based Compression. In: Proc. 13th Ann. Sympo. Combinatorial Pattern Matching, pp. 20–31 (2002)
Sakamoto, H.: A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression. Journal of Discrete Algorithms 3, 416–430 (2005)
Sakamoto, H., Kida, T., Shimozono, S.: A Space-Saving Linear-Time Algorithm for Grammar-Based Compression. In: Proc. 11th International Symposium on String Processing and Information Retrieval, pp. 218–229 (2004)
Salomon, D.: Data compression: the complete reference, 2nd edn. Springer, Heidelberg (1998)
Storer, J., Szymanski, T.: Data compression via textual substitution. J. Assoc. Comput. Mach. 29(4), 928–951 (1982)
Storer, J.A., Szymanski, T.G.: The Macro Model for Data Compression. In: Proc. 10th Ann. Sympo. on Theory of Computing, pp. 30–39 (1978)
Welch, T.A.: A Technique for High Performance Data Compression. IEEE Comput. 17, 8–19 (1984)
Wu, S., Manber, U.: Agrep–a fast approximate pattern-matching tool. In: Usenix Winter 1992 Technical Conference, pp. 153–162 (1992)
Yang, E.-H., Kieffer, J.C.: Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform–Part One: without Context Models. IEEE Trans. on Inform. Theory 46(3), 755–777 (2000)
Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Trans. on Inform. Theory IT-23(3), 337–349 (1977)
Ziv, J., Lempel, A.: Compression of Individual Sequences via Variable-Rate Coding. IEEE Trans. on Inform. Theory 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maruyama, S., Miyagawa, H., Sakamoto, H. (2006). Improving Time and Space Complexity for Compressed Pattern Matching. In: Asano, T. (eds) Algorithms and Computation. ISAAC 2006. Lecture Notes in Computer Science, vol 4288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940128_49
Download citation
DOI: https://doi.org/10.1007/11940128_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49694-6
Online ISBN: 978-3-540-49696-0
eBook Packages: Computer ScienceComputer Science (R0)