Improved ESP-index: A Practical Self-index for Highly Repetitive Texts | SpringerLink
Skip to main content

Improved ESP-index: A Practical Self-index for Highly Repetitive Texts

  • Conference paper
Experimental Algorithms (SEA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8504))

Included in the following conference series:

Abstract

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoid the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better that other possible approaches.

This work was supported by JSPS KAKENHI(24700140,23680016) and the JST PRESTO program.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 5719
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 7149
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Barbay, J., Navarro, G.: On compressing permutations and adaptive sorting. Theor. Comp. Sci. 513, 109–123 (2013)

    Article  MATH  MathSciNet  Google Scholar 

  2. Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundam. Inform. 111, 313–337 (2010)

    MathSciNet  Google Scholar 

  3. Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 180–192. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  4. Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. TALG 3, 2:1–2:19 (2007)

    Google Scholar 

  5. Delpratt, O., Rahman, N., Raman, R.: Engineering the louds succinct tree representation. In: Àlvarez, C., Serna, M. (eds.) WEA 2006. LNCS, vol. 4007, pp. 134–145. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  6. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: LZ77-based self-indexing with faster pattern matching. In: Pardo, A., Viola, A. (eds.) LATIN 2014. LNCS, vol. 8392, pp. 731–742. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  8. Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)

    Google Scholar 

  9. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram mining on SLP compressed strings. JDA 18, 89–99 (2013)

    MATH  MathSciNet  Google Scholar 

  10. Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: STACS, pp. 529–540 (2009)

    Google Scholar 

  11. Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554 (1989)

    Google Scholar 

  12. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. In: DCC, pp. 296–305 (1999)

    Google Scholar 

  13. Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-Index: A compressed index based on edit-sensitive parsing. JDA 18, 100–112 (2013)

    MATH  MathSciNet  Google Scholar 

  14. Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  15. Navarro, G.: Indexing text using the ziv-lempel trie. JDA 2, 87–114 (2004)

    MATH  Google Scholar 

  16. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. TALG 3 (2007)

    Google Scholar 

  17. Sakamoto, H., Maruyama, S., Kida, T., Shimozono, S.: A space-saving approximation algorithm for grammar-based compression. IEICE Trans. Inf. Syst. E92-D, 158–165 (2009)

    Google Scholar 

  18. Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and don’t-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Takabatake, Y., Tabei, Y., Sakamoto, H. (2014). Improved ESP-index: A Practical Self-index for Highly Repetitive Texts. In: Gudmundsson, J., Katajainen, J. (eds) Experimental Algorithms. SEA 2014. Lecture Notes in Computer Science, vol 8504. Springer, Cham. https://doi.org/10.1007/978-3-319-07959-2_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-07959-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07958-5

  • Online ISBN: 978-3-319-07959-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics