On the Expressive Power of Regular Expressions with Backreferences

On the Expressive Power of Regular Expressions with Backreferences

Authors Taisei Nogami, Tachio Terauchi



PDF
Thumbnail PDF

File

LIPIcs.MFCS.2023.71.pdf
  • Filesize: 0.79 MB
  • 15 pages

Document Identifiers

Author Details

Taisei Nogami
  • Waseda University, Tokyo, Japan
Tachio Terauchi
  • Waseda University, Tokyo, Japan

Cite As Get BibTex

Taisei Nogami and Tachio Terauchi. On the Expressive Power of Regular Expressions with Backreferences. In 48th International Symposium on Mathematical Foundations of Computer Science (MFCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 272, pp. 71:1-71:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.MFCS.2023.71

Abstract

A rewb is a regular expression extended with a feature called backreference. It is broadly known that backreference is a practical extension of regular expressions, and is supported by most modern regular expression engines, such as those in the standard libraries of Java, Python, and more. Meanwhile, indexed languages are the languages generated by indexed grammars, a formal grammar class proposed by A.V.Aho. We show that these two models' expressive powers are related in the following way: every language described by a rewb is an indexed language. As the smallest formal grammar class previously known to contain rewbs is the class of context sensitive languages, our result strictly improves the known upper-bound. Moreover, we prove the following two claims: there exists a rewb whose language does not belong to the class of stack languages, which is a proper subclass of indexed languages, and the language described by a rewb without a captured reference is in the class of nonerasing stack languages, which is a proper subclass of stack languages. Finally, we show that the hierarchy investigated in a prior study, which separates the expressive power of rewbs by the notion of nested levels, is within the class of nonerasing stack languages.

Subject Classification

ACM Subject Classification
  • Theory of computation → Formal languages and automata theory
Keywords
  • Regular expressions
  • Backreferences
  • Expressive power

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V Aho. Indexed grammars—an extension of context-free grammars. Journal of the ACM (JACM), 15(4):647-671, 1968. Google Scholar
  2. Alfred V Aho. Nested stack automata. Journal of the ACM (JACM), 16(3):383-406, 1969. Google Scholar
  3. Alfred V. Aho. Algorithms for finding patterns in strings, pages 255-300. MIT Press, Cambridge, MA, USA, 1991. Google Scholar
  4. Martin Berglund and Brink van der Merwe. Re-examining regular expressions with backreferences. Theoretical Computer Science, 940:66-80, 2023. Google Scholar
  5. Cezar Câmpeanu, Kai Salomaa, and Sheng Yu. A formal study of practical regular expressions. International Journal of Foundations of Computer Science, 14(06):1007-1018, 2003. Google Scholar
  6. Dominik D Freydenberger and Markus L Schmid. Deterministic regular expressions with back-references. Journal of Computer and System Sciences, 105:1-39, 2019. Google Scholar
  7. Seymour Ginsburg, Sheila A Greibach, and Michael A Harrison. One-way stack automata. Journal of the ACM (JACM), 14(2):389-418, 1967. Google Scholar
  8. Seymour Ginsburg, Sheila A Greibach, and Michael A Harrison. Stack automata and compiling. Journal of the ACM (JACM), 14(1):172-201, 1967. Google Scholar
  9. Takeshi Hayashi. On derivation trees of indexed grammars - An extension of the uvwxy-theorem. Publications of the Research Institute for Mathematical Sciences, 9(1):61-92, 1973. Google Scholar
  10. John E Hopcroft, Rajeev Motwani, and Jeffrey D Ullman. Introduction to automata theory, languages, and computation, 2nd Edition. Addison-Wesley, 2001. Google Scholar
  11. John E. Hopcroft and Jeffrey D. Ullman. Nonerasing stack automata. Journal of Computer and System Sciences, 1(2):166-186, 1967. Google Scholar
  12. Kim S Larsen. Regular expressions with nested levels of back referencing form a hierarchy. Information Processing Letters, 65(4):169-172, 1998. Google Scholar
  13. Taisei Nogami and Tachio Terauchi. On the expressive power of regular expressions with backreferences, July, 2023. URL: https://arxiv.org/abs/2307.08531.
  14. William F Ogden. Intercalation theorems for stack languages. In Proceedings of the first annual ACM symposium on Theory of computing, pages 31-42, 1969. Google Scholar
  15. Markus L Schmid. Characterising regex languages by regular languages equipped with factor-referencing. Information and Computation, 249:1-17, 2016. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail