Abstract
Most regular-expression matching engines in practice are based on the Thompson construction and the Spencer matching algorithm. While these engines work fast and efficiently, a serious problem, the regular expression denial-of-service (ReDoS), has been reported recently. ReDoS is an algorithm complexity attack, which exploits the backtracking feature of the engine, and makes the service unresponsive indefinitely. Researchers suggested a few remedies to cope with the ReDoS problem, yet they are often ad-hoc or undesirable in practice. We instead propose a hybrid matching scheme that selects between the Thompson and the Spencer matching algorithms depending on the needed features. We also suggest to use the position construction for its intrinsic characteristics for fast matching. We evaluate the proposed approach using a benchmark dataset collected from various open-source projects, and compare the performance with the current approach. The experimental results show that a hybrid matcher reduces the ReDoS-vulnerability by 96% and 99.98% in full and partial matching, respectively. Moreover, 55% of the most problematic regular expressions become invulnerable to ReDoS by the position construction.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
Simplified.
and $ match the start and the end of lines, respectively.
matches the whitespace characters such as a space or a tab.
- 3.
- 4.
- 5.
- 6.
We have removed 359 ReDoS-vulnerable regexes in Table 1 that do not cause ReDoS behavior in both full and partial matching.
References
Berglund, M., Drewes, F., van der Merwe, B.: Analyzing catastrophic backtracking behavior in practical regular expression matching. In: Proceedings of the 14th International Conference on Automata and Formal Languages, pp. 109–123 (2014)
Berglund, M., van der Merwe, B.: Regular expressions with backreferences re-examined. In: Proceedings of the Prague Stringology Conference 2017, pp. 30–41 (2017)
Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 140(2), 229–253 (1998)
Câmpeanu, C., Salomaa, K., Yu, S.: A formal study of practical regular expressions. Int. J. Found. Comput. Sci. 14(6), 1007–1018 (2003)
Câmpeanu, C., Salomaa, K., Yu, S.: Regex and extended regex. In: Proceedings of the 7th International Conference on Implementation and Application of Automata, pp. 77–84 (2003)
Caron, P., Ziadi, D.: Characterization of Glushkov automata. Theoret. Comput. Sci. 233(1–2), 75–90 (2000)
Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in Python. In: Proceedings of the 25th International Symposium on Software Testing and Analysis, pp. 282–293 (2016)
Cheon, H., Hahn, J., Han, Y.S.: On the decidability of infix inclusion problem. In: Proceedings of the 26th International Conference on Developments in Language Theory (2022, in publishing)
Cortes, C., Mohri, M.: Learning with weighted transducers. In: Proceedings of the 7th International Workshop on Finite State Methods and Natural Language Processing, pp. 14–22 (2008)
Cox, R.: Regular expression matching in the wild (2010). http://swtch.com/~rsc/regexp/regexp3.html. Accessed 7 Apr 2022
Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: Proceedings of the 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 246–256 (2018)
Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 443–454 (2019)
Davis, J.C., Servant, F., Lee, D.: Using selective memoization to defeat regular expression denial of service (ReDoS). In: Proceedings of the 2021 IEEE Symposium on Security and Privacy, pp. 1–17 (2021)
Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly Media Inc. (2006)
Giammarresi, D., Ponty, J., Wood, D., Ziadi, D.: A characterization of Thompson digraphs. Discret. Appl. Math. 134(1–3), 317–337 (2004)
Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1–53 (1961)
Google: regexp (Go documentation). https://pkg.go.dev/regexp@go1.18. Accessed 7 Apr 2022
Kirrage, J., Rathnayake, A., Thielecke, H.: Static analysis for regular expression denial-of-service attacks. In: Proceedings of the 7th International Conference on Network and System Security, pp. 135–148 (2013)
Kleene, S.C.: Representation of events in nerve nets and finite automata. Automata Stud. 34, 3–41 (1956)
Liu, T., Sun, Y., Liu, A.X., Guo, L., Fang, B.: A prefiltering approach to regular expression matching for network security systems. In: Bao, F., Samarati, P., Zhou, J. (eds.) ACNS 2012. LNCS, vol. 7341, pp. 363–380. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31284-7_22
Liu, Y., Zhang, M., Meng, W.: Revealer: detecting and exploiting regular expression denial-of-service vulnerabilities. In: Proceedings of the 2021 IEEE Symposium on Security and Privacy, pp. 1468–1484 (2021)
McNaughton, R., Yamada, H.: Regular expressions and state graphs for automata. IRE Trans. Electron. Comput. EC-9(1), 39–47 (1960)
van der Merwe, B., Mouton, J., van Litsenborgh, S., Berglund, M.: Memoized regular expressions. In: Maneth, S. (ed.) CIAA 2021. LNCS, vol. 12803, pp. 39–52. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79121-6_4
Nicaud, C.: On the average size of Glushkov’s automata. In: Dediu, A.H., Ionescu, A.M., Martín-Vide, C. (eds.) LATA 2009. LNCS, vol. 5457, pp. 626–637. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00982-2_53
Rathnayake, A., Thielecke, H.: Static analysis for regular expression exponential runtime via substructural logics. arXiv preprint CoRR 1405.7058 (2014)
Schmid, M.L.: Regular expressions with backreferences: polynomial-time matching techniques. arXiv preprint CoRR 1903.05896 (2019)
Shen, Y., Jiang, Y., Xu, C., Yu, P., Ma, X., Lu, J.: ReScue: crafting regular expression DoS attacks. In: Proceedings of the 33rd IEEE/ACM International Conference on Automated Software Engineering, pp. 225–235 (2018)
Spencer, H.: A regular-expression matcher. In: Software Solutions in C, pp. 35–71. AP Professional (1994)
Thompson, K.: Programming techniques: regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Weideman, N., van der Merwe, B., Berglund, M., Watson, B.: Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In: Proceedings of the 21st International Conference on Implementation and Application of Automata, pp. 322–334 (2016)
Weidman, A., other contributers: Regular expression denial of service - ReDoS. https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS. Accessed 7 Apr 2022
Wüstholz, V., Olivo, O., Heule, M.J.H., Dillig, I.: Static detection of DoS vulnerabilities in programs that use regular expressions. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 3–20. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_1
Acknowledgment
This research was supported by the NRF grant (NRF-2020R1A4A3079947) funded by MIST.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Sung, S., Cheon, H., Han, YS. (2022). How to Settle the ReDoS Problem: Back to the Classical Automata Theory. In: Caron, P., Mignot, L. (eds) Implementation and Application of Automata. CIAA 2022. Lecture Notes in Computer Science, vol 13266. Springer, Cham. https://doi.org/10.1007/978-3-031-07469-1_3
Download citation
DOI: https://doi.org/10.1007/978-3-031-07469-1_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07468-4
Online ISBN: 978-3-031-07469-1
eBook Packages: Computer ScienceComputer Science (R0)