Abstract
Modern programming languages often provide functions to manipulate regular expressions in standard libraries. If they offer support for advanced features, the matching algorithm has an exponential worst-case time complexity: for some so-called vulnerable regular expressions, an attacker can craft ad hoc strings to force the matcher to exhibit an exponential behaviour and perform a Regular Expression Denial of Service (ReDoS) attack. In this paper, we introduce a framework based on a tree semantics to statically identify ReDoS vulnerabilities. In particular, we put forward an algorithm to extract an overapproximation of the set of words that are dangerous for a regular expression, effectively catching all possible attacks. We have implemented the analysis in a tool called rat, and testing it on a dataset of 74,670 regular expressions, we observed that in 99.47% of the instances the analysis terminates in less than one second. We compared rat to four other ReDoS detectors, and we found that our tool is faster, often by orders of magnitude, than most other tools. While raising a low number of false positives, rat is the only ReDoS detector that does not report false negatives.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
By choice we mean taking the left/right branch of an alternative or expanding/not expanding a star.
References
The PyPI packet manager. https://pypi.org/. Accessed 09 May 2022
Regexlib database. https://regexlib.com/. Accessed 09 May 2022
Stack overflow outage postmortem (2016). https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016. Accessed 09 May 2022
Cloudflare’s outage postmortem (2019). https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/. Accessed 09 May 2022
National vulnerability database: CVE-2020-3899 (2020). https://nvd.nist.gov/vuln/detail/CVE-2020-3899. Accessed 09 May 2022
The snort database (2020). http://www.snort.org/. Accessed 09 May 2022
Allauzen, C., Mohri, M., Rastogi, A.: General algorithms for testing the ambiguity of finite automata and the double-tape ambiguity of finite-state transducers. Int. J. Found. Comput. Sci. 22(04), 883–904 (2011). https://doi.org/10.1142/s0129054111008477
Becchi, M., Cadambi, S.: Memory-efficient regular expression search using state merging. In: Joint Conference of the IEEE Computer and Communications Societies, INFOCOM, pp. 1064–1072 (2007). https://doi.org/10.1109/INFCOM.2007.128
Berglund, M., Drewes, F., van der Merwe, B.: Analyzing catastrophic backtracking behavior in practical regular expression matching. In: Automata and Formal Languages, AFL. EPTCS, vol. 151, pp. 109–123 (2014). https://doi.org/10.4204/EPTCS.151.7
Berglund, M., van der Merwe, B.: On the semantics of regular expression parsing in the wild. Theor. Comput. Sci. 679, 69–82 (2017). https://doi.org/10.1016/j.tcs.2016.09.006
Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in Python. In: International Symposium on Software Testing and Analysis, ISSTA, pp. 282–293. ACM (2016). https://doi.org/10.1145/2931037.2931073
Cody-Kenny, B., Fenton, M., Ronayne, A., Considine, E., McGuire, T., O’Neill, M.: A search for improved performance in regular expressions. In: Genetic and Evolutionary Computation Conference, GECCO, pp. 1280–1287 (2017). https://doi.org/10.1145/3071178.3071196
Crosby, S.A., Wallach, D.S.: Denial of service via algorithmic complexity attacks. In: USENIX Security Symposium. USENIX Association (2003). https://doi.org/10.1007/11506881_10
Davis, J.C., Servant, F., Lee, D.: Using selective memoization to defeat regular expression denial of service (ReDoS). In: IEEE Symposium on Security and Privacy, SP, pp. 543–559. IEEE Computer Society (2021). https://doi.org/10.1109/SP40001.2021.00032
Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE, pp. 246–256. ACM (2018). https://doi.org/10.1145/3236024.3236027
Friedl, J.E.F.: Mastering Regular Expressions - Understand Your Data and Be More Productive: For Perl, PHP, Java,NET, Ruby, and More, 3rd edn. O’Reilly (2006). https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/
Glushkov, V.M.: The abstract theory of automata. Russ. Math. Surv. 16(5), 1 (1961)
Michael, L.G., Donohue, J., Davis, J.C., Lee, D., Servant, F.: Regexes are hard: decision-making, difficulties, and risks in programming regular expressions. In: International Conference on Automated Software Engineering, ASE, pp. 415–426. IEEE (2019). https://doi.org/10.1109/ASE.2019.00047
Kirrage, J., Rathnayake, A., Thielecke, H.: Static analysis for regular expression denial-of-service attacks. In: Lopez, J., Huang, X., Sandhu, R. (eds.) NSS 2013. LNCS, vol. 7873, pp. 135–148. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38631-2_11
Li, Y., et al.: FlashRegex: deducing anti-redos regexes from examples. In: International Conference on Automated Software Engineering, ASE 2020, pp. 659–671 (2020). https://doi.org/10.1145/3324884.3416556
Lin, C., Liu, C., Chang, S.: Accelerating regular expression matching using hierarchical parallel machines on GPU. In: Global Communications Conference, GLOBECOM, pp. 1–5 (2011). https://doi.org/10.1109/GLOCOM.2011.6133663
López, F., Romero, V.: Mastering Python Regular Expressions. Packt Publishing Ltd. (2014). https://www.packtpub.com/product/mastering-python-regular-expressions/9781783283156
Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives re-examined. J. Funct. Program. 19(2), 173–190 (2009). https://doi.org/10.1017/s0956796808007090
Pan, R., Hu, Q., Xu, G., D’Antoni, L.: Automatic repair of regular expressions. Proc. ACM Program. Lang. 3(OOPSLA), 139:1–139:29 (2019). https://doi.org/10.1145/3360565
Parolini, F., Miné, A.: RAT - ReDoS Abstract Tester (2022). https://github.com/parof/rat
Petsios, T., Zhao, J., Keromytis, A.D., Jana, S.: SlowFuzz: automated domain-independent detection of algorithmic complexity vulnerabilities. In: Conference on Computer and Communications Security, CCS, pp. 2155–2168. ACM (2017). https://doi.org/10.1145/3133956.3134073
Rathnayake, A.: Semantics, analysis and security of backtracking regular expression matchers. Ph.D. thesis, University of Birmingham, UK (2015). http://etheses.bham.ac.uk/6011/
Rathnayake, A., Thielecke, H.: Static analysis for regular expression exponential runtime via substructural logics. CoRR abs/1405.7058 (2014)
Shen, Y., Jiang, Y., Xu, C., Yu, P., Ma, X., Lu, J.: ReScue: crafting regular expression DoS attacks. In: International Conference on Automated Software Engineering, ASE, pp. 225–235. ACM (2018). https://doi.org/10.1145/3238147.3238159
Staicu, C., Pradel, M.: Freezing the web: a study of ReDoS vulnerabilities in JavaScript-based web servers. In: USENIX Security Symposium, pp. 361–376. USENIX Association (2018)
Weber, A., Seidl, H.: On the degree of ambiguity of finite automata. Theor. Comput. Sci. 88(2), 325–349 (1991). https://doi.org/10.1016/0304-3975(91)90381-B
Weideman, N., van der Merwe, B., Berglund, M., Watson, B.: Analyzing matching time behavior of backtracking regular expression matchers by using ambiguity of NFA. In: Han, Y.-S., Salomaa, K. (eds.) CIAA 2016. LNCS, vol. 9705, pp. 322–334. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40946-7_27
Wüstholz, V., Olivo, O., Heule, M.J.H., Dillig, I.: Static detection of DoS vulnerabilities in programs that use regular expressions. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 3–20. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_1
Yu, X., Becchi, M.: GPU acceleration of regular expression matching for large datasets: exploring the implementation space. In: Computing Frontiers Conference, CF, pp. 18:1–18:10 (2013). https://doi.org/10.1145/2482767.2482791
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Parolini, F., Miné, A. (2022). Sound Static Analysis of Regular Expressions for Vulnerabilities to Denial of Service Attacks. In: Aït-Ameur, Y., Crăciun, F. (eds) Theoretical Aspects of Software Engineering. TASE 2022. Lecture Notes in Computer Science, vol 13299. Springer, Cham. https://doi.org/10.1007/978-3-031-10363-6_6
Download citation
DOI: https://doi.org/10.1007/978-3-031-10363-6_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10362-9
Online ISBN: 978-3-031-10363-6
eBook Packages: Computer ScienceComputer Science (R0)