The Inefficiency of Genetic Programming for Symbolic Regression | SpringerLink
Skip to main content

The Inefficiency of Genetic Programming for Symbolic Regression

  • Conference paper
  • First Online:
Parallel Problem Solving from Nature – PPSN XVIII (PPSN 2024)

Abstract

We analyse the search behaviour of genetic programming (GP) for symbolic regression (SR) in search spaces that are small enough to allow exhaustive enumeration, and use an improved exhaustive symbolic regression algorithm to generate the set of semantically unique expression structures, which is orders of magnitude smaller than the original SR search space. The efficiency of GP and a hypothetical random search in this set of unique expressions is compared, whereby the efficiency is quantified via the number of function evaluations performed until a given error threshold is reached, and the percentage of unique expressions evaluated during the search after simplification to a canonical form. The results for two real-world datasets with a single input variable show that GP in such limited search space explores only a small fraction of the search space, and evaluates semantically equivalent expressions repeatedly. GP has a smaller success probability than the idealised random search for such small search spaces.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 8007
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 10009
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    \({\text {powabs}}(x,y) =|x|^y\).

  2. 2.

    https://github.com/DeaglanBartlett/ESR.

  3. 3.

    https://github.com/folivetti/ppsn24_gp_esr_comparison.

  4. 4.

    https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.

  5. 5.

    These experiments were executed on a Dell G5 with a Core i7-9750H and 16 GB of RAM.

  6. 6.

    Details can be found in the supplement.

References

  1. Banzhaf, W., Hu, T., Ochoa, G.: How the combinatorics of neutral spaces leads genetic programming to discover simple solutions. In: Winkler, S., Trujillo, L., Ofria, C., Hu, T. (eds.) Genetic Programming Theory and Practice XX, pp. 65–86. Springer, Singapore (2024). https://doi.org/10.1007/978-981-99-8413-8_4

    Chapter  Google Scholar 

  2. Bartlett, D.J., Desmond, H.: Marginalised Normal Regression: unbiased curve fitting in the presence of x-errors. Open J. Astrophys. 6, 42 (2023). https://doi.org/10.21105/astro.2309.00948

  3. Bartlett, D.J., Desmond, H., Ferreira, P.G.: Exhaustive symbolic regression. IEEE Trans. Evol. Comput. 28, 950–964 (2023)

    Article  Google Scholar 

  4. Burlacu, B., Affenzeller, M., Kronberger, G., Kommenda, M.: Online diversity control in symbolic regression via a fast hash-based tree similarity measure. In: 2019 IEEE Congress on Evolutionary Computation (CEC), pp. 2175–2182. IEEE (2019)

    Google Scholar 

  5. Burlacu, B., Kammerer, L., Affenzeller, M., Kronberger, G.: Hash-based tree similarity and simplification in genetic programming for symbolic regression. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds.) EUROCAST 2019, Part I. LNCS, vol. 12013, pp. 361–369. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45093-9_44

    Chapter  Google Scholar 

  6. Burlacu, B., Kronberger, G., Kommenda, M.: Operon C++ an efficient genetic programming framework for symbolic regression. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pp. 1562–1570 (2020)

    Google Scholar 

  7. Cao, L., Zheng, Z., Ding, C., Cai, J., Jiang, M.: Genetic programming symbolic regression with simplification-pruning operator for solving differential equations. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) ICONIP 2023. CCIS, vol. 1962, pp. 287–298. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-8132-8_22

    Chapter  Google Scholar 

  8. Cranmer, M.: Interpretable machine learning for science with PySR and SymbolicRegression.Jl (2023). https://doi.org/10.48550/ARXIV.2305.01582

  9. Daida, J.M., Hilss, A.M.: Identifying structural mechanisms in standard genetic programming. In: Cantú-Paz, E., et al. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1639–1651. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45110-2_58

    Chapter  Google Scholar 

  10. Daida, J.M., Li, H., Tang, R., Hilss, A.M.: What makes a problem GP-Hard? Validating a hypothesis of structural causes. In: Cantú-Paz, E., et al. (eds.) GECCO 2003, Part II. LNCS, vol. 2724, pp. 1665–1677. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45110-2_60

    Chapter  Google Scholar 

  11. Desmond, H., Bartlett, D.J., Ferreira, P.G.: On the functional form of the radial acceleration relation. Monthly Not. RAS 521(2), 1817–1831 (2023). https://doi.org/10.1093/mnras/stad597

    Article  Google Scholar 

  12. Ebner, M.: On the search space of genetic programming and its relation to Nature’s search space. In: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2, pp. 1357–1361. IEEE (1999)

    Google Scholar 

  13. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-44874-8

    Book  Google Scholar 

  14. de França, F.O.: A greedy search tree heuristic for symbolic regression. Inf. Sci. 442, 18–32 (2018)

    Article  MathSciNet  Google Scholar 

  15. de França, F.O.: Transformation-interaction-rational representation for symbolic regression. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 920–928 (2022)

    Google Scholar 

  16. de Franca, F.O., Aldeia, G.S.I.: Interaction-transformation evolutionary algorithm for symbolic regression. Evol. Comput. 29(3), 367–390 (2021)

    Article  Google Scholar 

  17. de Franca, F.O., Kronberger, G.: Reducing overparameterization of symbolic regression models with equality saturation. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1064–1072 (2023)

    Google Scholar 

  18. Guimerà, R., et al.: A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6(5) (2020). https://doi.org/10.1126/sciadv.aav6971

  19. Gustafson, S., Burke, E.K., Krasnogor, N.: On improving genetic programming for symbolic regression. In: 2005 IEEE Congress on Evolutionary Computation, vol. 1, pp. 912–919. IEEE (2005)

    Google Scholar 

  20. Halim, A.H., Ismail, I., Das, S.: Performance assessment of the metaheuristic optimization algorithms: an exhaustive review. Artif. Intell. Rev. 54(3), 2323–2409 (2021). https://doi.org/10.1007/s10462-020-09906-6

    Article  Google Scholar 

  21. Hansen, N., Auger, A., Ros, R., Mersmann, O., Tušar, T., Brockhoff, D.: COCO: a platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. 36, 114–144 (2021). https://doi.org/10.1080/10556788.2020.1808977

    Article  MathSciNet  Google Scholar 

  22. Hu, T., Banzhaf, W.: Neutrality, robustness, and evolvability in genetic programming. In: Riolo, R., Worzel, B., Goldman, B., Tozier, B. (eds.) Genetic Programming Theory and Practice XIV. GEC, pp. 101–117. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97088-2_7

    Chapter  Google Scholar 

  23. Hu, T., Ochoa, G., Banzhaf, W.: Phenotype search trajectory networks for linear genetic programming. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds.) EuroGP 2023. LNCS, vol. 13986, pp. 52–67. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-29573-7_4

    Chapter  Google Scholar 

  24. Kammerer, L., Kronberger, G., Burlacu, B., Winkler, S.M., Kommenda, M., Affenzeller, M.: Symbolic regression by exhaustive search: reducing the search space using syntactical constraints and efficient semantic structure deduplication. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds.) Genetic Programming Theory and Practice XVII. GEC, pp. 79–99. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39958-0_5

    Chapter  Google Scholar 

  25. Kartelj, A., Djukanović, M.: RILS-ROLS: robust symbolic regression via iterated local search and ordinary least squares. J. Big Data 10(1), 71 (2023)

    Article  Google Scholar 

  26. Kommenda, M., Burlacu, B., Kronberger, G., Affenzeller, M.: Parameter identification for symbolic regression using nonlinear least squares. Genet. Program Evolvable Mach. 21(3), 471–501 (2020)

    Article  Google Scholar 

  27. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)

    Google Scholar 

  28. Kronberger, G., Burlacu, B., Kommenda, M., Winkler, S.M., Affenzeller, M.: Symbolic Regression. Chapman & Hall / CRC Press, Boca Raton (2024)

    Google Scholar 

  29. Kronberger, G., de Franca, F.O.: Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming. J. Symb. Comput. (2024)

    Google Scholar 

  30. Langdon, W.B.: Genetic programming convergence. Genet. Program Evolvable Mach. 23(1), 71–104 (2021). https://doi.org/10.1007/s10710-021-09405-9

    Article  Google Scholar 

  31. Lelli, F., McGaugh, S.S., Schombert, J.M., Pawlowski, M.S.: One law to rule them all: the radial acceleration relation of galaxies. Astrophys. J. 836(2), 152 (2017). https://doi.org/10.3847/1538-4357/836/2/152

    Article  Google Scholar 

  32. Margossian, C.C.: A review of automatic differentiation and its efficient implementation. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9(4), e1305 (2019)

    Article  Google Scholar 

  33. McGaugh, S.S.: A tale of two paradigms: the mutual incommensurability of \({\varLambda }\)CDM and MOND. Can. J. Phys. 93(2), 250–259 (2015). https://doi.org/10.1139/cjp-2014-0203

    Article  Google Scholar 

  34. McPhee, N.F., Ohs, B., Hutchison, T.: Semantic Building Blocks in Genetic Programming. In: O’Neill, M., Vanneschi, L., Gustafson, S., Esparcia Alcázar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) EuroGP 2008. LNCS, vol. 4971, pp. 134–145. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78671-9_12

    Chapter  Google Scholar 

  35. Nelson, C.G.: Techniques for program verification. Stanford University (1980)

    Google Scholar 

  36. Niehaus, J., Igel, C., Banzhaf, W.: Reducing the number of fitness evaluations in graph genetic programming using a canonical graph indexed database. Evol. Comput. 15(2), 199–221 (2007)

    Article  Google Scholar 

  37. Nikuradse, J.: Laws of flow in rough pipes. Technical report, National Advisory Committee for Aeronautics Washington, NACA TM 1292 - Translation of “Strömungsgesetze in rauhen Rohren" VDI-Forschungsheft 361. Beilage zu “Forschung auf dem Gebiete des Ingenieurwesens” Ausgabe B Band 4, July/August 1933 (1950)

    Google Scholar 

  38. Randall, D.L., Townsend, T.S., Hochhalter, J.D., Bomarito, G.F.: Bingo: a customizable framework for symbolic regression with genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 2282–2288 (2022)

    Google Scholar 

  39. Reichardt, I., Pallarès, J., Sales-Pardo, M., Guimerà, R.: Bayesian machine scientist to compare data collapses for the Nikuradse dataset. Phys. Rev. Lett. 124(8) (2020). https://doi.org/10.1103/physrevlett.124.084503

  40. Rivero, D., Fernandez-Blanco, E., Pazos, A.: DoMe: a deterministic technique for equation development and symbolic regression. Expert Syst. Appl. 198, 116712 (2022)

    Article  Google Scholar 

  41. Seidyo Imai Aldeia, G., Olivetti de Franca, F., La Cava, W.G.: Inexact simplification of symbolic regression expressions with locality-sensitive hashing. arXiv e-prints pp. arXiv–2404 (2024)

    Google Scholar 

  42. Sipper, M.: Tiny genetic programming in Python (2019). https://github.com/moshesipper/tiny_gp

  43. Virgolin, M., Alderliesten, T., Witteveen, C., Bosman, P.A.: Scalable genetic programming by gene-pool optimal mixing and input-space entropy-based building-block learning. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1041–1048 (2017)

    Google Scholar 

  44. Wang, H., Vermetten, D., Ye, F., Doerr, C., Bäck, T.: IOHanalyzer: detailed performance analyses for iterative optimization heuristics. ACM Trans. Evol. Learn. Optim. 2(1) (2022). https://doi.org/10.1145/3510426

  45. Willsey, M., Nandi, C., Wang, Y.R., Flatt, O., Tatlock, Z., Panchekha, P.: Egg: fast and extensible equality saturation. Proc. ACM Program. Lang. 5(POPL), 1–29 (2021)

    Google Scholar 

  46. Worm, T., Chiu, K.: Prioritized grammar enumeration: symbolic regression by dynamic programming. In: Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, pp. 1021–1028 (2013)

    Google Scholar 

Download references

Acknowledgments

We thank Pedro Ferreira for enabling this collaboration and his ideas for improving this work, Bogdan Burlacu for his help when making changes to Operon, and Alessandro Cheli for help with the Metatheory.jl package. Gabriel Kronberger is supported by the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology, the Federal Ministry for Labour and Economy, and the regional government of Upper Austria within the COMET project ProMetHeus (904919) supported by the Austrian Research Promotion Agency (FFG). Fabricio Olivetti de Franca is supported by CNPq through the grant 301596/2022-0. Harry Desmond is supported by a Royal Society University Research Fellowship (grant no. 211046). Deaglan J. Bartlett is supported by the Simons Collaboration on “Learning the Universe.”

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gabriel Kronberger .

Editor information

Editors and Affiliations

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 521 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Kronberger, G., Olivetti de Franca, F., Desmond, H., Bartlett, D.J., Kammerer, L. (2024). The Inefficiency of Genetic Programming for Symbolic Regression. In: Affenzeller, M., et al. Parallel Problem Solving from Nature – PPSN XVIII. PPSN 2024. Lecture Notes in Computer Science, vol 15148. Springer, Cham. https://doi.org/10.1007/978-3-031-70055-2_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-70055-2_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70054-5

  • Online ISBN: 978-3-031-70055-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics