Abstract
Several algorithms are suited to inferring human-interpretable models for classification and regression tasks in machine learning, but few rival Grammatical Evolution (GE) in power, model expressiveness, and ease of implementation. On the other hand, algorithms that iteratively optimize a set of programs of arbitrary complexity, as GE does, may take a prohibitive amount of running time when tackling complex problems. Fortunately, GE can scale to such problems by carefully harnessing the parallel processing of modern heterogeneous systems, exploiting traditional multi-core processors and many-core accelerators to speed up execution by orders of magnitude. This chapter covers parallel GE, focusing on heterogeneous multi- and many-threaded decomposition to achieve a fully parallel implementation in which both breeding and evaluation are parallelized. On the studied benchmarks, the overall parallel implementation ran 68 times faster than the sequential version, with the program-evaluation kernel alone reaching a 350-fold speedup. Details on how to accomplish this efficiently are given in the context of two well-established open standards for parallel computing: OpenMP and OpenCL. Decomposition strategies, optimization techniques, and parallel benchmarks, followed by analyses, are presented in the chapter.
Notes
- 1.
Put simply, a cache is a very fast but small memory that stores the most recently/commonly accessed data. When requested data is not in the cache, a fixed-size block of data—containing not only the requested data but also nearby data—is copied from main memory into the cache; this block is known as a cache line or cache block [10].
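The practical consequence of cache lines is that stride-1 (contiguous) memory access uses every byte of each fetched line, while strided access wastes most of it. A minimal C sketch (function names are illustrative, not from the chapter) contrasting the two traversal orders of the same matrix:

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Row-major traversal visits memory contiguously, so each fetched
   cache line (typically 64 bytes) is fully consumed before eviction. */
static long sum_row_major(const int m[N][N]) {
    long s = 0;
    for (size_t i = 0; i < N; ++i)
        for (size_t j = 0; j < N; ++j)
            s += m[i][j];          /* stride-1: cache friendly */
    return s;
}

/* Column-major traversal jumps N elements at a time, so for large
   matrices almost every access touches a different cache line. */
static long sum_col_major(const int m[N][N]) {
    long s = 0;
    for (size_t j = 0; j < N; ++j)
        for (size_t i = 0; i < N; ++i)
            s += m[i][j];          /* stride-N: poor line reuse */
    return s;
}
```

Both functions compute the same sum; only their memory-access pattern, and hence their cache behavior on large inputs, differs.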
- 2.
Due to accelerator device constraints, the OpenCL programming model forbids synchronization among work-groups, which precludes gathering the partial errors within the evaluation kernel. Since only a fraction of the partial errors remains at the end of the kernel execution, it is preferable to reduce them directly on the host processor (CPU).
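The host-side final reduction described above amounts to a simple serial sum over the per-work-group partial results read back from the device. A hedged C sketch (the function name and buffer layout are assumptions for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* After the evaluation kernel finishes, each of the `num_groups`
   work-groups has left one partial error in `partial` (a host buffer
   read back from the device). Because OpenCL offers no inter-work-group
   synchronization, the final accumulation happens on the host; with
   only one value per work-group remaining, this loop is cheap. */
static float reduce_partial_errors(const float *partial, size_t num_groups) {
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total;
}
```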
- 3.
ppi is freely available at http://github.com/daaugusto/ppi.
- 4.
Note that a single generation would suffice to estimate the execution time of the tasks of the GE algorithm; however, we used two generations to make the timeline visualization more appealing.
- 5.
Speedup is the ratio of the execution time of the algorithm before improvement to its execution time after improvement.
- 6.
Improvement percentage is the difference between the execution time of the algorithm before and after improvement, divided by the execution time of the unimproved version and multiplied by 100.
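The two metrics defined above are straightforward to compute; a small C sketch makes the formulas concrete (a run that drops from 200 s to 100 s has a speedup of 2 and an improvement of 50%):

```c
#include <assert.h>

/* Speedup: t_before / t_after. */
static double speedup(double t_before, double t_after) {
    return t_before / t_after;
}

/* Improvement percentage: the time saved, relative to the
   unimproved version, expressed as a percentage. */
static double improvement_pct(double t_before, double t_after) {
    return 100.0 * (t_before - t_after) / t_before;
}
```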
- 7.
GPop/s stands for genetic programming operations per second and is equivalent to the number of program “symbols” processed per second.
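One way to compute this metric, assuming an interpreter that evaluates every symbol of every program on every fitness case (a simplification; the exact counting convention may differ from the chapter's), is sketched below:

```c
#include <assert.h>

/* Giga genetic-programming operations per second: total symbols
   processed (population size x average program length in symbols
   x number of fitness cases) divided by wall-clock seconds,
   scaled to units of 10^9. */
static double gpops(double pop_size, double avg_len, double n_cases,
                    double seconds) {
    return pop_size * avg_len * n_cases / seconds / 1e9;
}
```

For example, 1000 programs of 100 symbols evaluated on 1000 fitness cases in one second correspond to 0.1 GPop/s.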
- 8.
SIMD stands for Single Instruction Multiple Data and means that a single instruction is executed on multiple data simultaneously.
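In OpenMP, SIMD execution can be requested explicitly with the `#pragma omp simd` construct, which asks the compiler to vectorize a loop so that one vector instruction processes several elements at once. A minimal sketch (the `saxpy` example is a standard illustration, not code from the chapter):

```c
#include <assert.h>

/* y[i] = a*x[i] + y[i]: the same multiply-add is applied to every
   element, so the loop maps naturally onto SIMD lanes. The pragma
   is a no-op when compiled without OpenMP support. */
static void saxpy(float a, const float *x, float *y, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```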
- 9.
SPMD stands for Single Program Multiple Data and means that different compute units of a compute device are able to execute different instructions simultaneously from the same kernel.
References
D.A. Augusto, H.J.C. Barbosa, Symbolic regression via genetic programming, in Proceedings. Vol. 1 Sixth Brazilian Symposium on Neural Networks (2000), pp. 173–178
D.A. Augusto, H.J.C. Barbosa, Accelerated parallel genetic programming tree evaluation with OpenCL. J. Parallel Distrib. Comput. 73(1), 86–100 (2013)
W.J. Bolosky, M.L. Scott, False sharing and its effect on shared memory performance, in USENIX Symposium on Experiences with Distributed and Multiprocessor Systems (Sedms '93), Volume 4, Berkeley, CA (USENIX Association, Berkeley, CA, 1993), p. 3
R. Chandra, L. Dagum, D. Kohr, D. Maydan, J. McDonald, R. Menon, Parallel Programming in OpenMP (Morgan Kaufmann, San Francisco, CA, 2001)
L. Deng, D. Yu, Deep learning: methods and applications. Technical Report (2014)
S.S. Haykin, Neural Networks and Learning Machines, 3rd edn. (Pearson Education, Upper Saddle River, NJ, 2009)
W.D. Hillis, G.L. Steele Jr., Data parallel algorithms. Commun. ACM 29, 1170–1183 (1986)
C. Iancu, S. Hofmeyr, F. Blagojević, Y. Zheng, Oversubscription on multicore processors, in 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS) (April 2010), pp. 1–11
P. Jääskeläinen, C.S. de La Lama, E. Schnetter, K. Raiskila, J. Takala, H. Berg, pocl: A performance-portable OpenCL implementation. Int. J. Parallel Program. 43(5), 752–785 (2015)
B. Jacob, S. Ng, D. Wang, Memory Systems: Cache, DRAM, Disk (Morgan Kaufmann, San Francisco, CA, 2007)
D. Kaeli, P. Mistry, D. Schaa, D.P. Zhang (eds.), Heterogeneous Computing with OpenCL 2.0, 3rd edn. (Morgan Kaufmann, Boston, 2015)
C. Lameter, NUMA (non-uniform memory access): an overview. Queue 11(7), 40:40–40:51 (2013)
B.W. Lampson, Lazy and speculative execution in computer systems, in OPODIS, ed. by A. A. Shvartsman, vol. 4305. Lecture Notes in Computer Science, pp. 1–2 (Springer, Berlin, 2006)
P. Pospichal, E. Murphy, M. O’Neill, J. Schwarz, J. Jaros, Acceleration of grammatical evolution using graphics processing units: computational intelligence on consumer games and graphics hardware, in Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’11, New York, NY (ACM, New York, 2011), pp. 431–438
D. Robilliard, V. Marion-Poty, C. Fonlupt, Genetic programming on graphics processing units. Genet. Program. Evolvable Mach. 10(4), 447 (2009)
I.L.S. Russo, H.S. Bernardino, H.J.C. Barbosa, A massively parallel grammatical evolution technique with OpenCL. J. Parallel Distrib. Comput. 109, 333–349 (2017)
J. Shen, J. Fang, H.J. Sips, A.L. Varbanescu, Performance traps in OpenCL for CPUs, in PDP (IEEE Computer Society, 2013), pp. 38–45
J.E. Smith, A study of branch prediction strategies, in Proceedings of the 8th Annual Symposium on Computer Architecture, ISCA ’81, Los Alamitos, CA (IEEE Computer Society Press, Los Alamitos, CA, 1981), pp. 135–148
Acknowledgements
The authors would like to thank the support provided by CNPq (grants 310778/2013-1, 502836/2014-8 and 300458/2017-7), FAPEMIG (grant APQ-03414-15), EU H2020 Programme and MCTI/RNP–Brazil under the HPC4E Project (grant 689772).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Dufek, A.S., Augusto, D.A., Barbosa, H.J.C., da Silva Dias, P.L. (2018). Multi- and Many-Threaded Heterogeneous Parallel Grammatical Evolution. In: Ryan, C., O'Neill, M., Collins, J. (eds) Handbook of Grammatical Evolution. Springer, Cham. https://doi.org/10.1007/978-3-319-78717-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78716-9
Online ISBN: 978-3-319-78717-6