Abstract
While Moore’s law states that the number of transistors is approximately doubled every 2 years, powering these transistors simultaneously is only possible as long as Dennard scaling continues. Unfortunately, voltage scaling has slowed down in recent years, and microprocessor designers have hit what is known as the “utilization wall” or the “dark silicon” effect. Vectorization, parallelization, specialization and heterogeneity are the key approaches to deal with this utilization wall. However, how software developers can maximize energy efficiency of these architectures remains an open question. This paper presents an energy evaluation of parallelization using both physical and logical cores (i.e., SMT/Hyper-Threading), vectorization (SSE, Advanced Vector Extensions and NEON) and dynamic core reconfiguration [Intel®’s Turbo Boost Technology (TBT)]. The evaluation spans microprocessors for embedded, laptop, desktop and server markets, since there is a convergence among them towards energy efficiency. The analyzed processors include Intel’s Core™ i5 and i7 family and ARM®’s Cortex™ A9 and A15. Results show that software developers should prioritize vectorization over thread parallelism when possible, as it yields better energy efficiency, especially on the Intel platforms. Application scalability can be reduced drastically when using vectorization and threading simultaneously since vectorization increases pressure on the memory subsystem. Intel’s TBT further improves energy efficiency by an additional 10–20% depending on the number of active threads.
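As a rough illustration of how configurations such as those above might be compared, the following sketch computes energy-to-solution (average power multiplied by runtime) and the energy-delay product for a few runs. The abstract does not state which metric the evaluation uses, so both metrics, the `Run` class, the `most_energy_efficient` helper and every number below are assumptions chosen for illustration only; none of them are measurements from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One configuration. All values used below are hypothetical."""
    label: str
    avg_power_w: float  # average power in watts
    runtime_s: float    # wall-clock time in seconds

    @property
    def energy_j(self) -> float:
        # Energy-to-solution: average power multiplied by runtime.
        return self.avg_power_w * self.runtime_s

    @property
    def edp_js(self) -> float:
        # Energy-delay product: weights runtime more heavily than energy alone.
        return self.energy_j * self.runtime_s


def most_energy_efficient(runs):
    """Return the run with the lowest energy-to-solution."""
    return min(runs, key=lambda r: r.energy_j)


# Hypothetical placeholder numbers, NOT results from the paper.
runs = [
    Run("scalar, 1 thread", avg_power_w=20.0, runtime_s=10.0),
    Run("scalar, 4 threads", avg_power_w=40.0, runtime_s=3.0),
    Run("vectorized, 1 thread", avg_power_w=22.0, runtime_s=4.0),
]

for r in runs:
    print(f"{r.label:22s} energy={r.energy_j:6.1f} J  EDP={r.edp_js:7.1f} J*s")
print("lowest energy:", most_energy_efficient(runs).label)
```

The point of the sketch is only that a configuration drawing more power can still use less energy if it finishes sufficiently sooner, which is why power and runtime have to be considered together.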
Acknowledgments
The authors gratefully acknowledge the support of the PRACE 2IP project, the NOTUR project, and the HiPEAC Network of Excellence.
Cite this article
Cebrián, J.M., Natvig, L. & Meyer, J.C. Performance and energy impact of parallelization and vectorization techniques in modern microprocessors. Computing 96, 1179–1193 (2014). https://doi.org/10.1007/s00607-013-0366-5
DOI: https://doi.org/10.1007/s00607-013-0366-5