Performance and energy impact of parallelization and vectorization techniques in modern microprocessors


Abstract

Moore’s law states that the number of transistors approximately doubles every two years, but powering all of these transistors simultaneously is only possible as long as Dennard scaling continues. Unfortunately, voltage scaling has slowed in recent years, and microprocessor designers have hit what is known as the “utilization wall” or “dark silicon” effect. Vectorization, parallelization, specialization and heterogeneity are the key approaches for dealing with this utilization wall. However, how software developers can maximize the energy efficiency of these architectures remains an open question. This paper presents an energy evaluation of parallelization using both physical and logical cores (i.e., SMT/Hyper-Threading), vectorization (SSE, Advanced Vector Extensions and NEON) and dynamic core reconfiguration (Intel®’s Turbo Boost Technology, TBT). The evaluation spans microprocessors for the embedded, laptop, desktop and server markets, since all of these markets are converging towards energy efficiency. The analyzed processors include Intel’s Core™ i5 and i7 families and ARM®’s Cortex™-A9 and Cortex™-A15. Results show that software developers should prioritize vectorization over thread parallelism when possible, as it yields better energy efficiency, especially on the Intel platforms. Application scalability can be reduced drastically when vectorization and threading are used simultaneously, since vectorization increases pressure on the memory subsystem. Intel’s TBT improves energy efficiency by an additional 10–20%, depending on the number of active threads.
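The kind of comparison the abstract describes can be illustrated with a minimal sketch based on energy-to-solution, i.e. average power multiplied by execution time. The `Run` class, the `energy_saving` helper and every number below are hypothetical placeholders chosen only to show the arithmetic; they are not measurements or metrics taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark configuration (all values are hypothetical placeholders)."""
    label: str
    avg_power_w: float  # average power draw in watts
    runtime_s: float    # wall-clock execution time in seconds

    @property
    def energy_j(self) -> float:
        # Energy-to-solution in joules: E = P_avg * t
        return self.avg_power_w * self.runtime_s


def energy_saving(baseline: Run, candidate: Run) -> float:
    """Fraction of energy saved by `candidate` relative to `baseline`."""
    return 1.0 - candidate.energy_j / baseline.energy_j


# Made-up figures: a technique may raise power draw yet still save energy
# if it shortens the runtime by a larger factor.
scalar = Run("scalar, 1 thread", avg_power_w=20.0, runtime_s=10.0)
vectorized = Run("vectorized, 1 thread", avg_power_w=22.0, runtime_s=4.0)
threaded = Run("scalar, 4 threads", avg_power_w=40.0, runtime_s=3.0)

for run in (vectorized, threaded):
    saving = energy_saving(scalar, run)
    print(f"{run.label}: {run.energy_j:.0f} J, saving {saving:.0%} vs. scalar")
```

With real data, `avg_power_w` and `runtime_s` would come from power instrumentation and timers on the target platform; the sketch only shows why energy, rather than power or runtime alone, is the quantity to compare across configurations.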



Acknowledgments

The authors gratefully acknowledge the support of the PRACE 2IP project, the NOTUR project, and the HiPEAC Network of Excellence.

Author information

Corresponding author

Correspondence to Juan M. Cebrián.

About this article

Cite this article

Cebrián, J.M., Natvig, L. & Meyer, J.C. Performance and energy impact of parallelization and vectorization techniques in modern microprocessors. Computing 96, 1179–1193 (2014). https://doi.org/10.1007/s00607-013-0366-5

  • DOI: https://doi.org/10.1007/s00607-013-0366-5
