Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis

  • Conference paper
High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation (PMBS 2014)

Abstract

We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable, instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and with OpenMP, which is used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism, and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
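For readers unfamiliar with the Roofline model that the abstract builds on, the following minimal Python sketch shows the standard bound: attainable performance is the smaller of peak compute throughput and arithmetic intensity multiplied by peak memory bandwidth. The function names and the machine numbers are hypothetical placeholders chosen for illustration; they are not measurements from the paper or from the Blue Gene/Q system.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte).
    peak_gflops:          peak compute throughput (GFLOP/s).
    peak_bandwidth_gbs:   peak memory bandwidth (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # Memory-bound kernels sit on the slanted roof (AI * bandwidth);
    # compute-bound kernels sit on the flat roof (peak compute).
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the two roofs meet (FLOP/byte)."""
    return peak_gflops / peak_bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine, for illustration only.
    PEAK_GFLOPS = 200.0
    PEAK_BANDWIDTH_GBS = 25.0

    print("ridge point:", ridge_point(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS))
    for ai in (0.5, 2.0, 8.0, 32.0):
        bound = attainable_gflops(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        print(f"AI = {ai:5.1f} FLOP/byte -> bound = {bound:6.1f} GFLOP/s")
```

In practice the two peak values would come either from architectural specifications or from measured microbenchmark results, which is the distinction the abstract draws.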

Notes

  1. GPU driver version: 331.89; CUDA toolkit version: 6.0beta.

Acknowledgments

Authors from Lawrence Berkeley National Laboratory were supported by the U.S. Department of Energy’s Advanced Scientific Computing Research Program under contract DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231. This research used resources of the Argonne Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Corresponding author

Correspondence to Yu Jung Lo.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lo, Y.J. et al. (2015). Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2014. Lecture Notes in Computer Science, vol 8966. Springer, Cham. https://doi.org/10.1007/978-3-319-17248-4_7

  • DOI: https://doi.org/10.1007/978-3-319-17248-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17247-7

  • Online ISBN: 978-3-319-17248-4

  • eBook Packages: Computer Science, Computer Science (R0)
