Numprof: A Performance Analysis Framework for Numerical Libraries

  • Conference paper
Applied Parallel and Scientific Computing (PARA 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7782)

Abstract

This paper introduces Numprof, a profiling framework for performance analysis of numerical libraries. The framework consists of a profiler and a replayer for the BLAS and FFTW3 libraries. The profiler records library call events with a user-configurable level of detail. The replayer executes library calls based on the trace files generated by the profiler. We explore real-world use cases for the framework and demonstrate that its low overhead makes it feasible to use for continuous statistical analysis of numerical library calls.
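
The profiler/replayer split described in the abstract can be illustrated with a minimal Python sketch. This is not Numprof's implementation: the names `profiled`, `toy_gemm`, `replay`, `TRACE`, and `ROUTINES` are hypothetical, the JSON trace layout is an assumption, and a pure-Python matrix multiply stands in for a real BLAS routine. The sketch only shows the general idea of recording call events (routine name, problem dimensions, elapsed time) and later re-executing the same calls from the recorded trace.

```python
import functools
import json
import time

# In-memory stand-in for a trace file (assumption: one dict per call event).
TRACE = []


def profiled(func):
    """Wrap a routine so that each call appends one event to TRACE."""
    @functools.wraps(func)
    def wrapper(*dims):
        start = time.perf_counter()
        result = func(*dims)
        TRACE.append({
            "call": func.__name__,
            "dims": list(dims),
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


@profiled
def toy_gemm(m, n, k):
    """Hypothetical stand-in for a BLAS matrix multiply on all-ones inputs."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


# The replayer calls the unwrapped routines so that replaying a trace
# does not append new events to it.
ROUTINES = {"toy_gemm": toy_gemm.__wrapped__}


def replay(trace_json):
    """Re-execute every recorded call and return (name, dims, seconds) tuples."""
    timings = []
    for event in json.loads(trace_json):
        routine = ROUTINES[event["call"]]
        start = time.perf_counter()
        routine(*event["dims"])
        timings.append((event["call"], event["dims"],
                        time.perf_counter() - start))
    return timings
```

Calling `toy_gemm(2, 3, 4)` appends one event to `TRACE`, and `replay(json.dumps(TRACE))` re-runs that call from the serialized trace. Storing only the dimensions rather than the operand data is one possible reading of a "configurable level of detail"; it keeps the trace small at the cost of replaying with synthetic inputs.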

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lehto, OP. (2013). Numprof: A Performance Analysis Framework for Numerical Libraries. In: Manninen, P., Öster, P. (eds) Applied Parallel and Scientific Computing. PARA 2012. Lecture Notes in Computer Science, vol 7782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36803-5_22

  • DOI: https://doi.org/10.1007/978-3-642-36803-5_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36802-8

  • Online ISBN: 978-3-642-36803-5

  • eBook Packages: Computer Science, Computer Science (R0)
