Abstract
Problems such as fault tolerance and scalable synchronization can be solved efficiently by exploiting the reversibility of applications. Making applications reversible via computation rather than memory is ideal for large-scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, the Basic Linear Algebra Subprograms (BLAS), which are widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) a memory mode, in which reversibility is obtained by checkpointing to memory, and (2) a computational mode, in which nothing is saved and restoration is performed entirely via inverse computation. The article focuses on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, the data indicate over an order of magnitude speedup of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3 subprograms, a more complex tradeoff is observed between reversible computation and checkpointing, depending on the computational and memory complexities of the subprograms.
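The two modes described above can be illustrated with the BLAS Level-1 "scal" operation (x ← a·x). The sketch below is illustrative only: the function names are hypothetical and do not reflect the actual RBLAS interface; it simply contrasts checkpointing to memory against restoration by inverse computation.

```python
def scal_forward_memory(a, x):
    """Memory mode: checkpoint x before overwriting it (O(n) extra memory)."""
    tape = list(x)                 # save a copy of the operand
    for i in range(len(x)):
        x[i] = a * x[i]
    return tape

def scal_reverse_memory(x, tape):
    """Memory mode: restore x from the checkpoint."""
    x[:] = tape

def scal_forward_compute(a, x):
    """Computational mode: nothing is saved."""
    for i in range(len(x)):
        x[i] = a * x[i]

def scal_reverse_compute(a, x):
    """Computational mode: invert by computing x <- x / a.
    Requires a != 0; floating-point round-off can make restoration inexact."""
    for i in range(len(x)):
        x[i] = x[i] / a
```

For a long vector, the memory mode pays the cost of writing and reading the checkpoint, while the computational mode trades that memory traffic for a second pass of arithmetic, which is the tradeoff the benchmarks in the paper quantify.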
This paper has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Dept. of Energy. Accordingly, the U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. Government purposes.
Notes
1. The overlap in conventional terminology of “L1” and “L2” between BLAS levels and cache levels is unfortunately unavoidable.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Cite this chapter
Perumalla, K.S., Yoginath, S.B. (2014). Towards Reversible Basic Linear Algebra Subprograms: A Performance Study. In: Gavrilova, M., Tan, C., Thapliyal, H., Ranganathan, N. (eds) Transactions on Computational Science XXIV. Lecture Notes in Computer Science(), vol 8911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45711-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45710-8
Online ISBN: 978-3-662-45711-5