
Reproducible and Accurate Matrix Multiplication

  • Conference paper
Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9553)

Abstract

Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bit-wise reproducible floating-point result for multiple executions of the same code on different, or even the same, parallel architecture is challenging. In this paper, we address the problem of reproducibility in the context of matrix multiplication and propose an algorithm that yields both reproducible and accurate results. This algorithm is composed of two main stages: a filtering stage that uses fast vectorized floating-point expansions in conjunction with error-free transformations, and an accumulation stage based on Kulisch long accumulators in a high-radix carry-save representation. Finally, we provide implementations and performance results in parallel environments such as GPUs.
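To make the two stages named in the abstract more concrete, the following C sketch combines the classic error-free transformations (an FMA-based TwoProd and Knuth's TwoSum) with a small floating-point expansion, which is the spirit of the filtering stage; whatever does not fit in the expansion is merely summed into a plain double here, standing in for the Kulisch long accumulator in high-radix carry-save form. Function names, the expansion size N, and the overall structure are illustrative assumptions, not the paper's implementation.

    /* Hedged sketch (not the authors' code): error-free transformations feeding
     * a small floating-point expansion (the "filtering" stage), with a plain
     * double standing in for the Kulisch long accumulator of the second stage. */
    #include <math.h>
    #include <stdio.h>

    /* Error-free product: p + e == a * b exactly (requires a correctly rounded FMA). */
    static void two_prod(double a, double b, double *p, double *e) {
        *p = a * b;
        *e = fma(a, b, -*p);
    }

    /* Knuth's TwoSum: s + e == a + b exactly, with no branch on magnitudes. */
    static void two_sum(double a, double b, double *s, double *e) {
        *s = a + b;
        double z = *s - a;
        *e = (a - (*s - z)) + (b - z);
    }

    #define N 4 /* illustrative expansion size */

    /* Add x into the expansion acc[0..N-1]; return the residue that does not fit,
     * which the real scheme would forward to the long accumulator. */
    static double expansion_add(double acc[N], double x) {
        for (int i = 0; i < N; ++i) {
            double e;
            two_sum(acc[i], x, &acc[i], &e);
            x = e;
        }
        return x;
    }

    /* Toy dot product: each product is split exactly by two_prod, both parts are
     * filtered through the expansion, and leftovers go to the stand-in accumulator. */
    static double dot(const double *a, const double *b, int n) {
        double acc[N] = {0};
        double rest = 0.0; /* stand-in for the Kulisch long accumulator */
        for (int i = 0; i < n; ++i) {
            double p, e;
            two_prod(a[i], b[i], &p, &e);
            rest += expansion_add(acc, p);
            rest += expansion_add(acc, e);
        }
        double s = rest;
        for (int i = N - 1; i >= 0; --i) s += acc[i];
        return s;
    }

    int main(void) {
        double a[] = {1e16, 1.0, -1e16, 1.0};
        double b[] = {1.0, 1.0, 1.0, 1.0};
        printf("%.17g\n", dot(a, b, 4)); /* prints 2 despite cancellation around 1e16 */
        return 0;
    }

On this toy input, naive left-to-right accumulation loses one of the ones to rounding around 1e16 and returns 1, while the expansion recovers the exact value 2; capturing and re-injecting every rounding error in this way is what makes the result both accurate and independent of the summation order.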


Notes

  1. In general, x stands for four different formats, but in the scope of this article we consider x to correspond to single (S) or double (D) precision.

  2. http://bebop.cs.berkeley.edu/reproblas/.



Acknowledgement

This work, undertaken (partially) in the framework of CALSIMLAB, is supported by the public grant ANR-11-LABX-0037-01 overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR-11-IDEX-0004-02). This work was also (partially) supported by the FastRelax project through the ANR public grant (reference: ANR-14-CE25-0018-01).

Author information

Corresponding author: Roman Iakymchuk.


Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Iakymchuk, R., Defour, D., Collange, C., Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. In: Nehmeier, M., Wolff von Gudenberg, J., Tucker, W. (eds) Scientific Computing, Computer Arithmetic, and Validated Numerics. SCAN 2015. Lecture Notes in Computer Science, vol 9553. Springer, Cham. https://doi.org/10.1007/978-3-319-31769-4_11


  • DOI: https://doi.org/10.1007/978-3-319-31769-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31768-7

  • Online ISBN: 978-3-319-31769-4

  • eBook Packages: Computer Science (R0)
