Abstract
Due to the non-associativity of floating-point operations and dynamic scheduling on parallel architectures, obtaining a bit-wise reproducible floating-point result across multiple executions of the same code, on different or even on similar parallel architectures, is challenging. In this paper, we address the problem of reproducibility in the context of matrix multiplication and propose an algorithm that yields both reproducible and accurate results. This algorithm is composed of two main stages: a filtering stage that uses fast vectorized floating-point expansions in conjunction with error-free transformations, and an accumulation stage based on Kulisch long accumulators in a high-radix carry-save representation. Finally, we provide implementations and performance results in parallel environments such as GPUs.
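To make the two filtering-stage primitives concrete, the following is a minimal C++ sketch of the classical error-free transformations (Knuth's TwoSum and an FMA-based TwoProd) feeding a small floating-point expansion. It is only an illustration under our own naming (`two_sum`, `two_prod`, `expansion_accumulate`), not the authors' GPU implementation, and the final exact flush into a Kulisch long accumulator is elided.

```cpp
// Illustrative sketch (not the paper's code): error-free transformations
// plus accumulation into a fixed-size floating-point expansion.
#include <cmath>
#include <cstdio>
#include <vector>

// Knuth's TwoSum: s + e == a + b exactly, with s = fl(a + b).
static void two_sum(double a, double b, double &s, double &e) {
    s = a + b;
    double t = s - a;
    e = (a - (s - t)) + (b - t);
}

// FMA-based TwoProd: p + e == a * b exactly, with p = fl(a * b).
static void two_prod(double a, double b, double &p, double &e) {
    p = a * b;
    e = std::fma(a, b, -p);  // exact residual of the product
}

// Add x into an expansion of non-overlapping limbs, propagating the
// rounding error of each partial sum to the next limb.
static double expansion_accumulate(std::vector<double> &exp, double x) {
    for (double &limb : exp) {
        double s, e;
        two_sum(limb, x, s, e);
        limb = s;
        x = e;               // carry the rounding error downward
    }
    return x;                // residue for the long-accumulator stage
}

int main() {
    std::vector<double> acc(4, 0.0);       // expansion of size 4
    double a[] = {1e16, 1.0, -1e16, 1.0};  // cancellation-prone data
    double b[] = {1.0, 1.0, 1.0, 1.0};
    double residue = 0.0;  // zero here; flushed exactly in the full scheme
    for (int i = 0; i < 4; ++i) {
        double p, e;
        two_prod(a[i], b[i], p, e);        // exact product as p + e
        residue += expansion_accumulate(acc, p);
        residue += expansion_accumulate(acc, e);
    }
    double sum = residue;
    for (double limb : acc) sum += limb;
    std::printf("dot product = %.17g\n", sum);  // exact result: 2
}
```

In the full algorithm, any residue that does not fit in the expansion would be added exactly into the high-radix carry-save long accumulator; since every intermediate sum is then exact, the final result no longer depends on the order of accumulation, which is what yields reproducibility.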
Notes
1. In general, x stands for four different formats, but in the scope of this article we consider x to correspond to single (S) or double (D) precision.
Acknowledgement
This work was undertaken (partially) in the framework of CALSIMLAB, supported by the public grant ANR-11-LABX-0037-01 overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR-11-IDEX-0004-02). This work was also (partially) supported by the FastRelax project through the ANR public grant (reference: ANR-14-CE25-0018-01).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Iakymchuk, R., Defour, D., Collange, C., Graillat, S. (2016). Reproducible and Accurate Matrix Multiplication. In: Nehmeier, M., Wolff von Gudenberg, J., Tucker, W. (eds) Scientific Computing, Computer Arithmetic, and Validated Numerics. SCAN 2015. Lecture Notes in Computer Science, vol 9553. Springer, Cham. https://doi.org/10.1007/978-3-319-31769-4_11
DOI: https://doi.org/10.1007/978-3-319-31769-4_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31768-7
Online ISBN: 978-3-319-31769-4
eBook Packages: Computer Science, Computer Science (R0)