{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,6]],"date-time":"2024-10-06T04:17:25Z","timestamp":1728188245059},"reference-count":32,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2020,7,14]],"date-time":"2020-07-14T00:00:00Z","timestamp":1594684800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2020,9]]},"abstract":" Enhanced-precision global sums are key to reproducibility in exascale applications. We examine two classic summation algorithms and show that vectorized versions are fast, good and reproducible at exascale. Both 256-bit and 512-bit implementations speed up the operation by almost a factor of four over the serial version. They thus demonstrate improved performance on global summations while retaining the numerical reproducibility of these methods. 
","DOI":"10.1177\/1094342020938425","type":"journal-article","created":{"date-parts":[[2020,7,14]],"date-time":"2020-07-14T15:27:11Z","timestamp":1594740431000},"page":"519-531","update-policy":"http:\/\/dx.doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Fast, good, and repeatable: Summations, vectorization, and reproducibility"],"prefix":"10.1177","volume":"34","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-3203-0805","authenticated-orcid":false,"given":"Brett","family":"Neuman","sequence":"first","affiliation":[{"name":"High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, NM, USA"}]},{"given":"Andy","family":"Dubois","sequence":"additional","affiliation":[{"name":"High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, NM, USA"}]},{"given":"Laura","family":"Monroe","sequence":"additional","affiliation":[{"name":"High Performance Computing Division, Los Alamos National Laboratory, Los Alamos, NM, USA"}]},{"given":"Robert W","family":"Robey","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM, USA"}]}],"member":"179","published-online":{"date-parts":[[2020,7,14]]},"reference":[{"volume-title":"EECS Department, University of California, Berkeley, Tech. Rep. UCB\/EECS-2015-229","year":"2015","author":"Ahrens P","key":"bibr1-1094342020938425"},{"key":"bibr2-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2005.52"},{"key":"bibr3-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2015.34"},{"key":"bibr4-1094342020938425","doi-asserted-by":"crossref","unstructured":"Collange S, Defour D, Graillat S, et al. (2015) Numerical reproducibility for the parallel reduction on multi- and many-core architectures. Parallel Computing 49: 83\u201397. [Online]. 
Available at: http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0167819115001155 (accessed 21 August 2019).","DOI":"10.1016\/j.parco.2015.09.001"},{"key":"bibr5-1094342020938425","unstructured":"Demmel J, Nguyen HD, Ahrens P (2015) Cost of floating-point reproducibility. Available at: https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/ssd\/is\/NRE-2015-07-Nguyen_slides.pdf (accessed 7 August 2019)."},{"key":"bibr6-1094342020938425","first-page":"1","volume-title":"Proceedings of the 2014 Workshop on Programming models for SIMD\/Vector processing","volume":"2014","author":"Est\u00e9rie P","year":"2014"},{"key":"bibr7-1094342020938425","unstructured":"Fog A (2019a) VCL C++ vector class manual. Available at: https:\/\/www.agner.org\/optimize\/vcl_manual.pdf (accessed 7 August 2019)."},{"key":"bibr8-1094342020938425","unstructured":"Fog A (2019b) VCL C++ vector class source code. Available at: https:\/\/github.com\/vectorclass (accessed 7 August 2019)."},{"key":"bibr9-1094342020938425","unstructured":"GCC (n.d.) GCC vector extensions. Available at: https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Vector-Extensions.html (accessed 22 July 2019)."},{"volume-title":"Report of the HPC Correctness Summit, Jan 25\u201326, 2017","year":"2017","author":"Gopalakrishnan G","key":"bibr10-1094342020938425"},{"key":"bibr11-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008153532043"},{"key":"bibr12-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1137\/0914050"},{"key":"bibr13-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718027"},{"key":"bibr14-1094342020938425","unstructured":"Iakymchuk R, Collange S, Defour D, et al. (2015) ExBLAS: reproducible and accurate BLAS library. 
Available at: https:\/\/www.nist.gov\/sites\/default\/files\/documents\/itl\/ssd\/is\/NRE-2015-04-iakymchuk.pdf (accessed 24 July 2019)."},{"key":"bibr15-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1145\/363707.363723"},{"key":"bibr16-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-005-0139-x"},{"volume":"2","volume-title":"The Art of Computer Programming","year":"1969","author":"Knuth DE","key":"bibr17-1094342020938425"},{"volume-title":"Numerical Methods and Fortran Programming: With Applications in Engineering and Science","year":"1964","author":"McCracken DD","key":"bibr18-1094342020938425"},{"volume-title":"Invited Talk, Supercomputing","year":"2016","author":"McCalpin JD","key":"bibr19-1094342020938425"},{"key":"bibr20-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1002\/zamm.19740540106"},{"key":"bibr21-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019839124"},{"key":"bibr22-1094342020938425","unstructured":"Robey R (2019) Global sum examples. Available at: https:\/\/github.com\/LANL\/GlobalSums (accessed 8 July 2019)."},{"key":"bibr23-1094342020938425","unstructured":"Robey R, Zamora Y (2019a) Vectorization examples. Available at: https:\/\/github.com\/EssentialsofParallelComputing\/Chapter6 (accessed 8 July 2019)."},{"key":"bibr24-1094342020938425","unstructured":"Robey R, Zamora Y (2019b) OpenMP examples. 
Available at: https:\/\/github.com\/EssentialsofParallelComputing\/Chapter7 (accessed 8 July 2019)."},{"volume-title":"Parallel and High Performance Computing","author":"Robey R","key":"bibr25-1094342020938425"},{"volume-title":"Computational reproducibility in production physics applications","year":"2015","author":"Robey RW","key":"bibr26-1094342020938425"},{"key":"bibr27-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.02.009"},{"key":"bibr28-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470481"},{"key":"bibr29-1094342020938425","unstructured":"Wikipedia Contributors (2019a) Kahan summation algorithm\u2014Wikipedia, the free encyclopedia. Available at: https:\/\/en.wikipedia.org\/w\/index.php?title=Kahan_summation_algorithm&oldid=910078822 (accessed 27 August 2019)."},{"key":"bibr30-1094342020938425","unstructured":"Wikipedia Contributors (2019b) Pairwise summation\u2014Wikipedia, the free encyclopedia. Available at: https:\/\/en.wikipedia.org\/w\/index.php?title=Pairwise_summation&oldid=899870482 (accessed 27 August 2019)."},{"volume-title":"Rounding Errors in Algebraic Processes","year":"1994","author":"Wilkinson J","key":"bibr31-1094342020938425"},{"key":"bibr32-1094342020938425","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.42"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020938425","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342020938425","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342020938425","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,5]],"date-time":"2024-10-05T14:57:42Z","timestamp":1728140262000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342020938425"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,14]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["10.1177\/1094342020938425"],"URL":"https:\/\/doi.org\/10.1177\/1094342020938425","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2020,7,14]]}}}