{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T07:18:16Z","timestamp":1724743096925},"reference-count":43,"publisher":"Wiley","issue":"13","license":[{"start":{"date-parts":[[2014,2,5]],"date-time":"2014-02-05T00:00:00Z","timestamp":1391558400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2015,9,10]]},"abstract":"The resolution of the 3D Helmholtz equation is required in the development of models related to a wide range of scientific and technological applications. For solving this equation in complex arithmetic, the biconjugate gradient (BCG) method is one of the most relevant solvers. However, this iterative method has a high computational cost because of the large sparse matrix and the vector operations involved. In this paper, a specific BCG method, adapted to the regularities of the Helmholtz equation, is presented. This BCG is based on the implementation of a novel format (named \u2018Regular Format\u2019) that allows the storage of the large sparse matrix involved in the sparse matrix vector product in a compact form. The contribution of this work is twofold: (1) decreasing the memory requirements of the 3D Helmholtz equation using the \u2018Regular Format\u2019 and (2) speeding up the resolution of the equation using high performance computing resources. A hybrid Message Passing Interface (MPI)\u2010CUDA graphics processing unit (GPU) parallelization (Fast\u2010Helmholtz) that is capable of solving complex problems in a short time has been carried out. Fast\u2010Helmholtz combines optimizations at the MPI and GPU levels to reduce communication costs and to improve the exploitation of the GPU architecture. 
This strategy makes it possible to extend the dimension of the Helmholtz problem to be solved, thanks to the relevant reduction of memory requirements and runtime. Copyright \u00a9 2014 John Wiley & Sons, Ltd.","DOI":"10.1002\/cpe.3212","type":"journal-article","created":{"date-parts":[[2014,2,5]],"date-time":"2014-02-05T05:47:40Z","timestamp":1391579260000},"page":"3205-3219","source":"Crossref","is-referenced-by-count":5,"title":["Parallel resolution of the 3D Helmholtz equation based on multi\u2010graphics processing unit clusters"],"prefix":"10.1002","volume":"27","author":[{"given":"Gloria","family":"Ortega","sequence":"first","affiliation":[{"name":"Informatics Department, Agrifood Campus of Int. Excellence (ceiA3) University of Almer\u00eda 04120, Almer\u00eda Spain"}]},{"given":"Julia","family":"Lobera","sequence":"additional","affiliation":[{"name":"Centro Universitario de la Defensa de Zaragoza, Ctra. Huesca s\/n 50090 Zaragoza Spain"}]},{"given":"Inmaculada","family":"Garc\u00eda","sequence":"additional","affiliation":[{"name":"Computer Architecture and Electronics University of M\u00e1laga 29071, M\u00e1laga Spain"}]},{"given":"M.","family":"Pilar Arroyo","sequence":"additional","affiliation":[{"name":"Arag\u00f3n Institute of Engineering Research (I3A). University of Zaragoza 50009, Zaragoza Spain"}]},{"given":"Ester M.","family":"Garz\u00f3n","sequence":"additional","affiliation":[{"name":"Informatics Department, Agrifood Campus of Int. 
Excellence (ceiA3) University of Almer\u00eda 04120, Almer\u00eda Spain"}]}],"member":"311","published-online":{"date-parts":[[2014,2,5]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-61529-0"},{"key":"e_1_2_9_3_1","volume-title":"Sound, Structures, and Their Interaction","author":"Junger MC","year":"1986"},{"key":"e_1_2_9_4_1","doi-asserted-by":"crossref","unstructured":"HarrisM.Fast fluid dynamics simulation on the GPU ACM SIGGRAPH 2005 Courses New York NY USA 2005;637\u2013665.","DOI":"10.1145\/1198555.1198790"},{"key":"e_1_2_9_5_1","volume-title":"Numerical Techniques in Electromagnetics","author":"Sadiku MNO","year":"2001"},{"key":"e_1_2_9_6_1","volume-title":"Fast Multipole Methods for the Helmholtz Equation in Three Dimensions","author":"Nail A","year":"2004"},{"key":"e_1_2_9_7_1","first-page":"397","article-title":"Solution of Helmholtz problems by knowledge\u2010based FEM","volume":"4","author":"Ihlenburg F","year":"1997","journal-title":"CAMES"},{"key":"e_1_2_9_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0036142994269186"},{"key":"e_1_2_9_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.883"},{"key":"e_1_2_9_10_1","doi-asserted-by":"publisher","DOI":"10.1088\/0957-0233\/19\/7\/074013"},{"key":"e_1_2_9_11_1","unstructured":"TOP 500 supercomputing site. Available from:http:\/\/www.top500.org\/[Accessed on 21 january 2014]."},{"key":"e_1_2_9_12_1","doi-asserted-by":"crossref","unstructured":"JacobsenDA ThibaultJC SenocakI.An MPI\u2010CUDA implementation for massively parallel incompressible flow computations on multi\u2010GPU clusters 2010. 
Available from:http:\/\/scholarworks.boisestate.edu\/cgi\/viewcontent.cgi?article=1004&context=mecheng_facpubs[Accessed on 21 january 2014].","DOI":"10.2514\/6.2010-522"},{"key":"e_1_2_9_13_1","doi-asserted-by":"crossref","unstructured":"OrtegaG LoberaJ ArroyoMP Garc\u00edaI Garz\u00f3nEM.High performance computing for optical diffraction Tomography.Proceedings of The 2012 International Conference on High Performance Computing & Simulation (HPCS 2012) 2012;195\u2013201.","DOI":"10.1109\/HPCSim.2012.6266911"},{"key":"e_1_2_9_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718003"},{"key":"e_1_2_9_15_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971538"},{"key":"e_1_2_9_16_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511615115"},{"key":"e_1_2_9_17_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970937"},{"key":"e_1_2_9_18_1","doi-asserted-by":"publisher","DOI":"10.1002\/1099-1506(200005)7:4<197::AID-NLA194>3.0.CO;2-S"},{"key":"e_1_2_9_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/20.106415"},{"key":"e_1_2_9_20_1","unstructured":"BalayS et al.PETSc Users Manual. Revision 3.3. Available from:http:\/\/www.mcs.anl.gov\/petsc\/petsc\u2010current\/docs\/manual.pdf[Accessed on 21 january 2014]."},{"key":"e_1_2_9_21_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.2979"},{"key":"e_1_2_9_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33134-3_69"},{"key":"e_1_2_9_23_1","unstructured":"BordageC.Parallelization on heterogeneous multicore and multi\u2010GPU systems of the fast multipole method for the Helmholtz equation using a runtime system.ADVCIMP12 Barcelone Espagne September2012;90\u201395. 
Available from:http:\/\/hal.inria.fr\/hal\u201000773114[Accessed on 21 january 2014]."},{"key":"e_1_2_9_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00791\u2010007\u20100069\u20106"},{"key":"e_1_2_9_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.006"},{"key":"e_1_2_9_26_1","unstructured":"INTEL.Math kernel library 2013. Available from:http:\/\/software.intel.com\/en\u2010us\/articles\/intel\u2010math\u2010kernel\u2010library\u2010documentation[Accessed on 21 january 2014]."},{"key":"e_1_2_9_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.010"},{"key":"e_1_2_9_28_1","doi-asserted-by":"crossref","unstructured":"BellN GarlandM.Implementing sparse matrix\u2010vector multiplication on throughput\u2010oriented processors.Sc '09: Proceedings of the Conference on High Performance Computing Networking Storage and Analysis New York NY USA 2009;1\u201311.","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_9_29_1","doi-asserted-by":"publisher","DOI":"10.1080\/17445760802337010"},{"key":"e_1_2_9_30_1","doi-asserted-by":"crossref","unstructured":"MonakovA LokhmotovA AvetisyanA.Automatically tuning sparse matrix\u2010vector multiplication for GPU architectures.Proceedings of HiPEAC 2010 LNCS 5952 Pisa Italy 2010;111\u2013125.","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"e_1_2_9_31_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1658"},{"key":"e_1_2_9_32_1","doi-asserted-by":"crossref","unstructured":"V\u00e1zquezF OrtegaG Fern\u00e1ndezJJ Garz\u00f3nEM.Improving the performance of the sparse matrix vector product with GPUs.10th IEEE International Conference on Computer and Information Technology. CIT 2010 2010;1146\u20131151.","DOI":"10.1109\/CIT.2010.208"},{"key":"e_1_2_9_33_1","unstructured":"NVIDIA.Cusparse library V5.5 2013. 
Available from:http:\/\/docs.nvidia.com\/cuda\/cusparse\/[Accessed on 21 january 2014]."},{"key":"e_1_2_9_34_1","volume-title":"Matrix Computations (Johns Hopkins Studies in Mathematical Sciences)","author":"Golub GH","year":"1996"},{"key":"e_1_2_9_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jsb.2010.01.021"},{"key":"e_1_2_9_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/0898\u20101221(95)00144\u2010N"},{"key":"e_1_2_9_37_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0219876204000083"},{"key":"e_1_2_9_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(97)00005-7"},{"key":"e_1_2_9_39_1","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780198529392.001.0001"},{"key":"e_1_2_9_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227\u2010012\u20100761\u20102"},{"key":"e_1_2_9_41_1","volume-title":"Computer Organization and Design \u2010 The Hardware \/ Software Interface","author":"Patterson DA","year":"2012"},{"key":"e_1_2_9_42_1","volume-title":"MPI\u2010The Complete Reference, Volume 1: The MPI Core","author":"Snir M","year":"1998"},{"key":"e_1_2_9_43_1","unstructured":"NVIDIA Corporation 2701 San Tomas Expressway.Santa Clara 95050 USA.CUDA C Best Practices Guide. 2013. 
Available from:http:\/\/docs.nvidia.com\/cuda\/cuda\u2010c\u2010best\u2010practices\u2010guide\/index.html[Accessed on 21 january 2014]."},{"issue":"4","key":"e_1_2_9_44_1","first-page":"299","article-title":"Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors","volume":"27","author":"Anzt H","year":"2012","journal-title":"Computer Science \u2010 R&D"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.3212","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.3212","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T12:52:47Z","timestamp":1693659167000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.3212"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,5]]},"references-count":43,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2015,9,10]]}},"alternative-id":["10.1002\/cpe.3212"],"URL":"https:\/\/doi.org\/10.1002\/cpe.3212","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"value":"1532-0626","type":"print"},{"value":"1532-0634","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,2,5]]}}}