{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T05:41:53Z","timestamp":1741066913577,"version":"3.38.0"},"reference-count":40,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2018,9,19]],"date-time":"2018-09-19T00:00:00Z","timestamp":1537315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,7]]},"abstract":" In this article, we present the ExaScale PaRallel finite element tearing and interconnecting SOlver (ESPRESO) finite element method (FEM) library, which includes an FEM toolbox with interfaces to professional and open-source simulation tools, and a massively parallel hybrid total finite element tearing and interconnecting (HTFETI) solver which can fully utilize the Oak Ridge Leadership Computing Facility Titan supercomputer and achieve superlinear scaling. This article presents several new techniques for finite element tearing and interconnecting (FETI) solvers designed for efficient utilization of supercomputers with a focus on (i) performance\u2014we present a fivefold reduction of solver runtime for the Laplace equation by redesigning the FETI solver and offloading the key workload to the accelerator. We compare Intel Xeon Phi 7120p and Tesla K80 and P100 accelerators to Intel Xeon E5-2680v3 and Xeon Phi 7210 central processing units; and (ii) memory efficiency\u2014we present two techniques which increase the efficiency of the HTFETI solver 1.8 times and push the limits of the largest possible problem that ESPRESO can solve from 124 to 223 billion unknowns for problems with unstructured meshes. 
Finally, we show that by dynamically tuning hardware parameters, we can reduce energy consumption by up to 33%.","DOI":"10.1177\/1094342018798452","type":"journal-article","created":{"date-parts":[[2018,9,20]],"date-time":"2018-09-20T03:57:09Z","timestamp":1537415829000},"page":"660-677","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support"],"prefix":"10.1177","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1017-5766","authenticated-orcid":false,"given":"Lubomir","family":"Riha","sequence":"first","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Michal","family":"Merta","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"},{"name":"Department of Applied Mathematics, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Radim","family":"Vavrik","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Tomas","family":"Brzobohaty","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Alexandros","family":"Markopoulos","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Ondrej","family":"Meca","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7849-2744","authenticated-orcid":false,"given":"Ondrej","family":"Vysocky","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech 
Republic"}]},{"given":"Tomas","family":"Kozubek","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]},{"given":"Vit","family":"Vondrak","sequence":"additional","affiliation":[{"name":"IT4Innovations, VSB-Technical University of Ostrava, Ostrava, Czech Republic"}]}],"member":"179","published-online":{"date-parts":[[2018,9,19]]},"reference":[{"key":"bibr1-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479894278952"},{"key":"bibr2-1094342018798452","unstructured":"ANSYS (2017) ANSYS workbench. (accessed 29 August 2018)."},{"key":"bibr3-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1137\/130931989"},{"key":"bibr4-1094342018798452","unstructured":"Casadei A, Faverge M, Lacoste X, et al. (2008\u20132017) Parallel sparse matriX package. Available at: http:\/\/pastix.gforge.inria.fr (accessed 29 August 2018)."},{"key":"bibr5-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1002\/cnm.881"},{"key":"bibr6-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1016\/0045-7825(94)90068-X"},{"key":"bibr8-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1007\/BF02905857"},{"key":"bibr9-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1145\/1089014.1089021"},{"key":"bibr10-1094342018798452","unstructured":"Intel Corporation (2003\u20132017) Intel Math Kernel Library. Available at: https:\/\/software.intel.com\/en-us\/intel-mkl (accessed 29 August 2018)."},{"key":"bibr11-1094342018798452","unstructured":"Intel Corporation (2012) Intel Xeon Processor e5-2600 product family uncore performance monitoring guide. Available at: https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/design-guides\/xeon-e5-2600-uncore-guide.pdf (accessed 29 August 2018)."},{"key":"bibr12-1094342018798452","unstructured":"Intel Corporation (2017a) Intel Xeon E5-2680v3. 
Available at: https:\/\/ark.intel.com\/products\/81908\/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz (accessed 29 August 2018)."},{"key":"bibr13-1094342018798452","unstructured":"Intel Corporation (2017b) Intel Xeon Phi 7120p. Available at: https:\/\/ark.intel.com\/products\/75799\/Intel-Xeon-Phi-Coprocessor-7120P-16GB-1_238-GHz-61-core (accessed 29 August 2018)."},{"key":"bibr14-1094342018798452","unstructured":"Intel Corporation (2017c) Intel Xeon Phi 7210. Available at: https:\/\/ark.intel.com\/products\/94033\/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core (accessed 29 August 2018)."},{"key":"bibr15-1094342018798452","unstructured":"IT Center for Science, CSC (2017) Elmer. Available at: https:\/\/www.csc.fi\/web\/elmer\/elmer (accessed 29 August 2018)."},{"key":"bibr16-1094342018798452","unstructured":"IT4Innovations National Supercomputing Centre, IT4I (2017) Salomon supercomputer. Available at: https:\/\/docs.it4i.cz\/salomon\/introduction\/ (accessed 29 August 2018)."},{"key":"bibr17-1094342018798452","first-page":"1","volume-title":"International workshop on coupled methods in numerical dynamics","volume":"1000","author":"Jasak H","year":"2007"},{"key":"bibr18-1094342018798452","volume-title":"Intel Xeon Phi Coprocessor High Performance Programming","author":"Jeffers J","year":"2013","edition":"1"},{"volume-title":"High Performance Parallelism Pearls Volume One: Multicore and Many-Core Programming Approaches","year":"2014","author":"Jeffers J","key":"bibr19-1094342018798452"},{"volume-title":"High Performance Parallelism Pearls Volume Two: Multicore and Many-Core Programming Approaches","year":"2015","author":"Jeffers J","key":"bibr20-1094342018798452"},{"key":"bibr21-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1002\/zamm.200900329"},{"key":"bibr22-1094342018798452","first-page":"797","volume":"27","author":"Klawonn A","year":"2016","journal-title":"Advances in Parallel 
Computing"},{"key":"bibr23-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-39929-4_25"},{"key":"bibr24-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40047-6_54"},{"key":"bibr25-1094342018798452","unstructured":"Lawrence Livermore National Laboratory, LLNL (2017) Sequoia supercomputer. Available at: http:\/\/computation.llnl.gov\/computers\/sequoia (accessed 29 August 2018)."},{"key":"bibr26-1094342018798452","unstructured":"NVIDIA Corporation (2006\u20132017) NVIDIA CUDA Toolkit. Available at: https:\/\/developer.nvidia.com\/cuda-toolkit (accessed 29 August 2018)."},{"key":"bibr27-1094342018798452","unstructured":"NVIDIA Corporation (2017a) Nvidia Tesla K20X. Available at: http:\/\/www.nvidia.com\/content\/PDF\/kepler\/Tesla-K20X-BD-06397-001-v05.pdf (accessed 29 August 2018)."},{"key":"bibr28-1094342018798452","unstructured":"NVIDIA Corporation (2017b) Nvidia Tesla K80. Available at: http:\/\/www.nvidia.com\/object\/tesla-k80.html (accessed 29 August 2018)."},{"key":"bibr29-1094342018798452","unstructured":"NVIDIA Corporation (2017c) Nvidia Tesla P100. Available at: http:\/\/www.nvidia.com\/object\/tesla-p100.html (accessed 29 August 2018)."},{"key":"bibr30-1094342018798452","unstructured":"Oak Ridge Leadership Computing Facility, OLCF (2017) Titan supercomputer. Available at: https:\/\/www.olcf.ornl.gov\/computing-resources\/titan-cray-xk7\/ (accessed 1 December 2017)."},{"key":"bibr31-1094342018798452","unstructured":"READEX (2018) Horizon 2020 READEX project. 
Available at: https:\/\/www.readex.eu (accessed 29 August 2018)."},{"key":"bibr32-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.06.004"},{"volume-title":"Platform for advanced scientific computing conference","year":"2016","author":"Riha L","key":"bibr33-1094342018798452"},{"volume-title":"HPCSE 2015","year":"2016","author":"Riha L","key":"bibr34-1094342018798452"},{"key":"bibr35-1094342018798452","unstructured":"RIKEN Advanced Institute for Computational Science, AICS (2017) K computer. Available at: http:\/\/www.aics.riken.jp\/en\/k-computer\/system (accessed 29 August 2018)."},{"key":"bibr36-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1137\/070707002"},{"key":"bibr37-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1007\/s10589-006-9003-y"},{"key":"bibr38-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-016-0532-7"},{"key":"bibr39-1094342018798452","first-page":"25","volume":"24","author":"Shroeder W","year":"1998","journal-title":"Prentice Hall"},{"key":"bibr40-1094342018798452","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2015.08.026"},{"key":"bibr41-1094342018798452","doi-asserted-by":"publisher","DOI":"10.4203\/ccp.111.3"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018798452","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342018798452","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018798452","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T13:59:46Z","timestamp":1741010386000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342018798452"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,19]]},"references-count":40,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,7]]}},"alternative-id":["10.1177\/1094342018798452"],"URL":"https:\/\/doi.org\/10.1177\/1094342018798452","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2018,9,19]]}}}