{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T23:08:40Z","timestamp":1740179320390,"version":"3.37.3"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"PLDI","funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["1937301, CF-2143061"],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2023,6,6]]},"abstract":"We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor expressions to external functions of other tensor algebra libraries and compilers. Users can extend Mosaic by adding new functions and bind a sub-expression to a function using a scheduling API. Mosaic substitutes the bound sub-expressions with calls to the external functions and automatically generates the remaining code using a default code generator. As the generated code is fused by default, users can productively leverage both fusion and calls to specialized functions within the same compiler. We demonstrate the benefits of our dual approach by showing that calling hand-written CPU and specialized hardware functions can provide speedups of up to 206\u00d7 against fused code in some cases, while generating fused code can provide speedups of up to 3.57\u00d7 against code that calls external functions in other cases. Mosaic also offers a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. Additionally, the interface for adding new external functions is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of code.<\/jats:p>","DOI":"10.1145\/3591236","type":"journal-article","created":{"date-parts":[[2023,6,6]],"date-time":"2023-06-06T20:06:24Z","timestamp":1686081984000},"page":"394-419","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Mosaic: An Interoperable Compiler for Tensor Algebra"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-8570-1947","authenticated-orcid":false,"given":"Manya","family":"Bansal","sequence":"first","affiliation":[{"name":"Stanford University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4195-8106","authenticated-orcid":false,"given":"Olivia","family":"Hsu","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8779-0636","authenticated-orcid":false,"given":"Kunle","family":"Olukotun","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2267-903X","authenticated-orcid":false,"given":"Fredrik","family":"Kjolstad","sequence":"additional","affiliation":[{"name":"Stanford University, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,6,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523442"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485486"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186785.1186794"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/060676489"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.7814275"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/141000671"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544559"},{"key":"e_1_2_1_9_1","volume-title":"Hall","author":"Chen Chun","year":"2007","unstructured":"Chun Chen , Jacqueline Chame , and Mary W . Hall . 2007 . CHiLL : A Framework for Composing High-Level Loop Transformations . Chun Chen, Jacqueline Chame, and Mary W. Hall. 2007. CHiLL : A Framework for Composing High-Level Loop Transformations."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Meghan Cowan , Haichen Shen , Leyuan Wang , Yuwei Hu , Luis Ceze , Carlos Guestrin , and Arvind Krishnamurthy . 2018 . TVM: An Automated End-to-End Optimizing Compiler for Deep Learning . In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918) . USENIX Association, USA. 579\u2013594. isbn:978 1931971478 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, USA. 579\u2013594. isbn:9781931971478"},{"key":"#cr-split#-e_1_2_1_11_1.1","unstructured":"Yu-Hsin Chen Tien-Ju Yang Joel Emer and Vivienne Sze. 2018. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. https:\/\/doi.org\/10.48550\/ARXIV.1807.07928 10.48550\/ARXIV.1807.07928"},{"key":"#cr-split#-e_1_2_1_11_1.2","doi-asserted-by":"crossref","unstructured":"Yu-Hsin Chen Tien-Ju Yang Joel Emer and Vivienne Sze. 2018. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. https:\/\/doi.org\/10.48550\/ARXIV.1807.07928","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3061394"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276493"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358276"},{"key":"e_1_2_1_15_1","volume-title":"Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations","author":"Dalton Steven","year":"2014","unstructured":"Steven Dalton , Nathan Bell , Luke Olson , and Michael Garland . 2014 . Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations . http:\/\/cusplibrary.github.io\/ Version 0.5.0. Steven Dalton, Nathan Bell, Luke Olson, and Michael Garland. 2014. Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations. http:\/\/cusplibrary.github.io\/ Version 0.5.0."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1792734.1792766"},{"key":"e_1_2_1_18_1","series-title":"SIAM journal on matrix analysis and applications, 13, 1","volume-title":"Sparse matrices in MATLAB: Design and implementation","author":"Gilbert John R","year":"1992","unstructured":"John R Gilbert , Cleve Moler , and Robert Schreiber . 1992. Sparse matrices in MATLAB: Design and implementation . SIAM journal on matrix analysis and applications, 13, 1 ( 1992 ), 333\u2013356. John R Gilbert, Cleve Moler, and Robert Schreiber. 1992. Sparse matrices in MATLAB: Design and implementation. SIAM journal on matrix analysis and applications, 13, 1 (1992), 333\u2013356."},{"volume-title":"GNU scientific library reference manual","author":"Gough Brian","key":"e_1_2_1_19_1","unstructured":"Brian Gough . 2009. GNU scientific library reference manual . Network Theory Ltd .. Brian Gough. 2009. GNU scientific library reference manual. Network Theory Ltd.."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"key":"e_1_2_1_21_1","unstructured":"Bastian Hagedorn Johannes Lenfers Thomas Koehler Sergei Gorlatch and Michel Steuwer. 2020. A Language for Describing Optimization Strategies. arxiv:2002.02268. \t\t\t\t Bastian Hagedorn Johannes Lenfers Thomas Koehler Sergei Gorlatch and Michel Steuwer. 2020. A Language for Describing Optimization Strategies. arxiv:2002.02268."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392717.3392751"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358275"},{"key":"e_1_2_1_24_1","volume-title":"Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture. https:\/\/doi.org\/10.48550\/ARXIV.2211.03251","author":"Hsu Olivia","year":"2022","unstructured":"Olivia Hsu , Alexander Rucker , Tian Zhao , Kunle Olukotun , and Fredrik Kjolstad . 2022 . Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture. https:\/\/doi.org\/10.48550\/ARXIV.2211.03251 10.48550\/ARXIV.2211.03251 Olivia Hsu, Alexander Rucker, Tian Zhao, Kunle Olukotun, and Fredrik Kjolstad. 2022. Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture. https:\/\/doi.org\/10.48550\/ARXIV.2211.03251"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582051"},{"key":"e_1_2_1_26_1","volume-title":"van de Geijn","author":"Huang Jianyu","year":"2017","unstructured":"Jianyu Huang , Devin A. Matthews , and Robert A . van de Geijn . 2017 . Strassen\u2019s Algorithm for Tensor Contraction. CoRR , abs\/1704.03092 (2017), arXiv:1704.03092. arxiv:1704.03092 Jianyu Huang, Devin A. Matthews, and Robert A. van de Geijn. 2017. Strassen\u2019s Algorithm for Tensor Contraction. CoRR, abs\/1704.03092 (2017), arXiv:1704.03092. arxiv:1704.03092"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523446"},{"volume-title":"Intel Math Kernel Library. Reference Manual","key":"e_1_2_1_28_1","unstructured":"2009. Intel Math Kernel Library. Reference Manual . Intel Corporation , Santa Clara . isbn:630813-054US 2009. Intel Math Kernel Library. Reference Manual. Intel Corporation, Santa Clara. isbn:630813-054US"},{"volume-title":"Intel Advanced Vector Extensions Programming Reference","key":"e_1_2_1_29_1","unstructured":"2011. Intel Advanced Vector Extensions Programming Reference . Intel Corporation , Santa Clara, USA. https:\/\/www.intel.com\/content\/dam\/develop\/external\/us\/en\/documents\/36945 2011. Intel Advanced Vector Extensions Programming Reference. Intel Corporation, Santa Clara, USA. https:\/\/www.intel.com\/content\/dam\/develop\/external\/us\/en\/documents\/36945"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080246"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2414456"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661185"},{"key":"e_1_2_1_33_1","volume-title":"Taco: The tensor algebra compiler","author":"Kj\u00f8lstad Fredrik","year":"2022","unstructured":"Fredrik Kj\u00f8lstad , Stephen Chou , and Saman Amarasinghe . 2022 . Taco: The tensor algebra compiler . http:\/\/tensor-compiler.org\/ Fredrik Kj\u00f8lstad, Stephen Chou, and Saman Amarasinghe. 2022. Taco: The tensor algebra compiler. http:\/\/tensor-compiler.org\/"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192379"},{"key":"e_1_2_1_36_1","volume-title":"Bader","author":"Kolda Tamara G.","year":"2006","unstructured":"Tamara G. Kolda and Brett W . Bader . 2006 . MATLAB Tensor Toolbox , Version 00. https:\/\/www.osti.gov\/biblio\/1230898 Tamara G. Kolda and Brett W. Bader. 2006. MATLAB Tensor Toolbox, Version 00. https:\/\/www.osti.gov\/biblio\/1230898"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/355841.355847"},{"key":"e_1_2_1_39_1","volume-title":"GPU Technology Conference (GTC).","author":"Naumov P Vandermersch M","year":"2010","unstructured":"P Vandermersch M Naumov , LS Chien . 2010 . Cusparse library . GPU Technology Conference (GTC). P Vandermersch M Naumov, LS Chien. 2010. Cusparse library. GPU Technology Conference (GTC)."},{"volume-title":"version 7.10.0 (R2010a)","author":"MATLAB.","key":"e_1_2_1_40_1","unstructured":"MATLAB. 2010. version 7.10.0 (R2010a) . The MathWorks Inc., Natick, Massachusetts . MATLAB. 2010. version 7.10.0 (R2010a). The MathWorks Inc., Natick, Massachusetts."},{"key":"e_1_2_1_41_1","volume-title":"High-Performance Tensor Contraction without BLAS. CoRR, abs\/1607.00291","author":"Matthews Devin","year":"2016","unstructured":"Devin Matthews . 2016. High-Performance Tensor Contraction without BLAS. CoRR, abs\/1607.00291 ( 2016 ), arXiv:1607.00291. arxiv:1607.00291 Devin Matthews. 2016. High-Performance Tensor Contraction without BLAS. CoRR, abs\/1607.00291 (2016), arXiv:1607.00291. arxiv:1607.00291"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925952"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-95953-1_7"},{"key":"e_1_2_1_44_1","unstructured":"NVIDIA. 2022. CUTLASS. https:\/\/developer.nvidia.com\/blog\/cutlass-linear-algebra-cuda\/ \t\t\t\t NVIDIA. 2022. CUTLASS. https:\/\/developer.nvidia.com\/blog\/cutlass-linear-algebra-cuda\/"},{"key":"e_1_2_1_45_1","unstructured":"NVIDIA. 2022. Tensor Cores. https:\/\/www.nvidia.com\/en-us\/data-center\/tensor-cores\/ \t\t\t\t NVIDIA. 2022. Tensor Cores. https:\/\/www.nvidia.com\/en-us\/data-center\/tensor-cores\/"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185528"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480047"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428226"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2022.07.005"},{"key":"e_1_2_1_53_1","unstructured":"Edgar Solomonik and Torsten Hoefler. 2015. Sparse Tensor Algebra as a Parallel Programming Model. arxiv:1512.00066. \t\t\t\t Edgar Solomonik and Torsten Hoefler. 2015. Sparse Tensor Algebra as a Parallel Programming Model. arxiv:1512.00066."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.06.002"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00062"},{"key":"e_1_2_1_57_1","volume-title":"The ARM Scalable Vector Extension. CoRR, abs\/1803.06185","author":"Stephens Nigel","year":"2018","unstructured":"Nigel Stephens , Stuart Biles , Matthias Boettcher , Jacob Eapen , Mbou Eyole , Giacomo Gabrielli , Matt Horsnell , Grigorios Magklis , Alejandro Martinez , Nathana\u00ebl Pr\u00e9millieu , Alastair Reid , Alejandro Rico , and Paul Walker . 2018. The ARM Scalable Vector Extension. CoRR, abs\/1803.06185 ( 2018 ), arXiv:1803.06185. arxiv:1803.06185 Nigel Stephens, Stuart Biles, Matthias Boettcher, Jacob Eapen, Mbou Eyole, Giacomo Gabrielli, Matt Horsnell, Grigorios Magklis, Alejandro Martinez, Nathana\u00ebl Pr\u00e9millieu, Alastair Reid, Alejandro Rico, and Paul Walker. 2018. The ARM Scalable Vector Extension. CoRR, abs\/1803.06185 (2018), arXiv:1803.06185. arxiv:1803.06185"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/LLVMHPC54804.2021.00009"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2764454"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2738003"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.626"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523437"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.1089"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446702"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322249"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00085"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3566054"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527440"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3591236","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,6]],"date-time":"2023-06-06T20:07:17Z","timestamp":1686082037000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3591236"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,6]]},"references-count":71,"journal-issue":{"issue":"PLDI","published-print":{"date-parts":[[2023,6,6]]}},"alternative-id":["10.1145\/3591236"],"URL":"https:\/\/doi.org\/10.1145\/3591236","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2023,6,6]]},"assertion":[{"value":"2023-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}