{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T13:11:08Z","timestamp":1740143468622,"version":"3.37.3"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,5,22]],"date-time":"2018-05-22T00:00:00Z","timestamp":1526947200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"European Commission under Horizon 2020 Research and Innovation Action","award":["688131"]},{"name":"\u201cWCET-Aware Parallelization of Model-Based Applications for Heterogeneous Parallel Systems (ARGO),\u201d"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2018,5,31]]},"abstract":"One of the biggest challenges in multicore platforms is shared cache management, especially for data-dominant applications. Two commonly used approaches for increasing shared cache utilization are cache partitioning and loop tiling. However, state-of-the-art compilers lack efficient cache partitioning and loop tiling methods for two reasons. First, cache partitioning and loop tiling are strongly coupled together, and thus addressing them separately is simply not effective. Second, cache partitioning and loop tiling must be tailored to the target shared cache architecture details and the memory characteristics of the corunning workloads.<\/jats:p>\n To the best of our knowledge, this is the first time that a methodology provides (1) a theoretical foundation in the above-mentioned cache management mechanisms and (2) a unified framework to orchestrate these two mechanisms in tandem (not separately). 
Our approach lowers the number of main memory accesses by an order of magnitude while keeping the number of arithmetic\/addressing instructions to a minimum. We motivate this work by showing that cache partitioning, loop tiling, data array layouts, shared cache architecture details (i.e., cache size and associativity), and the memory reuse patterns of the executing tasks must be addressed together as one problem when a (near-)optimal solution is required. To this end, we present a search space exploration analysis in which our proposal offers a vast reduction in the required search space.<\/jats:p>","DOI":"10.1145\/3202663","type":"journal-article","created":{"date-parts":[[2018,5,23]],"date-time":"2018-05-23T15:08:42Z","timestamp":1527088122000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Combining Software Cache Partitioning and Loop Tiling for Effective Shared Cache Management"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9591-913X","authenticated-orcid":false,"given":"Vasilios","family":"Kelefouras","sequence":"first","affiliation":[{"name":"Technological Educational Institute of Western Greece, Antirio, Greece"}]},{"given":"Georgios","family":"Keramidas","sequence":"additional","affiliation":[{"name":"Technological Educational Institute of Western Greece, Antirio, Greece"}]},{"given":"Nikolaos","family":"Voros","sequence":"additional","affiliation":[{"name":"Technological Educational Institute of Western Greece, Antirio, Greece"}]}],"member":"320","published-online":{"date-parts":[[2018,5,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.37"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/998300.997196"},{"volume-title":"Linear Equations and Inequalities","author":"Banerjee Utpal","key":"e_1_2_1_3_1","unstructured":"Utpal Banerjee 
. 1993. Linear Equations and Inequalities . Springer , Boston, MA , 49--94. Utpal Banerjee. 1993. Linear Equations and Inequalities. Springer, Boston, MA, 49--94."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6495008"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1594835.1504209"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1379022.1375595"},{"volume-title":"Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI\u201908)","author":"Bondhugula Uday","key":"e_1_2_1_8_1","unstructured":"Uday Bondhugula , J. Ramanujam , and P. Sadayppan . 2008b. PLuTo: A practical and fully automatic polyhedral program optimization system . In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI\u201908) . Uday Bondhugula, J. Ramanujam, and P. Sadayppan. 2008b. PLuTo: A practical and fully automatic polyhedral program optimization system. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI\u201908)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTCSA.2008.42"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1275005"},{"key":"e_1_2_1_11_1","series-title":"Series on Software Engineering and Knowledge Engineering","volume-title":"Data Structures and Algorithms","author":"Chang Shi-Kuo","unstructured":"Shi-Kuo Chang . 2003. Data Structures and Algorithms . Series on Software Engineering and Knowledge Engineering , Vol. 13 . World Scientific . Shi-Kuo Chang. 2003. Data Structures and Algorithms. Series on Software Engineering and Knowledge Engineering, Vol. 13. 
World Scientific."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1070891.1065921"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-006-7954-5"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941568"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346180"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.9"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.36"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669176"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1809028.1806605"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2009.55"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cl.2015.01.003"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362691"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECRTS.2013.19"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v16:2\/3"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/996893.996863"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.9"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1509864.1509865"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"volume-title":"Proceedings of the 2012 IEEE\/IFIP 42nd International Conference on Dependable Systems and Networks Workshops (DSN-W\u201912)","author":"Lidman Jacob","key":"e_1_2_1_29_1","unstructured":"Jacob Lidman , Daniel J. Quinlan , Chunhua Liao , and Sally A . McKee. 2012. ROSE: FTTransform-A source-to-source translation framework for exascale fault-tolerance research . In Proceedings of the 2012 IEEE\/IFIP 42nd International Conference on Dependable Systems and Networks Workshops (DSN-W\u201912) . IEEE, 1--6. 
Jacob Lidman, Daniel J. Quinlan, Chunhua Liao, and Sally A. McKee. 2012. ROSE: FTTransform-A source-to-source translation framework for exascale fault-tolerance research. In Proceedings of the 2012 IEEE\/IFIP 42nd International Conference on Dependable Systems and Networks Workshops (DSN-W\u201912). IEEE, 1--6."},{"volume-title":"Proceedings of the IEEE 14th International Symposium on High Performance Computer Architecture (HPCA\u201908)","author":"Lin Jiang","key":"e_1_2_1_30_1","unstructured":"Jiang Lin , Qingda Lu , Xiaoning Ding , Zhao Zhang , Xiaodong Zhang , and P. Sadayappan . 2008. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems . In Proceedings of the IEEE 14th International Symposium on High Performance Computer Architecture (HPCA\u201908) . IEEE, 367--378. http:\/\/dblp.uni-trier.de\/db\/conf\/hpca\/hpca2008.html. Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, and P. Sadayappan. 2008. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In Proceedings of the IEEE 14th International Symposium on High Performance Computer Architecture (HPCA\u201908). IEEE, 367--378. http:\/\/dblp.uni-trier.de\/db\/conf\/hpca\/hpca2008.html."},{"volume-title":"Proceedings of the 9th Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201911)","author":"Liu Jun","key":"e_1_2_1_31_1","unstructured":"Jun Liu , Yuanrui Zhang , Wei Ding , and Mahmut T. Kandemir . 2011. On-chip cache hierarchy-aware tile scheduling for multicore machines . In Proceedings of the 9th Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201911) . IEEE, 161--170. http:\/\/dblp.uni-trier.de\/db\/conf\/cgo\/cgo2011.html. Jun Liu, Yuanrui Zhang, Wei Ding, and Mahmut T. Kandemir. 2011. On-chip cache hierarchy-aware tile scheduling for multicore machines. 
In Proceedings of the 9th Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201911). IEEE, 161--170. http:\/\/dblp.uni-trier.de\/db\/conf\/cgo\/cgo2011.html."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/646053.677574"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1786054.1786086"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-39707-6_5"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038698.2038711"},{"key":"e_1_2_1_36_1","unstructured":"L.-N. Pouchet. 2012. PolyBench\/C Benchmark Suite. Retrieved from http:\/\/web.cs.ucla.edu\/ pouchet\/software\/polybench\/. L.-N. Pouchet. 2012. PolyBench\/C Benchmark Suite. Retrieved from http:\/\/web.cs.ucla.edu\/ pouchet\/software\/polybench\/."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1698772.1698774"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273442.1250780"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/780822.781141"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6169036"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-011-0182-5"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the Workshop on the Interaction Between Operating Systems and Computer Architecture.","author":"Tam David","year":"2007","unstructured":"David Tam , Reza Azimi , Livio Soares , and Michael Stumm . 2007 . Managing shared L2 caches on multicore systems in software . In Proceedings of the Workshop on the Interaction Between Operating Systems and Computer Architecture. David Tam, Reza Azimi, Livio Soares, and Michael Stumm. 2007. Managing shared L2 caches on multicore systems in software. 
In Proceedings of the Workshop on the Interaction Between Operating Systems and Computer Architecture."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400705"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00086-7"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628104"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837274.1837309"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519065.1519076"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3202663","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,31]],"date-time":"2022-12-31T18:56:32Z","timestamp":1672512992000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3202663"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,5,22]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,5,31]]}},"alternative-id":["10.1145\/3202663"],"URL":"https:\/\/doi.org\/10.1145\/3202663","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2018,5,22]]},"assertion":[{"value":"2017-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-05-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}