{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T20:42:28Z","timestamp":1730320948832,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":59,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,11,17]],"date-time":"2019-11-17T00:00:00Z","timestamp":1573948800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,11,17]]},"DOI":"10.1145\/3295500.3356167","type":"proceedings-article","created":{"date-parts":[[2019,11,7]],"date-time":"2019-11-07T14:43:22Z","timestamp":1573137802000},"page":"1-19","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Pinpointing performance inefficiencies via lightweight variance profiling"],"prefix":"10.1145","author":[{"given":"Pengfei","family":"Su","sequence":"first","affiliation":[{"name":"College of William and Mary Williamsburg"}]},{"given":"Shuyin","family":"Jiao","sequence":"additional","affiliation":[{"name":"College of William and Mary Williamsburg"}]},{"given":"Milind","family":"Chabbi","sequence":"additional","affiliation":[{"name":"Scalable Machines Research"}]},{"given":"Xu","family":"Liu","sequence":"additional","affiliation":[{"name":"College of William and Mary Williamsburg"}]}],"member":"320","published-online":{"date-parts":[[2019,11,17]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"6","article-title":"HPCToolkit: Tools for Performance Analysis of Optimized Parallel Programs. Concurrency Computation","volume":"22","author":"Adhianto L.","year":"2010","unstructured":"L. Adhianto , S. Banerjee , M. Fagan , M. Krentel , G. Marin , J. Mellor-Crummey , and N. R. Tallent . 2010 . HPCToolkit: Tools for Performance Analysis of Optimized Parallel Programs. Concurrency Computation : Practice Expererience 22 , 6 (Apr 2010), 685--701. L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. 2010. HPCToolkit: Tools for Performance Analysis of Optimized Parallel Programs. Concurrency Computation : Practice Expererience 22, 6 (Apr 2010), 685--701.","journal-title":"Practice Expererience"},{"volume-title":"Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI '97)","author":"Ammons Glenn","key":"e_1_3_2_1_2_1","unstructured":"Glenn Ammons , Thomas Ball , and James R. Larus . 1997. Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling . In Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI '97) . ACM, New York, NY, USA, 85--96. Glenn Ammons, Thomas Ball, and James R. Larus. 1997. Exploiting Hardware Performance Counters with Flow and Context Sensitive Profiling. In Proceedings of the ACM SIGPLAN 1997 Conference on Programming Language Design and Implementation (PLDI '97). ACM, New York, NY, USA, 85--96."},{"key":"e_1_3_2_1_3_1","unstructured":"M. Arnold and P. F. Sweeney. 1999. Approximating the calling context tree via sampling. Technical Report IBM. M. Arnold and P. F. Sweeney. 1999. Approximating the calling context tree via sampling. Technical Report IBM."},{"key":"e_1_3_2_1_4_1","unstructured":"Mona Attariyan Michael Chow and Jason Flinn. 2012. X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). USENIX Hollywood CA 307--320. Mona Attariyan Michael Chow and Jason Flinn. 2012. X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). USENIX Hollywood CA 307--320."},{"key":"e_1_3_2_1_6_1","unstructured":"Derek L. Bruening. 2004. Efficient Transparent and Comprehensive Runtime Code Manipulation. Ph.D. Dissertation. Cambridge MA USA. AAI0807735. Derek L. Bruening. 2004. Efficient Transparent and Comprehensive Runtime Code Manipulation. Ph.D. Dissertation. Cambridge MA USA. AAI0807735."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178499"},{"key":"e_1_3_2_1_8_1","unstructured":"Intel Corp. 2010. Intel Microarchitecture Codename Nehalem Performance Monitoring Unit Programming Guide. https:\/\/software.intel.com\/sites\/default\/files\/m\/5\/2\/c\/f\/1\/30320-Nehalem-PMU-Programming-Guide-Core.pdf. Intel Corp. 2010. Intel Microarchitecture Codename Nehalem Performance Monitoring Unit Programming Guide. https:\/\/software.intel.com\/sites\/default\/files\/m\/5\/2\/c\/f\/1\/30320-Nehalem-PMU-Programming-Guide-Core.pdf."},{"key":"e_1_3_2_1_9_1","unstructured":"Intel Corp. 2018. Intel VTune. https:\/\/software.intel.com\/en-us\/intel-vtune-amplifier-xe. Intel Corp. 2018. Intel VTune. https:\/\/software.intel.com\/en-us\/intel-vtune-amplifier-xe."},{"volume-title":"Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15)","author":"Curtsinger Charlie","key":"e_1_3_2_1_10_1","unstructured":"Charlie Curtsinger and Emery D. Berger . 2015. Coz: Finding Code That Counts with Causal Profiling . In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15) . ACM, New York, NY, USA, 184--197. Charlie Curtsinger and Emery D. Berger. 2015. Coz: Finding Code That Counts with Causal Profiling. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15). ACM, New York, NY, USA, 184--197."},{"volume-title":"Tools for High Performance Computing","author":"DeRose Luiz","key":"e_1_3_2_1_11_1","unstructured":"Luiz DeRose , Bill Homer , Dean Johnson , Steve Kaufmann , and Heidi Poxon . 2008. Cray Performance Analysis Tools . In Tools for High Performance Computing . Springer Berlin Heidelberg , 191--199. Luiz DeRose, Bill Homer, Dean Johnson, Steve Kaufmann, and Heidi Poxon. 2008. Cray Performance Analysis Tools. In Tools for High Performance Computing. Springer Berlin Heidelberg, 191--199."},{"key":"e_1_3_2_1_12_1","unstructured":"Paul J. Drongowski. 2007. Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors. htttps:\/\/pdfs.semanticscholar.org\/5219\/4b43b8385ce39b2b08ecd409c753e0efafe5.pdf. Paul J. Drongowski. 2007. Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors. htttps:\/\/pdfs.semanticscholar.org\/5219\/4b43b8385ce39b2b08ecd409c753e0efafe5.pdf."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/1924943.1924954"},{"key":"e_1_3_2_1_14_1","first-page":"6","article-title":"The Scalasca Performance Toolset Architecture","volume":"22","author":"Geimer Markus","year":"2010","unstructured":"Markus Geimer , Felix Wolf , Brian J. N. Wylie , Erika \u00c1brah\u00e1m , Daniel Becker , and Bernd Mohr . 2010 . The Scalasca Performance Toolset Architecture . Concurr. Comput. : Pract. Exper. 22 , 6 (April 2010), 702--719. Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erika \u00c1brah\u00e1m, Daniel Becker, and Bernd Mohr. 2010. The Scalasca Performance Toolset Architecture. Concurr. Comput. : Pract. Exper. 22, 6 (April 2010), 702--719.","journal-title":"Concurr. Comput. : Pract. Exper."},{"volume-title":"Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction (SIGPLAN '82)","author":"Graham Susan L.","key":"e_1_3_2_1_15_1","unstructured":"Susan L. Graham , Peter B. Kessler , and Marshall K. Mckusick . 1982. Gprof: A Call Graph Execution Profiler . In Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction (SIGPLAN '82) . ACM, New York, NY, USA, 120--126. Susan L. Graham, Peter B. Kessler, and Marshall K. Mckusick. 1982. Gprof: A Call Graph Execution Profiler. In Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction (SIGPLAN '82). ACM, New York, NY, USA, 120--126."},{"volume-title":"Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15)","author":"Haque Md E.","key":"e_1_3_2_1_16_1","unstructured":"Md E. Haque , Yong hun Eom , Yuxiong He , Sameh Elnikety , Ricardo Bianchini , and Kathryn S . McKinley. 2015. Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services . In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15) . ACM, New York, NY, USA, 161--175. Md E. Haque, Yong hun Eom, Yuxiong He, Sameh Elnikety, Ricardo Bianchini, and Kathryn S. McKinley. 2015. Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '15). ACM, New York, NY, USA, 161--175."},{"volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50 '17)","author":"Haque Md E.","key":"e_1_3_2_1_17_1","unstructured":"Md E. Haque , Yuxiong He , Sameh Elnikety , Thu D. Nguyen , Ricardo Bianchini , and Kathryn S . McKinley. 2017. Exploiting Heterogeneity for Tail Latency and Energy Efficiency . In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50 '17) . ACM, New York, NY, USA, 625--638. Md E. Haque, Yuxiong He, Sameh Elnikety, Thu D. Nguyen, Ricardo Bianchini, and Kathryn S. McKinley. 2017. Exploiting Heterogeneity for Tail Latency and Energy Efficiency. In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50 '17). ACM, New York, NY, USA, 625--638."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3054742"},{"volume-title":"Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17)","author":"Huang Jiamin","key":"e_1_3_2_1_19_1","unstructured":"Jiamin Huang , Barzan Mozafari , and Thomas F. Wenisch . 2017. Statistical Analysis of Latency Through Semantic Profiling . In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17) . ACM, New York, NY, USA, 64--79. Jiamin Huang, Barzan Mozafari, and Thomas F. Wenisch. 2017. Statistical Analysis of Latency Through Semantic Profiling. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17). ACM, New York, NY, USA, 64--79."},{"volume-title":"Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16)","author":"Jeon Myeongjae","key":"e_1_3_2_1_20_1","unstructured":"Myeongjae Jeon , Yuxiong He , Hwanju Kim , Sameh Elnikety , Scott Rixner , and Alan L. Cox . 2016. TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services . In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16) . ACM, New York, NY, USA, 129--141. Myeongjae Jeon, Yuxiong He, Hwanju Kim, Sameh Elnikety, Scott Rixner, and Alan L. Cox. 2016. TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '16). ACM, New York, NY, USA, 129--141."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-016-1691-1"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/800050.801837"},{"volume-title":"2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 919--932","author":"Karlin I.","key":"e_1_3_2_1_23_1","unstructured":"I. Karlin , A. Bhatele , J. Keasler , B. L. Chamberlain , J. Cohen , Z. Devito , R. Haque , D. Laney , E. Luke , F. Wang , D. Richards , M. Schulz , and C. H. Still . 2013. Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application . In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 919--932 . I. Karlin, A. Bhatele, J. Keasler, B. L. Chamberlain, J. Cohen, Z. Devito, R. Haque, D. Laney, E. Luke, F. Wang, D. Richards, M. Schulz, and C. H. Still. 2013. Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing. 919--932."},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'14). USENIX Association","author":"Kasikci Baris","year":"2014","unstructured":"Baris Kasikci , Thomas Ball , George Candea , John Erickson , and Madanlal Musuvathi . 2014 . Efficient Tracing of Cold Code via Bias-free Sampling . In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'14). USENIX Association , Berkeley, CA, USA, 243--254. Baris Kasikci, Thomas Ball, George Candea, John Erickson, and Madanlal Musuvathi. 2014. Efficient Tracing of Cold Code via Bias-free Sampling. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC'14). USENIX Association, Berkeley, CA, USA, 243--254."},{"key":"e_1_3_2_1_25_1","unstructured":"Argonne National Laboratory. 2014. MPICH wiki. https:\/\/wiki.mpich.org\/mpich\/index.php. Argonne National Laboratory. 2014. MPICH wiki. https:\/\/wiki.mpich.org\/mpich\/index.php."},{"key":"e_1_3_2_1_26_1","unstructured":"Lawrence Livermore National Laboratory. 1995. Sweep3D Benchmark Code. http:\/\/www.llnl.gov\/asci_benchmarks\/asci\/limited\/sweep3d\/asci_sweep3d.html. Lawrence Livermore National Laboratory. 1995. Sweep3D Benchmark Code. http:\/\/www.llnl.gov\/asci_benchmarks\/asci\/limited\/sweep3d\/asci_sweep3d.html."},{"key":"e_1_3_2_1_27_1","unstructured":"Lawrence Livermore National Laboratory. 2013. ASC Sequoia Benchmark Codes. https:\/\/asc.llnl.gov\/sequoia\/benchmarks. Lawrence Livermore National Laboratory. 2013. ASC Sequoia Benchmark Codes. https:\/\/asc.llnl.gov\/sequoia\/benchmarks."},{"key":"e_1_3_2_1_28_1","unstructured":"Lawrence Livermore National Laboratory. 2018. CORAL-2 Benchmarks. https:\/\/asc.llnl.gov\/coral-2-benchmarks. Lawrence Livermore National Laboratory. 2018. CORAL-2 Benchmarks. https:\/\/asc.llnl.gov\/coral-2-benchmarks."},{"volume-title":"IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. 1--9.","author":"Lai Z.","key":"e_1_3_2_1_29_1","unstructured":"Z. Lai , Y. Cui , M. Li , Z. Li , N. Dai , and Y. Chen . 2016. TailCutter: Wisely cutting tail latency in cloud CDN under cost constraints . In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. 1--9. Z. Lai, Y. Cui, M. Li, Z. Li, N. Dai, and Y. Chen. 2016. TailCutter: Wisely cutting tail latency in cloud CDN under cost constraints. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications. 1--9."},{"key":"e_1_3_2_1_30_1","unstructured":"Linux. 2012. perf_event_open - Linux man page. https:\/\/linux.die.net\/man\/2\/perf_event_open. Linux. 2012. perf_event_open - Linux man page. https:\/\/linux.die.net\/man\/2\/perf_event_open."},{"key":"e_1_3_2_1_31_1","unstructured":"Linux. 2015. Linux Perf Tool. https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page. Linux. 2015. Linux Perf Tool. https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661198"},{"volume-title":"Proceedings of the 38th International Conference on Software Engineering (ICSE '16)","author":"Liu Tongping","key":"e_1_3_2_1_33_1","unstructured":"Tongping Liu , Charlie Curtsinger , and Emery D. Berger . 2016. DoubleTake: Fast and Precise Error Detection via Evidence-based Dynamic Analysis . In Proceedings of the 38th International Conference on Software Engineering (ICSE '16) . ACM, New York, NY, USA, 911--922. Tongping Liu, Charlie Curtsinger, and Emery D. Berger. 2016. DoubleTake: Fast and Precise Error Detection via Evidence-based Dynamic Analysis. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 911--922."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555271"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/125\/1\/012087"},{"volume-title":"Proceedings of the First International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM","author":"McLear R. E.","key":"e_1_3_2_1_37_1","unstructured":"R. E. McLear , D. M. Scheibelhut , and E. Tammaru . 1982. Guidelines for Creating a Debuggable Processor . In Proceedings of the First International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM , New York, NY, USA, 100--106. R. E. McLear, D. M. Scheibelhut, and E. Tammaru. 1982. Guidelines for Creating a Debuggable Processor. In Proceedings of the First International Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS I). ACM, New York, NY, USA, 100--106."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015789220266"},{"key":"e_1_3_2_1_39_1","unstructured":"NERSC. 2016. NERSC-8 \/ Trinity Benchmarks. http:\/\/www.nersc.gov\/users\/computational-systems\/cori\/nersc-8-procurement\/trinity-nersc-8-rfp\/nersc-8-trinity-benchmarks. NERSC. 2016. NERSC-8 \/ Trinity Benchmarks. http:\/\/www.nersc.gov\/users\/computational-systems\/cori\/nersc-8-procurement\/trinity-nersc-8-rfp\/nersc-8-trinity-benchmarks."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250746"},{"key":"e_1_3_2_1_41_1","unstructured":"University of Maryland and University of Wisconsin. 2017. Putting the Performance in High Performance Computing. https:\/\/www.dyninst.org. University of Maryland and University of Wisconsin. 2017. Putting the Performance in High Performance Computing. https:\/\/www.dyninst.org."},{"key":"e_1_3_2_1_42_1","unstructured":"Oracle Corp. 2017. Oracle Solaris Studio. http:\/\/www.oracle.com\/technetwork\/server-storage\/solarisstudio\/overview\/index.html. Oracle Corp. 2017. Oracle Solaris Studio. http:\/\/www.oracle.com\/technetwork\/server-storage\/solarisstudio\/overview\/index.html."},{"volume-title":"Proceedings of the 5th European Conference on Computer Systems (EuroSys '10)","author":"Pesterev Aleksey","key":"e_1_3_2_1_43_1","unstructured":"Aleksey Pesterev , Nickolai Zeldovich , and Robert T. Morris . 2010. Locating Cache Performance Bottlenecks Using Data Profiling . In Proceedings of the 5th European Conference on Computer Systems (EuroSys '10) . ACM, New York, NY, USA, 335--348. Aleksey Pesterev, Nickolai Zeldovich, and Robert T. Morris. 2010. Locating Cache Performance Bottlenecks Using Data Profiling. In Proceedings of the 5th European Conference on Computer Systems (EuroSys '10). ACM, New York, NY, USA, 335--348."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014911"},{"key":"e_1_3_2_1_45_1","unstructured":"Raja R. Sambasivan and Gregory R. Ganger. Submitted. Automated Diagnosis Without Predictability Is a Recipe for Failure. In Presented as part of the. USENIX. Raja R. Sambasivan and Gregory R. Ganger. Submitted. Automated Diagnosis Without Predictability Is a Recipe for Failure. In Presented as part of the. USENIX."},{"volume-title":"Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11)","author":"Sambasivan Raja R.","key":"e_1_3_2_1_46_1","unstructured":"Raja R. Sambasivan , Alice X. Zheng , Michael De Rosa , Elie Krevat , Spencer Whitman , Michael Stroucken , William Wang , Lianghong Xu , and Gregory R. Ganger . 2011. Diagnosing Performance Changes by Comparing Request Flows . In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11) . USENIX Association, Berkeley, CA, USA, 43--56. Raja R. Sambasivan, Alice X. Zheng, Michael De Rosa, Elie Krevat, Spencer Whitman, Michael Stroucken, William Wang, Lianghong Xu, and Gregory R. Ganger. 2011. Diagnosing Performance Changes by Comparing Request Flows. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11). USENIX Association, Berkeley, CA, USA, 43--56."},{"key":"e_1_3_2_1_47_1","first-page":"2","article-title":"OpenSpeedShop: An open source infrastructure for parallel performance analysis. Sci","volume":"16","author":"Schulz Martin","year":"2008","unstructured":"Martin Schulz , Jim Galarowicz , Don Maghrak , William Hachfeld , David Montoya , and Scott Cranford . 2008 . OpenSpeedShop: An open source infrastructure for parallel performance analysis. Sci . Program. 16 , 2 -- 3 (April 2008), 105--121. Martin Schulz, Jim Galarowicz, Don Maghrak, William Hachfeld, David Montoya, and Scott Cranford. 2008. OpenSpeedShop: An open source infrastructure for parallel performance analysis. Sci. Program. 16, 2--3 (April 2008), 105--121.","journal-title":"Program."},{"key":"e_1_3_2_1_48_1","first-page":"2","article-title":"The Tau Parallel Performance","volume":"20","author":"Shende Sameer S.","year":"2006","unstructured":"Sameer S. Shende and Allen D. Malony . 2006 . The Tau Parallel Performance System. Int. J. High Perform. Comput. Appl. 20 , 2 (May 2006), 287--311. Sameer S. Shende and Allen D. Malony. 2006. The Tau Parallel Performance System. Int. J. High Perform. Comput. Appl. 20, 2 (May 2006), 287--311.","journal-title":"System. Int. J. High Perform. Comput. Appl."},{"key":"e_1_3_2_1_49_1","first-page":"3","article-title":"IBM POWER7 performance modeling, verification, and evaluation","volume":"55","author":"Srinivas M.","year":"2011","unstructured":"M. Srinivas , B. Sinharoy , R. J. Eickemeyer , R. Raghavan , S. Kunkel , T. Chen , W. Maron , D. Flemming , A. Blanchard , P. Seshadri , J. W. Kellington , A. Mericas , A. E. Petruski , V. R. Indukuru , and S. Reyes . 2011 . IBM POWER7 performance modeling, verification, and evaluation . IBM JRD 55 , 3 (May-June 2011), 4:1--4:19. M. Srinivas, B. Sinharoy, R. J. Eickemeyer, R. Raghavan, S. Kunkel, T. Chen, W. Maron, D. Flemming, A. Blanchard, P. Seshadri, J. W. Kellington, A. Mericas, A. E. Petruski, V. R. Indukuru, and S. Reyes. 2011. IBM POWER7 performance modeling, verification, and evaluation. IBM JRD 55, 3 (May-June 2011), 4:1--4:19.","journal-title":"IBM JRD"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338923"},{"key":"e_1_3_2_1_51_1","volume-title":"Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In 2011 IEEE International Parallel Distributed Processing Symposium. 640--651","author":"Szebenyi Z.","year":"2011","unstructured":"Z. Szebenyi , T. Gamblin , M. Schulz , B. R. d. Supinski , F. Wolf , and B.J. N. Wylie . 2011 . Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In 2011 IEEE International Parallel Distributed Processing Symposium. 640--651 . Z. Szebenyi, T. Gamblin, M. Schulz, B. R. d. Supinski, F. Wolf, and B.J. N. Wylie. 2011. Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs. In 2011 IEEE International Parallel Distributed Processing Symposium. 640--651."},{"volume-title":"Proceedings of the 2010 ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10)","author":"Tallent Nathan R.","key":"e_1_3_2_1_52_1","unstructured":"Nathan R. Tallent , Laksono Adhianto , and John M . Mellor-Crummey. 2010. Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles . In Proceedings of the 2010 ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10) . IEEE Computer Society, Washington, DC, USA, 1--11. Nathan R. Tallent, Laksono Adhianto, and John M. Mellor-Crummey. 2010. Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles. In Proceedings of the 2010 ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10). IEEE Computer Society, Washington, DC, USA, 1--11."},{"volume-title":"Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09)","author":"Tallent Nathan R.","key":"e_1_3_2_1_53_1","unstructured":"Nathan R. Tallent , John M. Mellor-Crummey , and Michael W. Fagan . 2009. Binary Analysis for Measurement and Attribution of Program Performance . In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09) . ACM, New York, NY, USA, 441--452. Nathan R. Tallent, John M. Mellor-Crummey, and Michael W. Fagan. 2009. Binary Analysis for Measurement and Attribution of Program Performance. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). ACM, New York, NY, USA, 441--452."},{"key":"e_1_3_2_1_54_1","unstructured":"The Portland Group. 2011. PGPROF Profiler Guide Parallel Profiling for Scientists and Engineers. http:\/\/www.pgroup.com\/doc\/pgprofug.pdf. The Portland Group. 2011. PGPROF Profiler Guide Parallel Profiling for Scientists and Engineers. http:\/\/www.pgroup.com\/doc\/pgprofug.pdf."},{"volume-title":"Featherlight Reuse-Distance Measurement. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 440--453","author":"Wang Q.","key":"e_1_3_2_1_55_1","unstructured":"Q. Wang , X. Liu , and M. Chabbi . 2019 . Featherlight Reuse-Distance Measurement. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 440--453 . Q. Wang, X. Liu, and M. Chabbi. 2019. Featherlight Reuse-Distance Measurement. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). 440--453."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1080\/00401706.1962.10490022"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3177159"},{"key":"e_1_3_2_1_58_1","unstructured":"Wikipedia. 2019. Algorithms for calculating variance. https:\/\/software.intel.com\/sites\/default\/files\/m\/5\/2\/c\/f\/1\/30320-Nehalem-PMU-Programming-Guide-Core.pdf. Wikipedia. 2019. Algorithms for calculating variance. https:\/\/software.intel.com\/sites\/default\/files\/m\/5\/2\/c\/f\/1\/30320-Nehalem-PMU-Programming-Guide-Core.pdf."},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330371"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915218"}],"event":{"name":"SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"],"location":"Denver Colorado","acronym":"SC '19"},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3295500.3356167","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,8]],"date-time":"2023-01-08T14:00:31Z","timestamp":1673186431000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3295500.3356167"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,17]]},"references-count":59,"alternative-id":["10.1145\/3295500.3356167","10.1145\/3295500"],"URL":"https:\/\/doi.org\/10.1145\/3295500.3356167","relation":{},"subject":[],"published":{"date-parts":[[2019,11,17]]},"assertion":[{"value":"2019-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}