{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,15]],"date-time":"2024-01-15T20:31:14Z","timestamp":1705350674490},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"In this article, we first characterize register operand value locality in shader programs of modern gaming applications and observe that there is a high likelihood of one of the register operands of several multiply, logical-and, and similar operations being zero, dynamically. We provide intuition, examples, and a quantitative characterization for how zeros originate dynamically in these programs. Next, we show that this dynamic behavior can be gainfully exploited with a profile-guided code optimization called Zeroploit that transforms targeted code regions into a zero-(value-)specialized fast path and a default slow path. The fast path benefits from zero-specialization in two ways, namely: (a) the backward slice of the other operand of a given multiply or logical-and can be skipped dynamically, provided the only use of that other operand is in the given instruction, and (b) the forward slice of instructions originating at the given instruction can be zero-specialized, potentially triggering further backward slice specializations from operations of that forward slice as well. Such specialization helps the fast path avoid redundant dynamic computations as well as memory fetches, while the fast-slow versioning transform helps preserve functional correctness. 
With an offline value profiler and manually optimized shader programs, we demonstrate that Zeroploit is able to achieve an average speedup of 35.8% for targeted shader programs, amounting to an average frame-rate speedup of 2.8% across a collection of modern gaming applications on an NVIDIA\u00ae GeForce RTX\u2122 2080 GPU.","DOI":"10.1145\/3394284","type":"journal-article","created":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T12:32:10Z","timestamp":1594125130000},"page":"1-26","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Zeroploit"],"prefix":"10.1145","volume":"17","author":[{"given":"Ram","family":"Rangan","sequence":"first","affiliation":[{"name":"NVIDIA, Karnataka, India"}]},{"given":"Mark W.","family":"Stephenson","sequence":"additional","affiliation":[{"name":"NVIDIA, USA"}]},{"given":"Aditya","family":"Ukarande","sequence":"additional","affiliation":[{"name":"NVIDIA, Karnataka, India"}]},{"given":"Shyam","family":"Murthy","sequence":"additional","affiliation":[{"name":"University of Wisconsin, WI, USA"}]},{"given":"Virat","family":"Agarwal","sequence":"additional","affiliation":[{"name":"Xilinx, Hyderabad, Telangana, India"}]},{"given":"Marc","family":"Blackstein","sequence":"additional","affiliation":[{"name":"NVIDIA, Hillsboro, OR, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,8,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001138"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/567067.567085"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/321119.321121"},{"key":"e_1_2_1_4_1","unstructured":"Louis Bavoil. 2019. The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload. Retrieved from https:\/\/devblogs.nvidia.com\/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload\/. Louis Bavoil. 2019. 
The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload. Retrieved from https:\/\/devblogs.nvidia.com\/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload\/."},{"key":"e_1_2_1_5_1","unstructured":"Chris Brennan. 2016. Delta Color Compression Overview. Retrieved from https:\/\/gpuopen.com\/dcc-overview\/. Chris Brennan. 2016. Delta Color Compression Overview. Retrieved from https:\/\/gpuopen.com\/dcc-overview\/."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/266800.266825"},{"key":"e_1_2_1_7_1","volume-title":"Value profiling and optimization. J. Instruct. Level Parallel. 1 (Mar","author":"Calder Brad","year":"1999","unstructured":"Brad Calder , Peter Feller , and Alan Eustace . 1999. Value profiling and optimization. J. Instruct. Level Parallel. 1 (Mar . 1999 ). Retrieved from https:\/\/www.jilp.org\/vol1\/v1paper2.pdf. Brad Calder, Peter Feller, and Alan Eustace. 1999. Value profiling and optimization. J. Instruct. Level Parallel. 1 (Mar. 1999). Retrieved from https:\/\/www.jilp.org\/vol1\/v1paper2.pdf."},{"key":"e_1_2_1_8_1","first-page":"9","article-title":"Value-sensitive automatic code specialization for embedded software","volume":"21","author":"Chung Eui-Young","year":"2002","unstructured":"Eui-Young Chung , B. Luca , G. DeMicheli , G. Luculli , and M. Carilli . 2002 . Value-sensitive automatic code specialization for embedded software . IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 21 , 9 (Sep. 2002). Eui-Young Chung, B. Luca, G. DeMicheli, G. Luculli, and M. Carilli. 2002. Value-sensitive automatic code specialization for embedded software. IEEE Trans. Comput.-Aided Design Integr. Circ. Syst. 21, 9 (Sep. 2002).","journal-title":"IEEE Trans. Comput.-Aided Design Integr. Circ. Syst."},{"key":"e_1_2_1_10_1","unstructured":"Microprocessor Standards Committee. 2019. 754-2019-IEEE Standard for Floating-Point Arithmetic. 
Retrieved from https:\/\/ieeexplore.ieee.org\/servlet\/opac?punumber=8766227. Microprocessor Standards Committee. 2019. 754-2019-IEEE Standard for Floating-Point Arithmetic. Retrieved from https:\/\/ieeexplore.ieee.org\/servlet\/opac?punumber=8766227."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Charles Consel Luke Hornof Fran\u00e7ois No\u00ebl Jacques Noy\u00e9 and Nicolae Volansche. 1996. A uniform approach for compile-time and run-time specialization. In Selected Papers from the International Seminar on Partial Evaluation. Charles Consel Luke Hornof Fran\u00e7ois No\u00ebl Jacques Noy\u00e9 and Nicolae Volansche. 1996. A uniform approach for compile-time and run-time specialization. In Selected Papers from the International Seminar on Partial Evaluation.","DOI":"10.1007\/3-540-61580-6_4"},{"key":"e_1_2_1_12_1","unstructured":"Microsoft Corporation. 2015. Fixed Order of Pipeline Results. Retrieved from https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/archive\/D3D11_3_FunctionalSpec.htm#4.2%20Fixed%20Order%20of%20Pipeline%20Results. Microsoft Corporation. 2015. Fixed Order of Pipeline Results. Retrieved from https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/archive\/D3D11_3_FunctionalSpec.htm#4.2%20Fixed%20Order%20of%20Pipeline%20Results."},{"key":"e_1_2_1_13_1","unstructured":"Microsoft Corporation. 2015. Unordered Access Views. Retrieved from https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/archive\/D3D11_3_FunctionalSpec.htm#UAVs. Microsoft Corporation. 2015. Unordered Access Views. Retrieved from https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/archive\/D3D11_3_FunctionalSpec.htm#UAVs."},{"key":"e_1_2_1_14_1","unstructured":"Microsoft Corporation. 2018. Atomic Iadd. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/atomic-iadd--sm5---asm. Microsoft Corporation. 2018. Atomic Iadd. 
Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/atomic-iadd--sm5---asm."},{"key":"e_1_2_1_15_1","unstructured":"Microsoft Corporation. 2018. Direct3D 11 Graphics. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d11\/atoc-dx-graphics-direct3d-11. Microsoft Corporation. 2018. Direct3D 11 Graphics. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d11\/atoc-dx-graphics-direct3d-11."},{"key":"e_1_2_1_16_1","unstructured":"Microsoft Corporation. 2018. Direct3D 12 Graphics. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d12\/direct3d-12-graphics. Microsoft Corporation. 2018. Direct3D 12 Graphics. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d12\/direct3d-12-graphics."},{"key":"e_1_2_1_17_1","unstructured":"Microsoft Corporation. 2018. Effect-Compiler Tool. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dtools\/fxc. Microsoft Corporation. 2018. Effect-Compiler Tool. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dtools\/fxc."},{"key":"e_1_2_1_18_1","unstructured":"Microsoft Corporation. 2018. High Level Shading Language. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dx-graphics-hlsl. Microsoft Corporation. 2018. High Level Shading Language. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dx-graphics-hlsl."},{"key":"e_1_2_1_19_1","unstructured":"Microsoft Corporation. 2018. movc (sm4-asm). Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/movc--sm4---asm. Microsoft Corporation. 2018. movc (sm4-asm). Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/movc--sm4---asm."},{"key":"e_1_2_1_20_1","unstructured":"Microsoft Corporation. 2018. Shader Model 4 Assembly (DirectX HLSL)-dcl_globalFlags. 
Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dcl-globalflags. Microsoft Corporation. 2018. Shader Model 4 Assembly (DirectX HLSL)-dcl_globalFlags. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dcl-globalflags."},{"key":"e_1_2_1_21_1","unstructured":"Microsoft Corporation. 2018. Shader Model 5 Assembly (DirectX HLSL). Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/shader-model-5-assembly--directx-hlsl. Microsoft Corporation. 2018. Shader Model 5 Assembly (DirectX HLSL). Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/shader-model-5-assembly--directx-hlsl."},{"key":"e_1_2_1_22_1","unstructured":"Microsoft Corporation. 2018. Sync. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/sync--sm5---asm. Microsoft Corporation. 2018. Sync. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/sync--sm5---asm."},{"key":"e_1_2_1_23_1","unstructured":"Microsoft Corporation. 2018. Unordered Access Buffer or Texture. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d11\/direct3d-11-advanced-stages-cs-resources#unordered-access-buffer-or-texture. Microsoft Corporation. 2018. Unordered Access Buffer or Texture. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3d11\/direct3d-11-advanced-stages-cs-resources#unordered-access-buffer-or-texture."},{"key":"e_1_2_1_24_1","unstructured":"Microsoft Corporation. 2018. Variable Syntax. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dx-graphics-hlsl-variable-syntax. Microsoft Corporation. 2018. Variable Syntax. Retrieved from https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/direct3dhlsl\/dx-graphics-hlsl-variable-syntax."},{"key":"e_1_2_1_25_1","unstructured":"Microsoft Corporation. 2019. DirectX Intermediate Language. 
Retrieved from https:\/\/github.com\/Microsoft\/DirectXShaderCompiler\/blob\/master\/docs\/DXIL.rst. Microsoft Corporation. 2019. DirectX Intermediate Language. Retrieved from https:\/\/github.com\/Microsoft\/DirectXShaderCompiler\/blob\/master\/docs\/DXIL.rst."},{"key":"e_1_2_1_26_1","first-page":"41","article-title":"Geforce Game Ready Driver","volume":"441","author":"NVIDIA Corporation","year":"2019","unstructured":"NVIDIA Corporation . 2019 . Geforce Game Ready Driver , Version 441 . 41 -WHQL. Retrieved from https:\/\/www.geforce.com\/drivers\/results\/155060. NVIDIA Corporation. 2019. Geforce Game Ready Driver, Version 441.41-WHQL. Retrieved from https:\/\/www.geforce.com\/drivers\/results\/155060.","journal-title":"Version"},{"key":"e_1_2_1_27_1","unstructured":"NVIDIA Corporation. 2019. Nsight 2019.6. Retrieved from https:\/\/developer.nvidia.com\/nsight-graphics. NVIDIA Corporation. 2019. Nsight 2019.6. Retrieved from https:\/\/developer.nvidia.com\/nsight-graphics."},{"key":"e_1_2_1_28_1","unstructured":"NVIDIA Corporation. 2019. Parallel Thread Execution ISA: Application Guide. Retrieved from https:\/\/docs.nvidia.com\/pdf\/ptx_isa_6.5.pdf. NVIDIA Corporation. 2019. Parallel Thread Execution ISA: Application Guide. Retrieved from https:\/\/docs.nvidia.com\/pdf\/ptx_isa_6.5.pdf."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6495006"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913)","author":"Gilani S. Z.","unstructured":"S. Z. Gilani , N. S. Kim , and M. J. Schulte . 2013. Power-efficient computing for compute-intensive GPGPU applications . In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913) . 330--341. S. Z. Gilani, N. S. Kim, and M. J. Schulte. 2013. Power-efficient computing for compute-intensive GPGPU applications. 
In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913). 330--341."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201999)","author":"Grant Brian","unstructured":"Brian Grant , Matthai Philipose , Markus Mock , Craig Chambers , and Susan J. Eggers . 1999. An evaluation of staged run-time optimizations in DyC . In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201999) . 293--304. Brian Grant, Matthai Philipose, Markus Mock, Craig Chambers, and Susan J. Eggers. 1999. An evaluation of staged run-time optimizations in DyC. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI\u201999). 293--304."},{"key":"e_1_2_1_33_1","volume-title":"NVIDIA GeForce RTX 2080 SUPER (8GB Founder).","author":"Hagedoorn Hilbert","year":"2019","unstructured":"Hilbert Hagedoorn . 2019 . NVIDIA GeForce RTX 2080 SUPER (8GB Founder). Retrieved from https:\/\/www.guru3d.com\/articles-pages\/geforce-rtx-2080-super-review,1.html. Hilbert Hagedoorn. 2019. NVIDIA GeForce RTX 2080 SUPER (8GB Founder). Retrieved from https:\/\/www.guru3d.com\/articles-pages\/geforce-rtx-2080-super-review,1.html."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916)","author":"Han Song","unstructured":"Song Han , Xingyu Liu , Huizi Mao , Jing Pu , Ardavan Pedram , Mark A. Horowitz , and William J. Dally . 2016. EIE: Efficient inference engine on compressed deep neural network . In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916) . 243--254. Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. 
In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916). 243--254."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/366062.366072"},{"key":"e_1_2_1_36_1","volume-title":"Writing High-Level","author":"Hyde Randall","unstructured":"Randall Hyde . 2006. Writing Great Code, Volume 2: Thinking Low-Level , Writing High-Level . No Starch Press, Chapter 13, 427--435. Randall Hyde. 2006. Writing Great Code, Volume 2: Thinking Low-Level, Writing High-Level. No Starch Press, Chapter 13, 427--435."},{"key":"e_1_2_1_37_1","unstructured":"The Khronos Group Inc.[n.d.]. OpenGL Overview. Retrieved from https:\/\/www.opengl.org\/documentation\/. The Khronos Group Inc.[n.d.]. OpenGL Overview. Retrieved from https:\/\/www.opengl.org\/documentation\/."},{"key":"e_1_2_1_38_1","unstructured":"The Khronos Group Inc.2018. Vulkan Overview. Retrieved from https:\/\/www.khronos.org\/vulkan\/. The Khronos Group Inc.2018. Vulkan Overview. Retrieved from https:\/\/www.khronos.org\/vulkan\/."},{"key":"e_1_2_1_39_1","volume-title":"Partial Evaluation and Automatic Program Generation","author":"Jones Neil D.","unstructured":"Neil D. Jones , Carsten K. Gomard , and Peter Sestoft . 1993. Partial Evaluation and Automatic Program Generation . Prentice-Hall , Upper Saddle River, NJ. Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. 1993. Partial Evaluation and Automatic Program Generation. Prentice-Hall, Upper Saddle River, NJ."},{"key":"e_1_2_1_40_1","unstructured":"Baldur Karlsson. 2019. Renderdoc v1.5. Retrieved from https:\/\/renderdoc.org\/docs\/index.html. Baldur Karlsson. 2019. Renderdoc v1.5. Retrieved from https:\/\/renderdoc.org\/docs\/index.html."},{"key":"e_1_2_1_41_1","unstructured":"John Kessenich Dave Baldwin and Randi Rost. 2017. The OpenGL Shading Language. Retrieved from https:\/\/www.khronos.org\/registry\/OpenGL\/specs\/gl\/GLSLangSpec.4.50.pdf. John Kessenich Dave Baldwin and Randi Rost. 2017. The OpenGL Shading Language. 
Retrieved from https:\/\/www.khronos.org\/registry\/OpenGL\/specs\/gl\/GLSLangSpec.4.50.pdf."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485934"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 27th International Symposium on Computer Architecture.","author":"Kevin","unstructured":"Kevin M. Lepak and Mikko H. Lipasti. 2000. On the value locality of store instructions . In Proceedings of the 27th International Symposium on Computer Architecture. Kevin M. Lepak and Mikko H. Lipasti. 2000. On the value locality of store instructions. In Proceedings of the 27th International Symposium on Computer Architecture."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 33rd Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201900)","author":"Kevin","unstructured":"Kevin M. Lepak and Mikko H. Lipasti. 2000. Silent stores for free . In Proceedings of the 33rd Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201900) . 22--31. Kevin M. Lepak and Mikko H. Lipasti. 2000. Silent stores for free. In Proceedings of the 33rd Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201900). 22--31."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the 29th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201996)","author":"Mikko","unstructured":"Mikko H. Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value prediction . In Proceedings of the 29th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201996) . 226--237. Mikko H. Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value prediction. In Proceedings of the 29th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201996). 226--237."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237173"},{"key":"e_1_2_1_47_1","unstructured":"Future Mark. 2019. 3DMARK\u00ae Technical Guide. 
Retrieved from https:\/\/s3.amazonaws.com\/download-aws.futuremark.com\/3dmark-technical-guide.pdf. Future Mark. 2019. 3DMARK\u00ae Technical Guide. Retrieved from https:\/\/s3.amazonaws.com\/download-aws.futuremark.com\/3dmark-technical-guide.pdf."},{"key":"e_1_2_1_48_1","unstructured":"D. K. McAllister S. E. Molnar Jr. J. F. Duluk E. M. Kilgariff P. R. Brown C. J. Amsinck J. M. O\u2019Connor J. M. Burgess G. A. Muthler and J. Robertson. 2012. Zero Bandwidth Clears. United States Patent No. 8330766. D. K. McAllister S. E. Molnar Jr. J. F. Duluk E. M. Kilgariff P. R. Brown C. J. Amsinck J. M. O\u2019Connor J. M. Burgess G. A. Muthler and J. Robertson. 2012. Zero Bandwidth Clears. United States Patent No. 8330766."},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 7th International Symposium on Static Analysis (SAS\u201900)","author":"Muth Robert","unstructured":"Robert Muth , Scott A. Watterson , and Saumya K. Debray . 2000. Code specialization based on value profiles . In Proceedings of the 7th International Symposium on Static Analysis (SAS\u201900) . Springer-Verlag, London, 340--359. Robert Muth, Scott A. Watterson, and Saumya K. Debray. 2000. Code specialization based on value profiles. In Proceedings of the 7th International Symposium on Static Analysis (SAS\u201900). Springer-Verlag, London, 340--359."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture. 27--40","author":"Parashar Angshuman","unstructured":"Angshuman Parashar , Minsoo Rhu , Anurag Mukkara , Antonio Puglielli , Rangharajan Venkatesan , Brucek Khailany , Joel Emer , Stephen W. Keckler , and William J. Dally . 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks . In Proceedings of the 44th Annual International Symposium on Computer Architecture. 27--40 . Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. 
Keckler, and William J. Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 27--40."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2968456.2968476"},{"key":"e_1_2_1_52_1","volume-title":"Intel MMX for multimedia PCs. Commun. ACM 40, 1","author":"Peleg Alex","year":"1997","unstructured":"Alex Peleg , Sam Wilkie , and Uri Weiser . 1997. Intel MMX for multimedia PCs. Commun. ACM 40, 1 ( 1997 ). Alex Peleg, Sam Wilkie, and Uri Weiser. 1997. Intel MMX for multimedia PCs. Commun. ACM 40, 1 (1997)."},{"key":"e_1_2_1_53_1","volume-title":"NVIDIA Geforce RTX 2080","author":"Powerup Tech","year":"2018","unstructured":"Tech Powerup . 2018 . NVIDIA Geforce RTX 2080 . Retrieved from https:\/\/www.techpowerup.com\/gpu-specs\/geforce-rtx- 2080.c3224. Tech Powerup. 2018. NVIDIA Geforce RTX 2080. Retrieved from https:\/\/www.techpowerup.com\/gpu-specs\/geforce-rtx-2080.c3224."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the ACM Workshop on Feedback Directed and Dynamic Optimization.","author":"Sastry S. Subramanya","unstructured":"S. Subramanya Sastry , Rastilav Bodik , and James E. Smith . 2000. Characterizing coarse-grained reuse of computation . In Proceedings of the ACM Workshop on Feedback Directed and Dynamic Optimization. S. Subramanya Sastry, Rastilav Bodik, and James E. Smith. 2000. Characterizing coarse-grained reuse of computation. In Proceedings of the ACM Workshop on Feedback Directed and Dynamic Optimization."},{"key":"e_1_2_1_56_1","first-page":"6","article-title":"SparCE: Sparsity aware general-purpose core extensions to accelerate deep neural networks","volume":"68","author":"Sen Sanchari","year":"2018","unstructured":"Sanchari Sen , Shubham Jain , Swagath Venkataramani , and Anand Raghunathan . 2018 . SparCE: Sparsity aware general-purpose core extensions to accelerate deep neural networks . IEEE Trans. Comput. 
68 , 6 (Nov. 2018). Sanchari Sen, Shubham Jain, Swagath Venkataramani, and Anand Raghunathan. 2018. SparCE: Sparsity aware general-purpose core extensions to accelerate deep neural networks. IEEE Trans. Comput. 68, 6 (Nov. 2018).","journal-title":"IEEE Trans. Comput."},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA\u201905)","author":"Shankar Ajeet","unstructured":"Ajeet Shankar , S. Subramanya Sastry , Rastislav Bod\u00edk , and James E. Smith . 2005. Runtime specialization with optimistic heap analysis . In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA\u201905) . 327--343. Ajeet Shankar, S. Subramanya Sastry, Rastislav Bod\u00edk, and James E. Smith. 2005. Runtime specialization with optimistic heap analysis. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA\u201905). 327--343."},{"key":"e_1_2_1_59_1","volume-title":"The NVIDIA Geforce RTX 2080 Super Review: Memories of the Future.","author":"Smith Ryan","year":"2019","unstructured":"Ryan Smith . 2019 . The NVIDIA Geforce RTX 2080 Super Review: Memories of the Future. Retrieved from https:\/\/www.anandtech.com\/show\/14663\/the-nvidia-geforce-rtx-2080-super-review\/3. Ryan Smith. 2019. The NVIDIA Geforce RTX 2080 Super Review: Memories of the Future. Retrieved from https:\/\/www.anandtech.com\/show\/14663\/the-nvidia-geforce-rtx-2080-super-review\/3."},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA\u201997)","author":"Sodani Avinash","unstructured":"Avinash Sodani and Gurindar S. Sohi . 1997. Dynamic instruction reuse . In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA\u201997) . 194--205. 
Avinash Sodani and Gurindar S. Sohi. 1997. Dynamic instruction reuse. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA\u201997). 194--205."},{"key":"e_1_2_1_61_1","volume-title":"Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA\u201915)","author":"Stephenson Mark","unstructured":"Mark Stephenson , Siva Kumar Sastry Hari , Yunsup Lee , Eiman Ebrahimi , Daniel R. Johnson , David Nellans , Mike O\u2019Connor , and Stephen W. Keckler . 2015. Flexible software profiling of GPU architectures . In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA\u201915) . 185--197. Mark Stephenson, Siva Kumar Sastry Hari, Yunsup Lee, Eiman Ebrahimi, Daniel R. Johnson, David Nellans, Mike O\u2019Connor, and Stephen W. Keckler. 2015. Flexible software profiling of GPU architectures. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA\u201915). 185--197."},{"key":"e_1_2_1_62_1","volume-title":"NVIDIA GeForce RTX 2080 Super Review.","author":"Walton Jarred","year":"2019","unstructured":"Jarred Walton . 2019 . NVIDIA GeForce RTX 2080 Super Review. Retrieved from https:\/\/www.pcgamer.com\/nvidia-geforce-rtx-2080-super-review\/. Jarred Walton. 2019. NVIDIA GeForce RTX 2080 Super Review. Retrieved from https:\/\/www.pcgamer.com\/nvidia-geforce-rtx-2080-super-review\/."},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916)","author":"Wong D.","unstructured":"D. Wong , N. S. Kim , and M. Annavaram . 2016. Approximating warps with intra-warp operand value similarity . In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916) . 176--187. D. Wong, N. S. Kim, and M. Annavaram. 2016. Approximating warps with intra-warp operand value similarity. 
In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916). 176--187."},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920)","author":"Yeh Tsung Tai","unstructured":"Tsung Tai Yeh , Roland N. Green , and Timothy G. Rogers. 2020. Dimensionality-aware redundant SIMT instruction elimination . In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920) . 1327--1340. Tsung Tai Yeh, Roland N. Green, and Timothy G. Rogers. 2020. Dimensionality-aware redundant SIMT instruction elimination. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920). 1327--1340."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783723"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394284","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T09:17:57Z","timestamp":1672564677000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394284"}},"subtitle":["Exploiting Zero Valued Operands in Interactive Gaming 
Applications"],"short-title":[],"issued":{"date-parts":[[2020,8,3]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3394284"],"URL":"https:\/\/doi.org\/10.1145\/3394284","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,3]]},"assertion":[{"value":"2020-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}