{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T06:05:52Z","timestamp":1672553152855},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2017,9,24]],"date-time":"2017-09-24T00:00:00Z","timestamp":1506211200000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["EEC-0642422 and IIP-1161022"],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"I\/UCRC"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2017,3,31]]},"abstract":"<jats:p>\n The modern processor landscape is a varied and diverse community. As such, developers need a way to quickly and fairly compare various devices for use with particular applications. This article expands the authors\u2019 previously published computational-density metrics and presents an analysis of a new generation of various device architectures, including CPU, DSP, FPGA, GPU, and hybrid architectures. Also, new memory metrics are added to expand the existing suite of metrics to characterize the memory resources on various processing devices. Finally, a new relational metric,\n <jats:italic>realizable utilization (RU)<\/jats:italic>\n , is introduced, which quantifies the fraction of the computational density metric that an application achieves within an individual implementation. The RU metric can be used to provide valuable feedback to application developers and architecture designers by highlighting the upper bound on specific application optimization and providing a quantifiable measure of theoretical and realizable performance. 
Overall, the analysis in this article quantifies the performance tradeoffs among the architectures studied, the memory characteristics of different device types, and the efficiency of device architectures.\n <\/jats:p>","DOI":"10.1145\/2888401","type":"journal-article","created":{"date-parts":[[2016,9,29]],"date-time":"2016-09-29T19:06:10Z","timestamp":1475175970000},"page":"1-21","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Analysis of Fixed, Reconfigurable, and Hybrid Devices with Computational, Memory, I\/O, & Realizable-Utilization Metrics"],"prefix":"10.1145","volume":"10","author":[{"given":"Justin","family":"Richardson","sequence":"first","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida"}]},{"given":"Alan","family":"George","sequence":"additional","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida"}]},{"given":"Kevin","family":"Cheng","sequence":"additional","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida"}]},{"given":"Herman","family":"Lam","sequence":"additional","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida"}]}],"member":"320","published-online":{"date-parts":[[2016,9,24]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511535246"},{"key":"e_1_2_2_2_1","unstructured":"A. Athavale and C. Christensen. 2005. High-Speed Serial I\/O Made Simple A Designers\u2019 Guide with FPGA Applications. Xilinx Connectivity Solutions. A. Athavale and C. Christensen. 2005. High-Speed Serial I\/O Made Simple A Designers\u2019 Guide with FPGA Applications. 
Xilinx Connectivity Solutions."},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/600596"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85451-7_79"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v21:18"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_2_7_1","first-page":"2","article-title":"Portegies Zwart. 2008. High performance direct gravitational n-body simulations on graphics processing units II: An implementation","volume":"13","author":"Belleman Robert G.","year":"2008","unstructured":"Robert G. Belleman , Jeroen Bedorf , and Simon F . Portegies Zwart. 2008. High performance direct gravitational n-body simulations on graphics processing units II: An implementation in CUDA. New Astron. 13 ( Feb. 2008 ). Issue 2 . Robert G. Belleman, Jeroen Bedorf, and Simon F. Portegies Zwart. 2008. High performance direct gravitational n-body simulations on graphics processing units II: An implementation in CUDA. New Astron. 13 (Feb. 2008). Issue 2.","journal-title":"CUDA. New Astron."},{"key":"e_1_2_2_8_1","unstructured":"Bhaskar. 2006. Applied Mathematical Methods. Pearson Education. http:\/\/books.google.com\/books?id=D4DA7rWWWPYC. Bhaskar. 2006. Applied Mathematical Methods. Pearson Education. http:\/\/books.google.com\/books?id=D4DA7rWWWPYC."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232983"},{"key":"e_1_2_2_10_1","unstructured":"Jose M. Cecilia. The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions. Jose M. Cecilia. The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions."},{"key":"e_1_2_2_12_1","volume-title":"Society for Industrial, and Applied Mathematics","author":"Dongarra J. J.","year":"1979","unstructured":"J. J. Dongarra , Society for Industrial, and Applied Mathematics . 1979 . Linpack Users\u2019 Guide. Society for Industrial and Applied Mathematics . 
http:\/\/books.google.com\/books?id=AmSm1n3Vw0cC. J. J. Dongarra, Society for Industrial, and Applied Mathematics. 1979. Linpack Users\u2019 Guide. Society for Industrial and Applied Mathematics. http:\/\/books.google.com\/books?id=AmSm1n3Vw0cC."},{"key":"e_1_2_2_13_1","volume-title":"The chamomile scheme: An optimized algorithm for n-body simulations on programmable graphics processing units. NewAstron. (March 5","author":"Hamada Tsuyoshi","year":"2007","unstructured":"Tsuyoshi Hamada and Toshiaki Iitaka . March 5, 2007. The chamomile scheme: An optimized algorithm for n-body simulations on programmable graphics processing units. NewAstron. (March 5 , 2007 ). Tsuyoshi Hamada and Toshiaki Iitaka. March 5, 2007. The chamomile scheme: An optimized algorithm for n-body simulations on programmable graphics processing units. NewAstron. (March 5, 2007)."},{"key":"e_1_2_2_14_1","volume-title":"MKL 11.2","year":"2014","unstructured":"Intel. 2014. Intel math kernel library reference manual. 072 , MKL 11.2 ( 2014 ). https:\/\/software.intel.com\/en-us\/mkl_11.2_ref_pdf. Intel. 2014. Intel math kernel library reference manual. 072, MKL 11.2 (2014). https:\/\/software.intel.com\/en-us\/mkl_11.2_ref_pdf."},{"key":"e_1_2_2_15_1","unstructured":"P. Lancaster and M. Tismenetsky. 1985. The Theory of Matrices: With Applications. Academic Press. http:\/\/books.google.com\/books?id=m8z6Xh1A3t8C. P. Lancaster and M. Tismenetsky. 1985. The Theory of Matrices: With Applications. Academic Press. http:\/\/books.google.com\/books?id=m8z6Xh1A3t8C."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2014.7041002"},{"key":"e_1_2_2_17_1","unstructured":"NVidia. 2010a. NVidia SDK Core. http:\/\/developer.nvidia.com\/cuda-toolkit. (2010). NVidia. 2010a. NVidia SDK Core. http:\/\/developer.nvidia.com\/cuda-toolkit. (2010)."},{"key":"e_1_2_2_18_1","unstructured":"NVidia. 2010b. NVidia SDK DirectCompute Core. http:\/\/developer.nvidia.com\/cuda-toolkit. (2010). NVidia. 2010b. 
NVidia SDK DirectCompute Core. http:\/\/developer.nvidia.com\/cuda-toolkit. (2010)."},{"key":"e_1_2_2_19_1","unstructured":"NVidia. 2015. CUBLAS LIBRARY. 7.0 (2015). http:\/\/docs.nvidia.com\/cuda\/pdf\/CUBLAS_Library.pdf. NVidia. 2015. CUBLAS LIBRARY. 7.0 (2015). http:\/\/docs.nvidia.com\/cuda\/pdf\/CUBLAS_Library.pdf."},{"key":"e_1_2_2_20_1","volume-title":"Fast n-body simulation with CUDA. GPU Gems 3","author":"Nyland Lars","year":"2007","unstructured":"Lars Nyland , Mark Harris , and Jan Prins . 2007. Fast n-body simulation with CUDA. GPU Gems 3 ( 2007 ). Lars Nyland, Mark Harris, and Jan Prins. 2007. Fast n-body simulation with CUDA. GPU Gems 3 (2007)."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345220"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232984"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/106975.106980"},{"key":"e_1_2_2_25_1","unstructured":"Vasily Volkov and James Demmel. 2008a. LU QR and Cholesky Factorizations using Vector Capabilities of GPUs. http:\/\/www.eecs.berkeley.edu\/Pubs\/TechRpts\/2008\/EECS-2008-49.html. (May 2008). Vasily Volkov and James Demmel. 2008a. LU QR and Cholesky Factorizations using Vector Capabilities of GPUs. http:\/\/www.eecs.berkeley.edu\/Pubs\/TechRpts\/2008\/EECS-2008-49.html. (May 2008)."},{"key":"e_1_2_2_26_1","volume-title":"Storage and Analysis, 2008. SC 2008. International Conference for (Nov. 15--21","author":"Volkov Vasily","year":"2008","unstructured":"Vasily Volkov and James W. Demmel . Nov. 15--21, 2008b. Benchmarking GPUs to tune dense linear algebra. High Performance Computing, Networking , Storage and Analysis, 2008. SC 2008. International Conference for (Nov. 15--21 , 2008 ). Vasily Volkov and James W. Demmel. Nov. 15--21, 2008b. Benchmarking GPUs to tune dense linear algebra. 
High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008. International Conference for (Nov. 15--21, 2008)."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00086-7"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1862648.1862649"},{"key":"e_1_2_2_29_1","volume-title":"Proc. of Reconfigurable Systems Summer Institute 2008 (RSSI) (July 7--10","author":"Williams J.","year":"2008","unstructured":"J. Williams , A. George , J. Richardson , K. Gosrani , and S. Suresh . July 7--10, 2008a. Computational density of fixed and reconfigurable multi-core devices for application acceleration . Proc. of Reconfigurable Systems Summer Institute 2008 (RSSI) (July 7--10 , 2008 ). J. Williams, A. George, J. Richardson, K. Gosrani, and S. Suresh. July 7--10, 2008a. Computational density of fixed and reconfigurable multi-core devices for application acceleration. Proc. of Reconfigurable Systems Summer Institute 2008 (RSSI) (July 7--10, 2008)."},{"key":"e_1_2_2_30_1","volume-title":"Proc. of High-Performance Embedded Computing Workshop (HPEC) (Sep. 23--25","author":"Williams J.","year":"2008","unstructured":"J. Williams , A. George , J. Richardson , K. Gosrani , and S. Suresh . Sep . 23--25, 2008b. Fixed and reconfigurable multi-core device characterization for HPEC . Proc. of High-Performance Embedded Computing Workshop (HPEC) (Sep. 23--25 , 2008 ). J. Williams, A. George, J. Richardson, K. Gosrani, and S. Suresh. Sep. 23--25, 2008b. Fixed and reconfigurable multi-core device characterization for HPEC. Proc. of High-Performance Embedded Computing Workshop (HPEC) (Sep. 
23--25, 2008)."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2888401","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2888401","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,31]],"date-time":"2022-12-31T07:12:28Z","timestamp":1672470748000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2888401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,24]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2017,3,31]]}},"alternative-id":["10.1145\/2888401"],"URL":"https:\/\/doi.org\/10.1145\/2888401","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,24]]},"assertion":[{"value":"2015-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-09-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}