{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T21:55:56Z","timestamp":1720216556122},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2013,10]]},"abstract":"The choice of routing algorithm plays a vital role in the performance of on-chip interconnection networks. Adaptive routing is appealing because it offers better latency and throughput than oblivious routing, especially under nonuniform and bursty traffic. The performance of an adaptive routing algorithm is determined by its ability to accurately estimate congestion in the network. In this regard, maintaining global congestion state using a separate monitoring network offers better congestion visibility into distant parts of the network compared to solutions relying only on local congestion. However, the main challenge in designing such routing schemes is to keep the logic and bandwidth overhead as low as possible to fit into the tight power, area, and delay budgets of on-chip routers. In this article, we propose a minimal destination-based adaptive routing strategy (DAR), where every node estimates the delay to every other node in the network, and routing decisions are based on these per-destination delay estimates. DAR outperforms Regional Congestion Awareness (RCA), the best previously known adaptive routing algorithm that uses nonlocal congestion state. The performance improvement is brought about by maintaining fine-grained per-destination delay estimates in DAR that are more accurate than regional congestion metrics measured in RCA. 
The increased accuracy is a consequence of the fact that the per-destination delay estimates are not corrupted by congestion on links outside the admissible routing paths to the destination. A scalable version of DAR, referred to as SDAR, is also proposed for minimizing the overheads associated with DAR in large network topologies. We show that DAR outperforms local adaptive routing by up to 79% and RCA by up to 58% in terms of latency on SPLASH-2 benchmarks. DAR and SDAR also outperform existing adaptive and oblivious routing algorithms in latency and throughput under synthetic traffic patterns on 8\u00d78 and 16\u00d716 mesh topologies, respectively.","DOI":"10.1145\/2505055","type":"journal-article","created":{"date-parts":[[2013,11,6]],"date-time":"2013-11-06T14:09:19Z","timestamp":1383746959000},"page":"1-27","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Destination-based congestion awareness for adaptive routing in 2D mesh networks"],"prefix":"10.1145","volume":"18","author":[{"given":"Rohit Sunkam","family":"Ramanujam","sequence":"first","affiliation":[{"name":"University of California, San Diego, CA"}]},{"given":"Bill","family":"Lin","sequence":"additional","affiliation":[{"name":"University of California, San Diego, CA"}]}],"member":"320","published-online":{"date-parts":[[2013,10,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183430"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325115"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.473515"},
{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture.","author":"Gratz P.","unstructured":"Gratz, P., Grot, B., and Keckler, S. W. 2008. Regional congestion awareness for load balance in networks-on-chip. In Proceedings of the International Symposium on High-Performance Computer Architecture."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the IEEE International Conference on Computer Design.","author":"Gratz P.","unstructured":"Gratz, P., Kim, C., McDonald, R., Keckler, S. W., and Burger, D. 2006. Implementation and evaluation of on-chip network architectures. In Proceedings of the IEEE International Conference on Computer Design."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996638"},{"key":"e_1_2_1_7_1","unstructured":"IBM. IBM Blue Gene project. http:\/\/www.research.ibm.com\/bluegene\/."},
{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555783"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1148882.1148891"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.15"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065579.1065726"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1521747.1521786"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250681"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1146909.1147125"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2009.13"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000113"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Annual International Symposium on Computer Architecture.","author":"Mullins R.","unstructured":"Mullins, R., West, A., and Moore, S. 2004. Low-latency virtual-channel routers for on-chip networks. In Proceedings of the Annual International Symposium on Computer Architecture."},{"key":"e_1_2_1_18_1","unstructured":"Netmaker. 2009. Netmaker. http:\/\/www-dyn.cl.cam.ac.uk\/~rdm34\/wiki\/index.php?title=Main_Page."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/269790.269792"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1872007.1872030"},
{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Annual Symposium on High Performance Interconnects.","author":"Scott S. L.","unstructured":"Scott, S. L. and Thorson, G. 1996. The Cray T3E network: Adaptive routing in a high-performance 3D torus. In Proceedings of the Annual Symposium on High Performance Interconnects."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1399504.1360617"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.37"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture.","author":"Shang L.","unstructured":"Shang, L., Peh, L.-S., and Jha, N. K. 2003. Dynamic voltage scaling with links for power optimization of interconnection networks. In Proceedings of the International Symposium on High-Performance Computer Architecture."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007912.1007915"},{"key":"e_1_2_1_26_1","unstructured":"SPLASH-2. http:\/\/www-flash.stanford.edu\/apps\/SPLASH\/."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.997877"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/777412.777444"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/800076.802479"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the IEEE International Solid-State Circuits Conference.","author":"Vangal S.","unstructured":"Vangal, S., Howard, J., et al. An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS. In Proceedings of the IEEE International Solid-State Circuits Conference."}],
"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2505055","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T08:47:17Z","timestamp":1672390037000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2505055"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,10]]}},"alternative-id":["10.1145\/2505055"],"URL":"https:\/\/doi.org\/10.1145\/2505055","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,10]]},"assertion":[{"value":"2012-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-10-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}