Abstract
Communication latency has become one of the determining factors for the performance of parallel clusters. To design low-latency network topologies for high-performance computing clusters, we optimize the diameters, mean path lengths, and bisection widths of circulant topologies. We obtain a series of optimal circulant topologies of size \(2^5\) through \(2^{10}\) and compare them with torus and hypercube of the same sizes and degrees. We further benchmark on a broad variety of applications including effective bandwidth, FFTE, Graph 500 and NAS parallel benchmarks to compare the optimal circulant topologies and Cartesian products of optimal circulant topologies and fully connected topologies with corresponding torus and hypercube. Simulation results demonstrate superior potentials of the optimal circulant topologies for communication-intensive applications. We also find the strengths of the Cartesian products in exploiting global communication with data traffic patterns of specific applications or internal algorithms.
Similar content being viewed by others
References
Top 500 supercomputer site (2021) http://www.top500.org
Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38(8):114–117
Deng Y, Ramos AF, Hornos JEM (2012) Symmetry insights for design of supercomputer network topologies: roots and weights lattices. Int J Mod Phys B 26(31):1250169. https://doi.org/10.1142/s021797921250169x
Dally W, Towles B (2003) Principles and practices of interconnection networks. Elsevier Science & Technology, Amsterdam
Brightwell R, Pedretti K, Underwood K et al (2006) SeaStar interconnect: balanced bandwidth for scalable performance. IEEE Micro 26(3):41–57. https://doi.org/10.1109/mm.2006.65
IBM Blue Gene Team (2008) Overview of the IBM Blue Gene/P project. IBM J Res Dev 52(1.2):199–220. https://doi.org/10.1147/rd.521.0199
Alverson R, Roweth D, Kaplan L (2010) The gemini system interconnect. In: (2010) 18th IEEE Symposium on High Performance Interconnects. IEEE, Mountain View, CA, USA,. https://doi.org/10.1109/hoti.2010.23
Chen D, Parker JJ, Eisley NA et al (2011) The IBM Blue Gene/Q interconnection network and message unit. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC\(^{\prime }\)11. ACM Press, Seattle, WA, USA, https://doi.org/10.1145/2063384.2063419
Ajima Y, Sumimoto S, Shimizu T (2009) Tofu: a 6D mesh/torus interconnect for exascale computers. Computer 42(11):36–40. https://doi.org/10.1109/mc.2009.370
Ajima Y, Kawashima T, Okamoto T et al (2018) The tofu interconnect d. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, Belfast, UK, https://doi.org/10.1109/cluster.2018.00090
Hayes J, Mudge T (1989) Hypercube supercomputers. Proc IEEE 77(12):1829–1841. https://doi.org/10.1109/5.48826
Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput C 34(10):892–901. https://doi.org/10.1109/tc.1985.6312192
Fu H, Liao J, Yang J et al (2016) The sunway TaihuLight supercomputer: system and applications. Sci China Inf Sci. https://doi.org/10.1007/s11432-016-5588-7
Wu CL, Feng TY (1980) On a class of multistage interconnection networks. IEEE Trans Comput C 29(8):694–702. https://doi.org/10.1109/tc.1980.1675651
Dally W (1990) Performance analysis of k-ary n-cube interconnection networks. IEEE Trans Comput 39(6):775–785. https://doi.org/10.1109/12.53599
Kim J, Dally WJ, Scott S, (2008) Technology-driven, highly-scalable dragonfly topology. In: (2008) International Symposium on Computer Architecture. IEEE, Beijing, China. https://doi.org/10.1109/isca.2008.19
Faanes G, Bataineh A, Roweth D, (2012) Cray cascade: a scalable HPC system based on a dragonfly network. In: (2012) International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Salt Lake City, UT, USA. https://doi.org/10.1109/sc.2012.39
Sensi DD, Girolamo SD, McMahon KH et al (2020) An in-depth analysis of the slingshot interconnect. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Atlanta, GA, USA, https://doi.org/10.1109/sc41405.2020.00039
Cerf VG, Cowan DD, Mullin RC et al (1974) A lower bound on the average shortest path length in regular graphs. Networks 4(4):335–342. https://doi.org/10.1002/net.3230040405
Zhang P, Powell R, Deng Y (2011) Interlacing bypass rings to torus networks for more efficient networks. IEEE Trans Parallel Distrib Syst 22(2):287–295. https://doi.org/10.1109/tpds.2010.89
Zhang P, Deng Y (2012) An analysis of the topological properties of the interlaced bypass torus (iBT) networks. Appl Math Lett 25(12):2147–2155. https://doi.org/10.1016/j.aml.2012.05.013
Zhang P, Deng Y (2012) Design and analysis of pipelined broadcast algorithms for the all-port interlaced bypass torus networks. IEEE Trans Parallel Distrib Syst 23(12):2245–2253. https://doi.org/10.1109/tpds.2012.93
Zhang P, Deng Y, Feng R et al (2015) Evaluation of various networks configurated by adding bypass or torus links. IEEE Trans Parallel Distrib Syst 26(4):984–996. https://doi.org/10.1109/tpds.2014.2315201
Feng R, Zhang P, Deng Y (2013) Deadlock-free routing algorithms for 6d mesh/iBT interconnection networks. In: 2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Honolulu, HI, USA, https://doi.org/10.1109/snpd.2013.43
Feng R, Zhang P, Deng Y (2012) Simulated performance evaluation of a 6d mesh/iBT interconnect. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Kyoto, Japan, https://doi.org/10.1109/snpd.2012.19
Xu Z, Huang X, Jimenez F et al (2019) A new record of graph enumeration enabled by parallel processing. Mathematics 7(12):1214. https://doi.org/10.3390/math7121214
Deng Y, Guo M, Ramos AF et al (2020) Optimal low-latency network topologies for cluster performance enhancement. J Supercomput 76(12):9558–9584. https://doi.org/10.1007/s11227-020-03216-y
Zhang Y, Huang X, Xu Z et al (2019) A structured table of graphs with symmetries and other special properties. Symmetry 12(1):2. https://doi.org/10.3390/sym12010002
Truong NT, Fujiwara I, Koibuchi M et al (2017) Distributed shortcut networks: low-latency low-degree non-random topologies targeting the diameter and cable length trade-off. IEEE Trans Parallel Distrib Syst 28(4):989–1001. https://doi.org/10.1109/tpds.2016.2613043
Yasudo R, Koibuchi M, Nakano K et al (2019) Designing high-performance interconnection networks with host-switch graphs. IEEE Trans Parallel Distrib Syst 30(2):315–330. https://doi.org/10.1109/tpds.2018.2864286
Sabino AU, Vasconcelos MFS, Deng Y et al (2018) Symmetry-guided design of topologies for supercomputer networks. Int J Mod Phys C 29(07):1850048. https://doi.org/10.1142/s0129183118500481
Nakao M, Sakai M, Hanada Y et al (2021) Graph optimization algorithm for low-latency interconnection networks. Parallel Comput 106(102):805. https://doi.org/10.1016/j.parco.2021.102805
Bermond J, Comellas F, Hsu D (1995) Distributed loop computer-networks: a survey. J Parallel Distrib Comput 24(1):2–10. https://doi.org/10.1006/jpdc.1995.1002
Hwang F (2003) A survey on multi-loop networks. Theor Comput Sci 299(1–3):107–121. https://doi.org/10.1016/s0304-3975(01)00341-3
Monakhova EA (2012) A survey on undirected circulant graphs. Discret Math Algorithms Appl 04(01):1250002. https://doi.org/10.1142/s1793830912500024
Boesch F, Wang JF (1985) Reliable circulant networks with minimum transmission delay. IEEE Trans Circuits Syst 32(12):1286–1291. https://doi.org/10.1109/tcs.1985.1085667
Beivide R, Herrada E, Balcazar J et al (1991) Optimal distance networks of low degree for parallel computers. IEEE Trans Comput 40(10):1109–1124. https://doi.org/10.1109/12.93744
Lau F, Chen G (1996) Optimal layouts of midimew networks. IEEE Trans Parallel Distrib Syst 7(9):954–961. https://doi.org/10.1109/71.536939
Mukhopadhyaya K, Sinha B (1995) Fault-tolerant routing in distributed loop networks. IEEE Trans Comput 44(12):1452–1456. https://doi.org/10.1109/12.477250
Gómez D, Gutierrez J, Ibeas Á (2007) Optimal routing in double loop networks. Theor Comput Sci 381(1–3):68–85. https://doi.org/10.1016/j.tcs.2007.04.002
Obradoviç N, Peters J, Ružić G (2005) Reliable broadcasting in double loop networks. Networks 46(2):88–97. https://doi.org/10.1002/net.20076
Monakhov O, Monakhova E, (2019) A comparative analysis of bioinspired algorithms for solving the problem of optimization of circulant and hypercirculant networks. In: (2019) 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), IEEE. https://doi.org/10.1109/opcs.2019.8880247
Feria-Purón R, Ryan J, Pérez-Rosés H (2014) Searching for large multi-loop networks. Electron Notes Discret Math 46:233–240. https://doi.org/10.1016/j.endm.2014.08.031
Bevan D, Erskine G, Lewis R (2017) Large circulant graphs of fixed diameter and arbitrary degree. Ars Math Contemp 13(2):275–291. https://doi.org/10.26493/1855-3974.969.659,
Lewis RR (2018) The degree-diameter problem for circulant graphs of degrees 10 and 11. Discret Math 341(9):2553–2566. https://doi.org/10.1016/j.disc.2018.05.024
Gross JL, Yellen J, Zhang P (2013) Handbook of graph theory. CRC Press, Boca Raton
Boesch F, Tindell R (1984) Circulants and their connectivities. J Graph Theory 8(4):487–499. https://doi.org/10.1002/jgt.3190080406
Arndt J (2010) Matters computational: ideas, algorithms, source code. Springer Science & Business Media, Berlin
Fxt: a library of algorithms (2021) https://www.jjj.de/fxt/
Ruskey F (2003) Combinatorial generation. Preliminary working draft University of Victoria, Victoria, BC, Canada 11:20
Sanders P, Schulz C (2013) Think locally, act globally: highly balanced graph partitioning. In: Experimental Algorithms. Lecture notes in computer science, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 164–175, https://doi.org/10.1007/978-3-642-38527-8_16
McKay BD, Piperno A (2014) Practical graph isomorphism, II. J Symb Comput 60:94–112. https://doi.org/10.1016/j.jsc.2013.09.003
Casanova H, Giersch A, Legrand A et al (2014) Versatile, scalable, and accurate simulation of distributed applications and platforms. J Parallel Distrib Comput 74(10):2899–2917. https://doi.org/10.1016/j.jpdc.2014.06.008
Effective Bandwidth (b_eff) Benchmark (2021) https://fs.hlrs.de/projects/par/mpi/b_eff/
Koniges A, Rabenseifner R, Solchenbach K (2001) Benchmark design for characterization of balanced high-performance architectures. In: Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001. IEEE Comput. Soc, San Francisco, CA, USA. https://doi.org/10.1109/ipdps.2001.925208
FFTE : A fast fourier transform package (2021) http://www.ffte.jp/
Takahashi D, Kanada Y (2000) High-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. J Supercomput 15(2):207–228. https://doi.org/10.1023/a:1008160021085
Graph 500 (2021) http://graph500.org/
Murphy RC, Wheeler KB, Barrett BW et al (2010) Introducing the graph 500. Cray Users Group (CUG) 19:45–74
NPB: NAS parallel benchmarks (2021) http://www.nas.nasa.gov/publications/npb.html
Bailey D, Barszcz E, Barton J et al (1991) The NAS parallel benchmarks. Int J Supercomput Appl 5(3):63–73. https://doi.org/10.1177/109434209100500306
Thakur R, Rabenseifner R, Gropp W (2005) Optimization of collective communication operations in MPICH. Int J High Perform Comput Appl 19(1):49–66. https://doi.org/10.1177/1094342005051521
Deveci M, Kaya K, Ucar B (2015) Fast and high quality topology-aware task mapping. In: (2015) IEEE International Parallel and Distributed Processing Symposium. IEEE. https://doi.org/10.1109/ipdps.2015.93
Ma T, Bosilca G, Bouteiller A (2012) HierKNEM: an adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters. In: (2012) IEEE 26th International Parallel and Distributed Processing Symposium, IEEE. https://doi.org/10.1109/ipdps.2012.91
Wolf W (2003) A decade of hardware/ software codesign. Computer 36(4):38–43. https://doi.org/10.1109/mc.2003.1193227
Teich J (2012) Hardware/software codesign: the past, the present, and predicting the future. Proc IEEE 100(Special Centennial Issue):1411–1430. https://doi.org/10.1109/jproc.2011.2182009
Monakhov OG, Monakhova EA, Romanov AY et al (2021) Adaptive dynamic shortest path search algorithm in networks-on-chip based on circulant topologies. IEEE Access 9:160836–160846. https://doi.org/10.1109/access.2021.3131635
Mirsadeghi SH, Afsahi A (2016) PTRAM: a parallel topology-and routing-aware mapping framework for large-scale HPC systems. In: (2016) IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE. https://doi.org/10.1109/ipdpsw.2016.146
Parsonage E, Nguyen HX, Bowden R et al (2011) Generalized graph products for network design and analysis. In: 2011 19th IEEE International Conference on Network Protocols. IEEE, https://doi.org/10.1109/icnp.2011.6089084,
Hammack RH, Imrich W, Klavžar S et al (2011) Handbook of product graphs, vol 2. CRC Press, Boca Raton
Dandamudi S, Eager D (1990) Hierarchical interconnection networks for multicomputer systems. IEEE Trans Comput 39(6):786–797. https://doi.org/10.1109/12.53600
Abd-El-Barr M, Al-Somani TF (2011) Topological properties of hierarchical interconnection networks: a review and comparison. J Electr Comput Eng 2011:1–12. https://doi.org/10.1155/2011/189434
Acknowledgements
The authors thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system, which was made possible by a $1.4M National Science Foundation grant (#1531492). The authors also thank Dr. A. Y. Romanov for beneficial suggestions on the manuscript via e-mails.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Huang, X., F. Ramos, A. & Deng, Y. Optimal circulant graphs as low-latency network topologies. J Supercomput 78, 13491–13510 (2022). https://doi.org/10.1007/s11227-022-04396-5
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-022-04396-5