Abstract
This paper addresses the well-known problem of redistributing data arrays over a multiprocessor network. The block-cyclic redistribution techniques found in the literature deal effectively with this problem, based on the assumption that there is a direct link between all the processors of the network. Most of these techniques aim at reducing the number of messages and the total redistribution cost. However, an application of these techniques on non-all-to-all communication networks, like tori, shows that these techniques suffer long delays. In this work, we try to solve the general block-cyclic redistribution problem on non-all-to-all networks, by grouping the messages into well-defined classes, and transferring them with the support of a well-specified number of virtual channels.
Similar content being viewed by others
References
Arabnia HR, Oliver MA (1987) Arbitrary rotation of raster images with SIMD machine architectures. Int J Eurogr Assoc (Comput Gr Forum) 6(1):3–12
Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–433
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Comput Gr Forum) 8(1):3–12
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–193
Arabnia HR, Smith JW (1993) A Reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium. Calgary, Alberta, Canada, June, pp 349–357
Arabnia HR (1995) A distributed stereocorrelation algorithm. In: Proceedings of computer communications and networks (ICCCN’95), IEEE, pp 479–482
Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–270
Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recogn Artif Intell (IJPRAI) (special issue on VLSI algorithms and architectures for computer vision, image processing, pattern recognition and AI) 9(2):201–229
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Parallel Comput 21(11):1783–1806
Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114
Chung YC, Hsu CH, Bai SW (1998) A basic-cycle calculation technique for efficient dynamic data redistribution. IEEE Trans Parallel Distrib Syst 32:359–377
Desprez F, Dongarra J, Petitet A, Randriamaro C, Robert Yves (1998) Scheduling block-cyclic array redistribution. IEEE Trans Parallel Distrib Syst 9(2):192–205
Jun D, Negishi Y (2010) Overlapping methods of all-to-all communication and FFT algorithms for torus-connected massively parallel supercomputers. International conference on high performance computing, networking, storage and analysis (SC). New Orleans, Louisiana, pp 1–9
Faraj A, Xin Y, Patarasuk P (2007) A message scheduling scheme for all-to-all personalized communication on ethernet switched clusters. IEEE Trans Parallel Distrib Syst 18(2):264–276
Feng R, Zhang P, Deng Y (2013) Deadlock-free routing algorithms for 6D Mesh/iBT interconnection networks. In: Proceedings of the 5th IEEE international conference on information technology: new generations. Las Vegas, Nevada, pp 253–258
Kalns ET, Ni LM (1995) Processor mapping techniques toward efficient data redistribution. IEEE Trans Parallel Distrib Syst 6:1234–1247
Hsu C-H, Chung Y-C, Yang D-L, Dow C-R (2001) A generalized processor mapping technique for array redistribution. IEEE Trans Parallel Distrib Syst 12(7):743–757
Huang JW, Chu CP (2006) An efficient communication scheduling method for the processor mapping technique applied data redistribution. J Supercomput 37:297–318
Kumar S, Sabharwal Y, Garg R, Heidelberger P (2008) Optimization of all-to-all communication on the blue Gene/L supercomputer. In: Proceedings of the 37th international conference on parallel processing. Washington, DC, USA, pp 320–329
Lim YW, Bhat PB, Prasanna VK (1998) Efficient algorithms for block cyclic redistribution of arrays. Algorithmica 24:298–330
Liu G, Gu N, Ren K, Tao Y (2006) Optimal all-to-all personalized communication in All-Port Tori. First international multi-symposiums on computer and computational sciences, IMSCCS ’06. Hanzhou, Zhejiang, pp 20–24
Luo W, Xiang D (2012) Deadlock-free routing algorithm for torus networks. IEEE Trans Parallel Distrib Syst 23(5):800–808
Ramanujam RS, Lin B (2013) Randomized throughput-optimal oblivious routing for torus networks. IEEE Trans Comput 62(6):561–574
Ramaswamy S, Simons P, Banerjee P (1996) Optimization for efficient array redistribution on distributed memory multicomputers. J Parallel Distrib Comput 38:217–228
Souravlas SI, Roumeliotis M (2004) A pipeline technique for dynamic data transfer on a multiprocessor grid. Int J Parallel Progr 32(5):361–388
Thakur R, Choudhary A, Ramanujam J (1996) Efficient algorithms for array redistribution. IEEE Trans Parallel Distrib Syst 7:587–594
Wani MA, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63
Yang Y, Wang J (2002) Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori. IEEE Comput Trans Parallel Distrib Syst 13(2):128–141
Zhang P, Deng Y (2012) Design and analysis of pipelined broadcast algorithms for the all-port interlaced bypass torus networks. IEEE Comput Trans Parallel Distrib Syst 23(12):2245–2253
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Souravlas, S., Roumeliotis, M. Scheduling array redistribution with virtual channel support. J Supercomput 71, 4215–4234 (2015). https://doi.org/10.1007/s11227-015-1519-4
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-015-1519-4