Abstract
In distributed-memory multicomputers, minimizing interprocessor communication is the key to the efficient execution of parallel programs. In order to reduce the amount of communication overhead, parallel programs on multicomputers must be carefully scheduled by parallelizing compilers. This paper proposes some compilation techniques for partitioning and mapping nested loops with constant data dependences onto linear array multicomputers. First, a systematic partition strategy is proposed to project ann-dimensional computational structure, representing ann-nested loop, onto a line to form a one-dimensional projected structure with low communication overhead. Then, a mapping algorithm is proposed for mapping the partitioned loops onto linear arrays in a way that balances the workload and minimizes the communication cost among processors. Finally, parallel execution codes can be automatically generated for such linear array multicomputers.
Similar content being viewed by others
References
Bannerjee, U., Chen, S.C., Kuck, D.J., and Towle, R.A. 1979. Time and parallel processor bounds for Fortran-like loops.IEEE Trans. Comps., C-28, 9(Sept.): 660–670.
Bokhari, S.H. 1988. Partitioning problems in parallel, pipelined, and distributed computing.IEEE Trans. Comps., 37(Jan.): 48–57.
Chen, T.S., and Sheu, J.P. 1994. Communication-free data allocation techniques for parallelizing compilers on multicomputers.IEEE Trans. Parallel and Distr. Systems, 5, 9(Sept.): 924–938.
D'Hollander, E.H. 1989. Partitioning and labeling of index sets in do loops with constant dependence. InProc., 1989 Internat. Conf. on Parallel Processing, vol. 2 (Aug.), pp. 139–144.
Ercal, F., Ramanujam, J., and Sadayappan, P. 1990. Task allocation onto a hypercube by recursive mincut bipartitioning.J. Parallel and Distr. Comp., 10, 1(Sept.): 35–44.
Friedberg, S.H., Insel, A.J., and Spence, L.E. 1979.Linear Algebra. Prentice-Hall, Englewood Cliffs, N.J.
Gupta, R. 1992. Synchronization and communication costs of loop partitioning on shared-memory multiprocessor systems.IEEE Trans. Parallel and Distr. Systems, 3, 4(July): 505–512.
King, C.T., Chou, W.H., and Ni, L.M. 1990. Pipelined data-parallel algorithms: Part II, design.IEEE Trans. Parallel and Distr. Systems, 1, 4(Oct.): 486–499.
Kung, S.Y. 1987.VLSI Array Processors. Prentice-Hall, Englewood Cliffs, N.J.
Lamport, L. 1974. The parallel execution of do loops.CACM, 17, 2(Feb.): 83–93.
Liu, L.S., Ho, C.W., and Sheu, J.P. 1990. On the parallelism of nested for-loops using index shift method. InProc., 1990 Internat. Conf. on Parallel Processing, vol. 2(Aug.), pp. 119–123.
Lu, M., and Fang, J.Z. 1992. A solution of the cache ping-pong problem in multiprocessor systems.J. Parallel and Distr. Comp., 16(Oct.): 158–171.
Padua, D.A., and Wolfe, M.J. 1986. Advanced compiler optimizations for supercomputers.CACM (Dec.): 1184–1201.
Peir, J.-K., and Cytron, R. 1989. Minimum distance: A method for partitioning recurrences for multiprocessors.IEEE Trans. Comps., 38, 8(Aug.): 1203–1211.
Ramanujam, J., and Sadayappan, P. 1989. A methodology for parallelizing programs for multicomputers and complex memory multiprocessors. InProc., ACM Internat. Conf. on Supercomputing (Nov.), pp. 637–646.
Ramanujam, J., and Sadayappan, P. 1990. Tiling of iteration spaces for multicomputers. InProc., 1990 Internat. Conf. on Parallel Processing (Aug.), pp. 23–36.
Sadayappan, P., and Ercal, F. 1987. Nearest-neighbor mapping of finite element graphs onto processor meshes.IEEE Trans. Comps., C-36, 12(Dec.): 1408–1424.
Shang, W., and Fortes, J.A.B. 1988. Independent partitioning of algorithms with uniform dependencies. InProc., 1988 Internat. Conf. on Parallel Processing (Aug.), pp. 26–33.
Shang, W., and Fortes, J.A.B. 1991. Time optimal linear schedules for algorithms with uniform dependencies.IEEE Trans. Comps., 40, 6(June): 723–742.
Sheu, J.P., and Tai, T.H. 1991. Partitioning and mapping nested loops on multiprocessor systems.IEEE Trans. Parallel and Distr. Systems, 2, 4(Oct).: 430–439.
Wolf, M.E., and Lam, M.S. 1991. A loop transformation theory and an algorithm to maximize parallelism.IEEE Trans. Parallel and Distr. Systems, 2, 4(Oct.): 452–471.
Wolfe, M.J. 1989a. More iteration space tiling. InProc., ACM Internat. Conf. on Supercomputing (Nov.), pp. 655–664.
Wolfe, M.J. 1989b.Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge, Mass.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Sheu, JP., Chen, TS. Partitioning and mapping of nested loops for linear array multicomputers. J Supercomput 9, 183–202 (1995). https://doi.org/10.1007/BF01245404
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/BF01245404