Abstract
Loop alignment is a classical program transformation that can enable the fusion of parallel loops, thereby increasing locality and reducing the number of synchronizations. Although the problem is quite old in the one-dimensional case (i.e., no nested loops), it came back recently — with a multi-dimensional form — when trying to refine parallelization algorithms based on multi-dimensional schedules. The main result of this paper is that, unlike the problem in 1D, finding a multi-dimensional shift of statements that makes an innermost loop parallel is strongly NP-complete. Nevertheless, we identify some polynomially-solvable cases that can occur in practice and we show that the general problem can be stated as a system of integer linear constraints.
Shifting is sometimes called alignment. But we prefer to make a distinction between shifting, a technique that consists in shifting some statements with respect to the loop iterations, and alignment, which is one particular goal achieved by this technique.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
J. Allen, D. Callahan, and K. Kennedy. Automatic decomposition of scientific programs for parallel execution. In PoPL’87, pages 63–76, Munich, Jan. 1987.
P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: Fromparallelismextraction to code generation. Journal of Parallel Computing, 24(3), 1998. Special issue on Languages and Compilers for Parallel Computers.
J.-F. Collard, P. Feautrier, and T. Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters (PPL), 5(3):421–436, Sept. 1995.
A. Darte. On the complexity of loop fusion. In PACT’99, pages 149–157, Newport Beach, CA, Oct. 1999.
A. Darte and G. Huard. Loop shifting for loop parallelization. Technical Report RR2000-22, LIP, ENS-Lyon, France, May 2000.
A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser, 2000. ISBN 0-8176-4149-1.
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. PPL, 7(4):379–392, 1997.
P. Feautrier. Some efficient solutions to the affine scheduling problem, Part II. Internationl Journal of Parallel Programming, 21(6):389–420, Dec. 1992.
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
W. Kelly and B. Pugh. Selecting affine mappings based on performance estimation. Parallel Processing Letters, 4(3):205–219, Sept. 1994.
W. Kelly, W. Pugh, and E. Rosser. Code generation for multiple mappings. In Frontiers of Massively Parallel Computation, pages 332–341, McLean, 1995.
K. Kennedy and K. S. McKinley. Loop distribution with arbitrary control flow. In Supercomputing’90, Aug. 1990.
L. Lamport. The parallel execution of DO loops. Communications of the ACM, 17(2):83–93, Feb. 1974.
C. Lang, N. Passos, and E. Sha. Polynomial-time nested loop fusion with full parallelism. In ICPP’96, volume 3, pages 9–16, Bloomingdale, IL, 1996.
C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1):5–35, 1991.
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In PoPL’97. ACM Press, Jan. 1997.
K. S. McKinley and K. Kennedy. Maximizing loop parallelism and improving data locality via loop fusion and distribution. LCPC’93, volume 768 of Lecture Notes in Computer Science, pages 301–320. Springer-Verlag, 1993.
K. Okuda. Cycle shrinking by dependence reduction. In Euro-Par’96, volume 1123 of Lecture Notes in Computer Science, pages 398–401. Springer-Verlag, 1996.
J. K. Peir. Program Partitioning and Synchronization on Multiprocessor Systems. PhD thesis, University of Illinois at Urbana-Champaign, 1986.
J. K. Peir and R. Cytron. Minimum distance: A method for partitioning recurrences for multiprocessors. IEEE Trans. on Computers, 38(8):1203–1211, Aug. 1989.
F. Quilleré, S. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming, 28(5):469–498, 2000.
R. Schreiber, S. Aditya, B. R. Rau, V. Kathail, S. Mahlke, S. Abraham, and G. Snider. High-level synthesis of nonprogrammable hardware accelerators. In Application-Specific Systems, Architectures, and Processors, pages 113–124, 2000.
M. E. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE TPDS, 2(4):452–471, Oct. 1991.
M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1996.
H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, 1990.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Darte, A., Huard, G. (2002). Complexity of Multi-dimensional Loop Alignment. In: Alt, H., Ferreira, A. (eds) STACS 2002. STACS 2002. Lecture Notes in Computer Science, vol 2285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45841-7_14
Download citation
DOI: https://doi.org/10.1007/3-540-45841-7_14
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43283-8
Online ISBN: 978-3-540-45841-8
eBook Packages: Springer Book Archive