Abstract
This paper presents a new method for deriving data distributions for distributed-memory parallel computers by means of a mathematical optimization technique. The approach requires two prerequisites: a parameterized data distribution and a rigorous performance prediction technique that yields runtime formulas containing the parameters of the data distribution. A mathematical optimization technique then determines the parameter values that minimize the total runtime, thereby also minimizing the communication overhead and the load-imbalance penalty. The method is demonstrated by deriving a data distribution for the LU decomposition of a matrix.
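The idea of minimizing a parameterized runtime formula can be illustrated with a small sketch. The runtime model below is hypothetical (the constants `t_flop`, `t_byte`, `t_start` and the exact cost terms are assumptions for illustration, not the authors' actual formulas); it only shows the shape of the approach: express runtime as a function of distribution parameters, then optimize over them.

```python
# Illustrative sketch: pick the processor-grid parameters (pr, pc) of a
# block-cyclic distribution for LU decomposition by minimizing a
# parameterized runtime formula. All constants and cost terms below are
# hypothetical placeholders, not the paper's actual performance model.

def lu_runtime(n, pr, pc, t_flop=1e-9, t_byte=1e-8, t_start=1e-5):
    """Hypothetical runtime formula for LU on a pr x pc processor grid:
    ideal computation time + a rough broadcast-communication term
    + a rough load-imbalance term."""
    p = pr * pc
    comp = (2.0 / 3.0) * n**3 * t_flop / p           # parallel floating-point work
    comm = n * (t_start + (n / pc) * 8 * t_byte)     # per-step pivot-column broadcast
    imbalance = n**2 * t_flop * (pr + pc) / p        # residual load imbalance
    return comp + comm + imbalance

def best_grid(n, p):
    """Enumerate all factorizations pr * pc = p and return the minimizer
    of the runtime formula (exhaustive search; the parameter space is tiny)."""
    grids = [(pr, p // pr) for pr in range(1, p + 1) if p % pr == 0]
    return min(grids, key=lambda g: lu_runtime(n, g[0], g[1]))

if __name__ == "__main__":
    pr, pc = best_grid(4096, 16)
    print(f"chosen grid: {pr} x {pc}")
```

Because the distribution is described by a small number of integer parameters, exhaustive enumeration suffices here; for richer parameterizations, a general-purpose optimizer would take the place of `best_grid`.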
Rauber, T., Rünger, G. Deriving Array Distributions by Optimization Techniques. The Journal of Supercomputing 15, 271–293 (2000). https://doi.org/10.1023/A:1008164427332