Abstract
In this paper, we propose a prefix code matching parallel load-balancing method (PCMPLB) to efficiently deal with the load imbalance of solution-adaptive finite element application programs on distributed memory multicomputers. The main idea of the PCMPLB method is first to construct a prefix code tree for processors. Based on the prefix code tree, a schedule for performing load transfer among processors can be determined by concurrently and recursively dividing the tree into two subtrees and finding a maximum matching for processors in the two subtrees until the leaves of the prefix code tree are reached. We have implemented the PCMPLB method on an SP2 parallel machine and compared its performance with two load-balancing methods, the directed diffusion method and the multilevel diffusion method, and five mapping methods, the AE/ORB method, the AE/MC method, the MLkP method, the PARTY library method, and the JOSTLE-MS method. An unstructured finite element graph Truss was used as a test sample. During the execution, Truss was refined five times. Three criteria, the execution time of mapping/load-balancing methods, the execution time of an application program under different mapping/load-balancing methods, and the speedups achieved by mapping/load-balancing methods for an application program, are used for the performance evaluation. The experimental results show that (1) if a mapping method is used for the initial partitioning and this mapping method or a load-balancing method is used in each refinement, the execution time of an application program under a load-balancing method is less than that of the mapping method. (2) The execution time of an application program under the PCMPLB method is less than that of the directed diffusion method and the multilevel diffusion method.
Similar content being viewed by others
References
I. G. Angus, G. C. Fox, J. S. Kim, and D. W. Walker. Solving Problems on Concurrent Processors, Vol. 2. Prentice-Hall, Englewood Cliffs, N. J., 1990.
C. Aykanat, F. Ozguner, F. Ercal, and P. Sadayaooan. Iterative algorithms for solution of large sparse systems of linear equations on hypercubes. IEEE Trans. on Computers, 37(12):1554–1568, 1988.
C. Aykanat, F. Ozguner, S. Martin, and S. M. Doraivelu. Parallelization of a finite element application program on a hypercube multiprocessor. Hypercube Multiprocessor, 662–673, 1987.
S. B. Baden. Programming abstractions for dynamically partitioning and coordinating localized scientific calculations running on multiprocessors. SIAM Journal on Scientific and Statistical Computing, 12(1):145–157, 1991.
S. T. Barnard and H. D. Simon. Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience, 6(2):101–117, 1994.
S. T. Barnard and H. D. Simon. A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes. Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, pp. 627–632. San Francisco, Feb. 1995.
J. A. Bondy and U. S. R. Murty. Graph Theory with Applications. Elsevier North Holland, New York, 1976.
Y. C. Chung and C. J. Liao, A processor oriented partitioning method for mapping unstructured finite element models on SP2 parallel machines. Technical report. Institute of Information Engineering, Feng Chia University, Taichung, Taiwan, 1996.
G. Cybenko. Dynamic load balancing for distributed memory multiprocessors. Journal of Parallel and Distributed Computing, 7(2):279–301, 1989.
F. Ercal, J. Ramanujam, and P. Sadayappan. Task allocation onto a hypercube by recursive mincut bipartitioning. Journal of Parallel and Distributed Computing, 10:35–44, 1990.
C. Farhat and H. D. Simon. TOP/DOMDEC—a software tool for mesh partitioning and parallel processing. Technical report RNR–93–011. NASA Ames Research Center, 1993.
C. M. Fiduccia and R. M. Mattheyes. A linear-time heuristic for improving network partitions. Proceeding of the 19th IEEE Design Automation Conference, pp. 175–181, 1982.
M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to Theory of NP-Completeness. Freeman, San Francisco, 1979.
J. R. Gilbert and E. Zmijewski. A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming, 16(6):427–449, 1987.
J. R. Gilbert, G. L. Miller, and S. H. Teng. Geometric mesh partitioning: implementation and experiments. Proceedings of 9th International Parallel Processing Symposium, pp. 418–427. Santa Barbara, Calif. Apr. 1995.
A. Heirich and S. Taylor. A Parabolic Load Balancing Method, Proceeding of ICPP' 95, pp. 192–202, 1995.
B. Hendrickson and R. Leland. The Chaco user's guide: version 2.0. Technical report SAND94–2692. Sandia National Laboratories, Albuquerque, NM, Oct. 1994.
B. Hendrickson and R. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 16(2):452–469, 1995.
B. Hendrickson and R. Leland. An multilevel algorithm for partitioning graphs. Proceeding of Supercomputing' 95, Dec. 1995.
G. Horton. A multi-level diffusion method for dynamic load balancing. Parallel Computing, 19:209–218, 1993.
S. H. Hosseini, B. Litow, M. Malkawi, J. Mcpherson, and K. Vairavan. Analysis of a graph coloring based distributed load balancing algorithm. Journal of Parallel and Distributed Computing, 10(2):160–166, 1990.
Y. F. Hu and R. J. Blake. An optimal dynamic load balancing algorithm. Technical report DL-P–95–011. Daresbury Laboratory, Warrington, UK, 1995.
D. A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the IRE 40, pp. 1098–1101, 1952.
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Technical report 95–064. Department of Computer Science, University of Minnesota, Minneapolis, 1995.
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. Technical report 95–035. Department of Computer Science, University of Minnesota, Minneapolis, 1995.
B. W. Kernigham and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2):292–370, 1970.
L. Lapidus and C. F. Pinder. Numerical Solution of Partial Differential Equations in Science and Engineering. Wiley, New York, 1983.
F. C. H. Lin, and R. M. Keller. The gradient model load balancing method. IEEE Trans. Software Engineering, SE-13(1):32–38, 1987.
D. M. Nicol. Rectilinear partitioning of irregular data parallel computations. Journal of Parallel and Distributed Computing, 23(2):119–134, 1994.
L. Oliker and R. Biswas. Efficient load balancing and data remapping for adaptive grid calculations. Technical report, NASA Ames Research Center, Moffett Field, Calif., 1997.
C. W. Ou, S. Ranka, and G. Fox. Fast and parallel mapping algorithms for irregular problems. The Journal of Supercomputing, 10(2):119–140, 1996.
C. W. Ou and S. Ranka. Parallel incremental graph partitioning. IEEE Trans. Parallel and Distributed Systems, 8(8):884–896, 1997.
F. Pellegrini and J. Roman. Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. Proceedings of HPCN' 96, pp. 493–498, Apr. 1996.
J. R. Pilkington and S. B. Baden. Dynamic partitioning of non-uniform structured workloads with spacefilling curves. IEEE Trans. Parallel and Distributed Systems, 7(3):288–300, 1996.
R. Preis and R. Diekmann. The PARTY partitioning- - library user guide- - version 1.1. Heniz Nexdorf Institute Universitat, Paderborn, Germany, Sep. 1996.
S. Ranka, Y. Won, and S. Sahni. Programming a hypercube multicomputer. IEEE Software, 5(5):69–77, 1988.
K. Schloegel, G. Karypis, and V. Kumar. Parallel multilevel diffusion algorithms for repartitioning of adaptive meshes. Technical report #97–014. University of Minnesota, Department of Computer Science and Army HPC Center, 1997.
K. Schloegel, G. Karypis, and V. Kumar. Multilevel diffusion schemes for repartitioning of adaptive meshes. Technical report #97–013. University of Minnesota, Department of Computer Science, Jun. 1997.
W. Shu and M. Y. Wu. Runtime incremental parallel scheduling RIPS on distributed memory computers. IEEE Trans. Parallel and Distributed Systems, 7(6):637–649, 1996.
W. Shu and M. Y. Wu. The direct dimension exchange method for load balancing in k-ary n-cubes. Proceedings of Eighth IEEE Symposium on Parallel and Distributed Processing, pp. 366–369, New Orleans, Oct. 1996.
H. D. Simon. Partitioning of unstructured problems for parallel processing. Computing Systems in Engineering, 2(2/3):135–148, 1991.
C. H. Walshaw and M. Berzins. Dynamic load-balancing for PDE solvers on adaptive unstructured meshes. Concurrency: Practice and Experience, 7(1):17–28, 1995.
C. H. Walshaw, M. Cross, and M. G. Everett. A localized algorithm for optimizing unstructured mesh partitions. The International Journal of Supercomputer Applications, 9(4):280–295, 1995.
C. Walshaw, M. Cross, and M. G. Everett. Dynamic mesh partitioning: a unified optimisation and load-balancing algorithm. Technical Report 95/IM/06. University of Greenwich, London, SE18 6PF, UK, Dec. 1995.
C. Walshaw. The Jostle User Manual: Version 2.0. University of Greenwich, London, UK, July, 1997.
M. Willebeek-LeMair and A. P. Reeves. Strategies for dynamic load balancing on highly parallel computers. IEEE Trans. Parallel and Distributed Systems, 4(9):979–993, 1993.
R. D. Williams. Performance of dynamic load balancing algorithms for unstructured mesh calculations. Councurrency: Practice and Experience, 3(5):457–481, 1991.
R. D. Williams. DIME: Distributed Irregular Mesh Environment. California Institute of Technology, 1990.
M. Y. Wu. On runtime parallel scheduling for processor load balancing. IEEE Trans. Parallel and Distributed Systems, 8(2):173–186, 1997.
C. Z. Xu and F. C. M. Lau. Analysis of the generalized dimension exchange method for dynamic load balancing. Journal of Parallel and Distributed Computing, 16(4):385–393, 1992.
C. Z. Xu and F. C. M. Lau. The generalized dimension exchange method for load balancing in k-ary n-cubes and variants. Journal of Parallel and Distributed Computing, 24(1):72–85, 1995.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Chung, YC., Liao, CJ. & Yang, DL. A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers. The Journal of Supercomputing 15, 25–49 (2000). https://doi.org/10.1023/A:1008117625893
Issue Date:
DOI: https://doi.org/10.1023/A:1008117625893