A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers | The Journal of Supercomputing Skip to main content
Log in

A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

In this paper, we propose a prefix code matching parallel load-balancing method (PCMPLB) to efficiently deal with the load imbalance of solution-adaptive finite element application programs on distributed memory multicomputers. The main idea of the PCMPLB method is first to construct a prefix code tree for processors. Based on the prefix code tree, a schedule for performing load transfer among processors can be determined by concurrently and recursively dividing the tree into two subtrees and finding a maximum matching for processors in the two subtrees until the leaves of the prefix code tree are reached. We have implemented the PCMPLB method on an SP2 parallel machine and compared its performance with two load-balancing methods, the directed diffusion method and the multilevel diffusion method, and five mapping methods, the AE/ORB method, the AE/MC method, the MLkP method, the PARTY library method, and the JOSTLE-MS method. An unstructured finite element graph Truss was used as a test sample. During the execution, Truss was refined five times. Three criteria, the execution time of mapping/load-balancing methods, the execution time of an application program under different mapping/load-balancing methods, and the speedups achieved by mapping/load-balancing methods for an application program, are used for the performance evaluation. The experimental results show that (1) if a mapping method is used for the initial partitioning and this mapping method or a load-balancing method is used in each refinement, the execution time of an application program under a load-balancing method is less than that of the mapping method. (2) The execution time of an application program under the PCMPLB method is less than that of the directed diffusion method and the multilevel diffusion method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. I. G. Angus, G. C. Fox, J. S. Kim, and D. W. Walker. Solving Problems on Concurrent Processors, Vol. 2. Prentice-Hall, Englewood Cliffs, N. J., 1990.

    Google Scholar 

  2. C. Aykanat, F. Ozguner, F. Ercal, and P. Sadayaooan. Iterative algorithms for solution of large sparse systems of linear equations on hypercubes. IEEE Trans. on Computers, 37(12):1554–1568, 1988.

    Google Scholar 

  3. C. Aykanat, F. Ozguner, S. Martin, and S. M. Doraivelu. Parallelization of a finite element application program on a hypercube multiprocessor. Hypercube Multiprocessor, 662–673, 1987.

  4. S. B. Baden. Programming abstractions for dynamically partitioning and coordinating localized scientific calculations running on multiprocessors. SIAM Journal on Scientific and Statistical Computing, 12(1):145–157, 1991.

    Google Scholar 

  5. S. T. Barnard and H. D. Simon. Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience, 6(2):101–117, 1994.

    Google Scholar 

  6. S. T. Barnard and H. D. Simon. A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes. Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, pp. 627–632. San Francisco, Feb. 1995.

  7. J. A. Bondy and U. S. R. Murty. Graph Theory with Applications. Elsevier North Holland, New York, 1976.

    Google Scholar 

  8. Y. C. Chung and C. J. Liao, A processor oriented partitioning method for mapping unstructured finite element models on SP2 parallel machines. Technical report. Institute of Information Engineering, Feng Chia University, Taichung, Taiwan, 1996.

    Google Scholar 

  9. G. Cybenko. Dynamic load balancing for distributed memory multiprocessors. Journal of Parallel and Distributed Computing, 7(2):279–301, 1989.

    Google Scholar 

  10. F. Ercal, J. Ramanujam, and P. Sadayappan. Task allocation onto a hypercube by recursive mincut bipartitioning. Journal of Parallel and Distributed Computing, 10:35–44, 1990.

    Google Scholar 

  11. C. Farhat and H. D. Simon. TOP/DOMDEC—a software tool for mesh partitioning and parallel processing. Technical report RNR–93–011. NASA Ames Research Center, 1993.

  12. C. M. Fiduccia and R. M. Mattheyes. A linear-time heuristic for improving network partitions. Proceeding of the 19th IEEE Design Automation Conference, pp. 175–181, 1982.

  13. M. R. Garey and D. S. Johnson. Computers and Intractability, A Guide to Theory of NP-Completeness. Freeman, San Francisco, 1979.

    Google Scholar 

  14. J. R. Gilbert and E. Zmijewski. A parallel graph partitioning algorithm for a message-passing multiprocessor. International Journal of Parallel Programming, 16(6):427–449, 1987.

    Google Scholar 

  15. J. R. Gilbert, G. L. Miller, and S. H. Teng. Geometric mesh partitioning: implementation and experiments. Proceedings of 9th International Parallel Processing Symposium, pp. 418–427. Santa Barbara, Calif. Apr. 1995.

  16. A. Heirich and S. Taylor. A Parabolic Load Balancing Method, Proceeding of ICPP' 95, pp. 192–202, 1995.

  17. B. Hendrickson and R. Leland. The Chaco user's guide: version 2.0. Technical report SAND94–2692. Sandia National Laboratories, Albuquerque, NM, Oct. 1994.

    Google Scholar 

  18. B. Hendrickson and R. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 16(2):452–469, 1995.

    Google Scholar 

  19. B. Hendrickson and R. Leland. An multilevel algorithm for partitioning graphs. Proceeding of Supercomputing' 95, Dec. 1995.

  20. G. Horton. A multi-level diffusion method for dynamic load balancing. Parallel Computing, 19:209–218, 1993.

    Google Scholar 

  21. S. H. Hosseini, B. Litow, M. Malkawi, J. Mcpherson, and K. Vairavan. Analysis of a graph coloring based distributed load balancing algorithm. Journal of Parallel and Distributed Computing, 10(2):160–166, 1990.

    Google Scholar 

  22. Y. F. Hu and R. J. Blake. An optimal dynamic load balancing algorithm. Technical report DL-P–95–011. Daresbury Laboratory, Warrington, UK, 1995.

    Google Scholar 

  23. D. A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the IRE 40, pp. 1098–1101, 1952.

  24. G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Technical report 95–064. Department of Computer Science, University of Minnesota, Minneapolis, 1995.

    Google Scholar 

  25. G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. Technical report 95–035. Department of Computer Science, University of Minnesota, Minneapolis, 1995.

    Google Scholar 

  26. B. W. Kernigham and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2):292–370, 1970.

    Google Scholar 

  27. L. Lapidus and C. F. Pinder. Numerical Solution of Partial Differential Equations in Science and Engineering. Wiley, New York, 1983.

    Google Scholar 

  28. F. C. H. Lin, and R. M. Keller. The gradient model load balancing method. IEEE Trans. Software Engineering, SE-13(1):32–38, 1987.

    Google Scholar 

  29. D. M. Nicol. Rectilinear partitioning of irregular data parallel computations. Journal of Parallel and Distributed Computing, 23(2):119–134, 1994.

    Google Scholar 

  30. L. Oliker and R. Biswas. Efficient load balancing and data remapping for adaptive grid calculations. Technical report, NASA Ames Research Center, Moffett Field, Calif., 1997.

    Google Scholar 

  31. C. W. Ou, S. Ranka, and G. Fox. Fast and parallel mapping algorithms for irregular problems. The Journal of Supercomputing, 10(2):119–140, 1996.

    Google Scholar 

  32. C. W. Ou and S. Ranka. Parallel incremental graph partitioning. IEEE Trans. Parallel and Distributed Systems, 8(8):884–896, 1997.

    Google Scholar 

  33. F. Pellegrini and J. Roman. Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. Proceedings of HPCN' 96, pp. 493–498, Apr. 1996.

  34. J. R. Pilkington and S. B. Baden. Dynamic partitioning of non-uniform structured workloads with spacefilling curves. IEEE Trans. Parallel and Distributed Systems, 7(3):288–300, 1996.

    Google Scholar 

  35. R. Preis and R. Diekmann. The PARTY partitioning- - library user guide- - version 1.1. Heniz Nexdorf Institute Universitat, Paderborn, Germany, Sep. 1996.

    Google Scholar 

  36. S. Ranka, Y. Won, and S. Sahni. Programming a hypercube multicomputer. IEEE Software, 5(5):69–77, 1988.

    Google Scholar 

  37. K. Schloegel, G. Karypis, and V. Kumar. Parallel multilevel diffusion algorithms for repartitioning of adaptive meshes. Technical report #97–014. University of Minnesota, Department of Computer Science and Army HPC Center, 1997.

  38. K. Schloegel, G. Karypis, and V. Kumar. Multilevel diffusion schemes for repartitioning of adaptive meshes. Technical report #97–013. University of Minnesota, Department of Computer Science, Jun. 1997.

  39. W. Shu and M. Y. Wu. Runtime incremental parallel scheduling RIPS on distributed memory computers. IEEE Trans. Parallel and Distributed Systems, 7(6):637–649, 1996.

    Google Scholar 

  40. W. Shu and M. Y. Wu. The direct dimension exchange method for load balancing in k-ary n-cubes. Proceedings of Eighth IEEE Symposium on Parallel and Distributed Processing, pp. 366–369, New Orleans, Oct. 1996.

  41. H. D. Simon. Partitioning of unstructured problems for parallel processing. Computing Systems in Engineering, 2(2/3):135–148, 1991.

    Google Scholar 

  42. C. H. Walshaw and M. Berzins. Dynamic load-balancing for PDE solvers on adaptive unstructured meshes. Concurrency: Practice and Experience, 7(1):17–28, 1995.

    Google Scholar 

  43. C. H. Walshaw, M. Cross, and M. G. Everett. A localized algorithm for optimizing unstructured mesh partitions. The International Journal of Supercomputer Applications, 9(4):280–295, 1995.

    Google Scholar 

  44. C. Walshaw, M. Cross, and M. G. Everett. Dynamic mesh partitioning: a unified optimisation and load-balancing algorithm. Technical Report 95/IM/06. University of Greenwich, London, SE18 6PF, UK, Dec. 1995.

    Google Scholar 

  45. C. Walshaw. The Jostle User Manual: Version 2.0. University of Greenwich, London, UK, July, 1997.

    Google Scholar 

  46. M. Willebeek-LeMair and A. P. Reeves. Strategies for dynamic load balancing on highly parallel computers. IEEE Trans. Parallel and Distributed Systems, 4(9):979–993, 1993.

    Google Scholar 

  47. R. D. Williams. Performance of dynamic load balancing algorithms for unstructured mesh calculations. Councurrency: Practice and Experience, 3(5):457–481, 1991.

    Google Scholar 

  48. R. D. Williams. DIME: Distributed Irregular Mesh Environment. California Institute of Technology, 1990.

  49. M. Y. Wu. On runtime parallel scheduling for processor load balancing. IEEE Trans. Parallel and Distributed Systems, 8(2):173–186, 1997.

    Google Scholar 

  50. C. Z. Xu and F. C. M. Lau. Analysis of the generalized dimension exchange method for dynamic load balancing. Journal of Parallel and Distributed Computing, 16(4):385–393, 1992.

    Google Scholar 

  51. C. Z. Xu and F. C. M. Lau. The generalized dimension exchange method for load balancing in k-ary n-cubes and variants. Journal of Parallel and Distributed Computing, 24(1):72–85, 1995.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chung, YC., Liao, CJ. & Yang, DL. A Prefix Code Matching Parallel Load-Balancing Method for Solution-Adaptive Unstructured Finite Element Graphs on Distributed Memory Multicomputers. The Journal of Supercomputing 15, 25–49 (2000). https://doi.org/10.1023/A:1008117625893

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1008117625893

Navigation