On the Performance of Parallel Matrix Factorisation on the Hypermesh | The Journal of Supercomputing Skip to main content
Log in

On the Performance of Parallel Matrix Factorisation on the Hypermesh

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Most common multicomputer networks, e.g. d-ary h-cubes, are graph topologies where an edge (channel) interconnects exactly two vertices (nodes). Hypergraphs are a generalisation of the graph model, where a channel interconnects an arbitrary number of nodes. Previous studies have used synthetic workloads (e.g. statistical distributions) to stress the superior performance characteristics of regular multi-dimensional hypergraphs, also known as hypermeshes, over d-ary h-cubes. There has been, however, hardly any study that has considered real-world parallel applications. This paper contributes towards filling this gap by providing a comparative study of the performance of one of the most common numerical problems, namely matrix factorisation, on the hypermesh, hypercube, and d-ary h-cube. To this end, the paper first introduces orthogonal networks as a unified model for describing both the graph and hypergraph topologies. It then develops a generalised parallel algorithm for matrix factorisation and evaluates its performance on the hypermesh, hypercube and d-ary h-cube. The results reveal that the hypermesh supports matrix computation more efficiently, and therefore provides more evidence of the hypermesh as a viable network for future large-scale multicomputers.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. A. Agrawal. Limits on interconnection network performance, IEEE Trans. Parallel & Distributed Systems, 2: 398-412, 1991.

    Google Scholar 

  2. A. Al-Ayyoub and K. Day. Matrix decomposition on the star graph, IEEE Trans. Parallel & Distributed Systems, 8(8):803-812, 1997.

    Google Scholar 

  3. M. Angelacci and M. Colajanni. Subcube matrix decomposition: A unifying view of LU factorization on multicomputers, Parallel Computing 20(2):257-270, 1994.

    Google Scholar 

  4. M. Angelacci and M. Colajanni. Unifying and optimizing parallel linear algebra algorithms, IEEE Trans. Parallel & Distributed Systems, 4(12):1382-1397, 1993.

    Google Scholar 

  5. C. Berge. Graphs and Hypergraphs, North-Holland, 1973.

  6. L. M. Bhuyan and P. D. Agarwal, Generalised hypercube and hyperbus structures for a computer network, IEEE Trans. Computers, 33(4):323-333, 1984.

    Google Scholar 

  7. T. Boku, K. Itakura, H. Nakamura, and K. Nakazawa. CP-CAPS: A massively parallel processor for large-scale scientific calculations, Proc. ACM Supercomputing '97, pp. 108-115, 1997.

  8. P. Cappello. A Mesh Automaton for Solving Dense Linear Systems, Proc. Int. Conference Parallel Processing, pp. 418-425, 1985.

  9. P. Cappello. Gaussian elimination on hypercube automaton, Journal of Parallel and Distributed Computing, 3(4):288-308, 1987.

    Google Scholar 

  10. K. Day and A. Al-Ayyoub. The Cross product of interconnection networks, IEEE Trans. Parallel & Distributed Systems, 8: 109-118, 1997.

    Google Scholar 

  11. K. Dackland, E. Elmroth, B. Kågström, and C. Van Loan. Design and Evaluation of Parallel Block Algorithms: LU Factorization on the IBM 3090 VF/600J, Proc. 5th SIAM Conf. Parallel Processing for Scientific Computing, pp. 3-10, 1992.

  12. H. Fujii, Y. Yasuda, H. Akashi, Y. Inagami, M. Koga, O. Ishihara, M. Kashiyama, H. Wada, and T. Sumimoto. Architecture and performance of the Hitachi SR2201 massively parallel processor system, Proc. Parallel Processing Symposium, pp. 233-241, 1997.

  13. W. Giloi and S. Montenegro. Choosing the interconnect of distributed-memory systems by cost and blocking behaviour, Proc. Int. Parallel Processing Symposium, pp. 438-444, 1991.

  14. S. Graham and S. Seidel. The cost of broadcasting on star graphs and k-ary hypercubes, IEEE Transactions on Computers, 42(6):756-759, 1993.

    Google Scholar 

  15. D. B. Gustavson and D.B. Theus. Wire-or logic on transmission lines, IEEE Micro, 3: 51-55, 1983.

    Google Scholar 

  16. G. Laszewski, M. Patasher, A. G. Mohamed, and G. C. Fox. On the parallelization of blocked LU factorization algorithms on distributed memory architectures, Proc. Supercomputing'92, pp. 170-179, 1992.

  17. W. Lichtenstein and S. Johnsson. “Block cyclic dense linear algebra”. SIAM J. Sci. Comp. 14(6):1257-1286, 1993.

    Google Scholar 

  18. S. F. Nugent. The iPSC/2 direct-connect communication technology, Proc. Conf. Hypercube Concurrent Computers and Applications, pp. 51-60, 1989.

  19. M. Ould-Khaoua, L. Mackenzie and R. Sotudeh, Comparative evaluation of hypermesh and multi-Stage interconnection networks, Computer Journal, 39(3):232-240, 1996.

    Google Scholar 

  20. M. Ould-Khaoua and R. Sotudeh. Performance evaluation of hypermeshes and meshes with wormhole routing, The Journal of Systems Architecture, 43(1–5):345-353, 1997.

    Google Scholar 

  21. M. Ould-Khaoua and R. Sotudeh. Communication locality in hypermeshes and tori, Proc. 2nd IEEE Int. Conf. Algorithms & Architectures for Parallel Processing, pp. 256-262, 1996.

  22. B. Purushotham, A. Basu and P. Kumar. Performance estimation of LU factorization of message passing multiprocessors, Parallel Processing Letters, 21(1):51-60, 1992.

    Google Scholar 

  23. Y. Saad. Gaussian elimination on hypercubes, in parallel algorithms and architectures, M. Cosnard, Y. Robert, P. Quinton, and M. Tchuente, eds., Elsevier Science Publishers, pp. 5-18, 1986.

  24. I. Scherson. Orthogonal graphs for a class of interconnection networks, IEEE Trans. Parallel & Distributed Systems, 2(1):, 13-19, 1991.

    Google Scholar 

  25. C. L. Seitz. The Cosmic Cube, CACM, 28: 22-33, 1985.

    Google Scholar 

  26. S. Starts and A. Beris. LU decomposition optimized for parallel computer with a hierarchical distributed memory, Parallel Computing, 18: 959-971, 1992.

    Google Scholar 

  27. T. Szymanski. The complexity of FFT and related butterfly algorithms on meshes and hypermeshes, Proc. Int. Conference on Parallel Processing, pp. 77-81, 1992.

  28. T. Szymanski. Hypermeshes: optical interconnection networks for parallel processing, Journal of Parallel & Distributed Computing, 26: 1-26, 1995.

    Google Scholar 

  29. N. Tanabe, T. Suzuoka, S. Nakamura, Y. Kawakura, and S. Oyanagi. Base-m n-cube: High performance interconnection networks for highly parallel computers PRODIGY, Proc. Int. Conf. Parallel Processing, pp. 509-516, 1991.

  30. L. Wittie. Communication structures for large networks of microcomputers, IEEE Trans. Computers C30(4):264-273, 1981.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Al-Ayyoub, A., Ould-Khaoua, M. & Day, K. On the Performance of Parallel Matrix Factorisation on the Hypermesh. The Journal of Supercomputing 20, 37–53 (2001). https://doi.org/10.1023/A:1011140203528

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1011140203528