Abstract
This paper introduces ePUMA, a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing, and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with the number of SIMD co-processors. The experimental results show that the ePUMA memory subsystem can effectively hide the data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture achieves a 45x speedup for matrix multiplication over the conventional SIMD extension.
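The abstract combines two ideas: an 8-way SIMD data path and parallel execution across several SIMD co-processors. The following is a minimal sketch of how such a split might look for matrix multiplication, not the authors' implementation. The names `LANES`, `dot_8way`, `matmul_tiled`, and `num_simd` are hypothetical, the round-robin row assignment is an assumption, and the workers run one after another here rather than in parallel.

```python
# Illustrative sketch only: every name here is an assumption made for this
# example and does not come from the ePUMA paper.
LANES = 8  # width of the assumed 8-way SIMD data path


def dot_8way(row, col):
    """Dot product accumulated in chunks of LANES elements, mimicking 8-way SIMD."""
    acc = [0] * LANES
    for start in range(0, len(row), LANES):
        chunk_a = row[start:start + LANES]
        chunk_b = col[start:start + LANES]
        for lane, (a, b) in enumerate(zip(chunk_a, chunk_b)):
            acc[lane] += a * b  # one multiply-accumulate per lane
    return sum(acc)  # final horizontal reduction across the lanes


def matmul_tiled(A, B, num_simd=4):
    """Multiply A (n x k) by B (k x m), assigning rows of C round-robin to workers."""
    n, m = len(A), len(B[0])
    cols = [list(c) for c in zip(*B)]  # column-major copy of B for contiguous access
    C = [[0] * m for _ in range(n)]
    for worker in range(num_simd):            # stands in for one SIMD co-processor
        for i in range(worker, n, num_simd):  # rows owned by this worker
            for j in range(m):
                C[i][j] = dot_8way(A[i], cols[j])
    return C
```

Calling `matmul_tiled(A, B, num_simd=2)` on two nested lists returns the product as a list of rows. Because each worker writes a disjoint set of rows, the result does not depend on `num_simd`, which is the property that lets the row loop be distributed across co-processors.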
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sohl, J., Wang, J., Liu, D. (2009). Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_32
DOI: https://doi.org/10.1007/978-3-642-03644-6_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)