Abstract
The compiler is generally regarded as the most important software component in determining whether a processor design succeeds. This paper describes how we applied the Open Research Compiler (ORC) infrastructure to a novel VLIW DSP, the PAC DSP core, and how we designed code generation for its register file architecture. The PAC DSP combines port-restricted, distributed, and partitioned register file structures with a heterogeneous clustered data-path architecture to attain low power consumption and a smaller die size. To overcome the code generation challenges that this design introduces, we have developed a new register allocation scheme and other retargeting optimization phases that allow high-quality code to be generated effectively. Our preliminary experimental results indicate that the resulting compiler can efficiently utilize the features of the PAC DSP's register file architecture. Our experience in designing compiler support for the PAC VLIW DSP, with its irregular resource constraints, may also be of interest to those developing compilers for similar architectures.
Additional information
This paper is being submitted to the Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology.
About this article
Cite this article
Lin, YC., Lu, C.H., Wu, CJ. et al. Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores. J Sign Process Syst Sign Image 51, 269–288 (2008). https://doi.org/10.1007/s11265-007-0059-4