A High Performance Heterogeneous Architecture and Its Optimization Design

  • Conference paper
High Performance Computing and Communications (HPCC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4208)

Abstract

The wide adoption of media processing applications poses great challenges to high-performance embedded processor design. This paper studies a Data Parallel Coprocessor architecture based on SDTA, with architecture decisions made for the best performance/cost ratio. Experimental results on a prototype show that SDTA delivers high performance on many embedded media processing applications. The simplicity and flexibility of SDTA encourage further development of its reconfigurable functionality.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, J., Dai, K., Wang, Z. (2006). A High Performance Heterogeneous Architecture and Its Optimization Design. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_31

  • DOI: https://doi.org/10.1007/11847366_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39368-9

  • Online ISBN: 978-3-540-39372-6

  • eBook Packages: Computer Science, Computer Science (R0)
