Scratchpad Memory Architectures and Allocation Algorithms for Hard Real-Time Multicore Processors

  • Liu, Yu (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Zhang, Wei (Department of Electrical and Computer Engineering, Virginia Commonwealth University)
  • Received : 2015.02.27
  • Accepted : 2015.03.31
  • Published : 2015.06.30

Abstract

Time predictability is crucial in hard real-time and safety-critical systems. Cache memories improve average-case memory performance, but they are not time predictable, especially when they are shared among the cores of a multicore processor. To achieve time predictability while minimizing the impact on performance, this paper explores several time-predictable scratchpad memory (SPM) based architectures for multicore processors. To support these architectures, we propose three strategies for the L2 SPM: the dynamic memory-object allocation based partition strategy, the static allocation based partition strategy, and the static allocation based priority strategy. Each retains time predictability while attempting to maximize performance and energy efficiency. The SPM-based multicore architectural design and the related allocation methods thus form a comprehensive solution for hard real-time multicore computing. Our experimental results indicate the strengths and weaknesses of each proposed architecture and allocation method, and they offer on-chip memory design options for enabling multicore platforms in hard real-time systems.
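
To make the idea of static SPM allocation concrete, the following is a minimal sketch that treats it as a 0/1 knapsack problem: each memory object has a size and an estimated benefit (for example, cycles saved if it resides in the SPM), and the allocator picks the subset that fits the SPM capacity with the highest total benefit. This is a common textbook formulation and is an assumption here, not the allocation algorithm proposed in the paper; the function name, the object names, and all numbers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical memory object: (name, size in bytes, estimated benefit if placed in SPM).
MemObject = Tuple[str, int, int]


def static_spm_allocation(objects: List[MemObject], capacity: int) -> Tuple[int, List[str]]:
    """Choose objects to place in an SPM of `capacity` bytes (0/1 knapsack, dynamic programming).

    Returns the best total benefit and the names of the selected objects.
    The selection is fixed before run time, so every access latency is known statically.
    """
    # best[c] holds the best (benefit, chosen names) achievable with c bytes of SPM.
    best: List[Tuple[int, List[str]]] = [(0, []) for _ in range(capacity + 1)]
    for name, size, benefit in objects:
        # Iterate capacities downward so each object is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    objs = [("loop_body", 4, 40), ("lookup_table", 3, 30), ("init_code", 2, 5)]
    total, chosen = static_spm_allocation(objs, capacity=7)
    print(total, chosen)
```

Because the placement is decided offline and never changes during execution, a worst-case timing analysis can treat every access to a selected object as an SPM hit, which is the property that makes SPMs attractive for hard real-time systems.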
