Abstract
We consider the problem of saving cache energy on a multi-core processor. Previous research on low-power processors has proposed various methods to reduce power dissipation, and tag reduction is one of them. This paper extends the tag-reduction technique from single-core to multi-core processors and investigates its energy-saving potential on multi-core processors. We formulate our approach as an equivalent assignment problem: find an assignment of all instruction pages in physical memory to a set of cores such that tag-reduction conflicts on each core are avoided or reduced as much as possible. We then propose three algorithms that use different heuristics for this assignment problem. Instead of the traditional approach of using a processor simulator, which cannot simulate operating-system functions or the full memory hierarchy, we collect experimental data from a real operating system, which yields more realistic results. Experimental results show that, compared with a processor that does not use tag reduction, our proposed algorithms reduce total energy on average by up to 83.93% on an 8-core processor and 76.16% on a 4-core processor. They also significantly outperform the tag-reduction-based algorithm for a single-core processor.
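To make the page-to-core assignment formulation concrete, here is a minimal Python sketch under stated assumptions; it is not one of the paper's three algorithms. It assumes, hypothetically, that a page's reduced tag is the low `k` bits of its physical page number (PPN), that two distinct instruction pages assigned to the same core conflict when their reduced tags are equal, and that a simple greedy heuristic places each page on the core where it causes the fewest new conflicts, breaking ties by core load and then by core index. The names `reduced_tag`, `greedy_assign`, and `count_conflicts` are illustrative only.

```python
from collections import defaultdict


def reduced_tag(ppn: int, k: int) -> int:
    """Hypothetical reduced tag: keep only the low k bits of the physical page number."""
    return ppn & ((1 << k) - 1)


def greedy_assign(pages: list[int], num_cores: int, k: int) -> dict[int, int]:
    """Greedy sketch (assumed heuristic): put each page on the core where it
    collides with the fewest already-placed pages sharing its reduced tag.
    Ties are broken by the core's current page count, then by core index."""
    # used[c][t] holds the PPNs already on core c whose reduced tag is t
    used = [defaultdict(set) for _ in range(num_cores)]
    load = [0] * num_cores
    assignment: dict[int, int] = {}
    for ppn in pages:
        t = reduced_tag(ppn, k)

        def cost(c: int) -> tuple[int, int, int]:
            # .get avoids creating empty entries while we only inspect costs
            return (len(used[c].get(t, ())), load[c], c)

        best = min(range(num_cores), key=cost)
        used[best][t].add(ppn)
        load[best] += 1
        assignment[ppn] = best
    return assignment


def count_conflicts(assignment: dict[int, int], k: int) -> int:
    """Count pairs of distinct pages on the same core that share a reduced tag."""
    groups: dict[tuple[int, int], int] = defaultdict(int)
    for ppn, core in assignment.items():
        groups[(core, reduced_tag(ppn, k))] += 1
    return sum(n * (n - 1) // 2 for n in groups.values())


if __name__ == "__main__":
    pages = [0x10, 0x20, 0x30, 0x11]
    multi = greedy_assign(pages, num_cores=2, k=4)
    print(multi)                               # {16: 0, 32: 1, 48: 0, 17: 1}
    print(count_conflicts(multi, 4))           # 1
    single = {p: 0 for p in pages}
    print(count_conflicts(single, 4))          # 3
```

With four pages and two cores, the greedy placement leaves one conflicting pair (0x10 and 0x30 on core 0), compared with three pairs when every page sits on a single core. The paper's own conflict model and heuristics may differ from these assumptions.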
Additional information
This work was supported by the National Basic Research 973 Program of China under Grant No. 2007CB310900, the National Natural Science Foundation of China under Grant No. 60725208, and Fellowships of the Japan Society for the Promotion of Science for Young Scientists Program.
About this article
Cite this article
Zheng, L., Dong, MX., Ota, K. et al. Energy Efficiency of a Multi-Core Processor by Tag Reduction. J. Comput. Sci. Technol. 26, 491–503 (2011). https://doi.org/10.1007/s11390-011-1149-0
DOI: https://doi.org/10.1007/s11390-011-1149-0