Abstract
Homogeneous multi-core processors are widely used for computation-intensive embedded applications. However, with the continued downscaling of CMOS technology, within-die variations in the manufacturing process cause a significant spread in the operating speeds of the cores within a nominally homogeneous multi-core processor. Task scheduling approaches that ignore this variation-induced heterogeneity can yield overly pessimistic performance. To obtain the best performance at the actual maximum clock frequencies at which the cores can run, we present a heterogeneity-aware schedule refining (HASR) scheme that fully exploits the heterogeneity of homogeneous multi-core processors in the embedded domain. We analyze and show how the actual maximum frequencies of the cores are used to guide scheduling. In the scheme, representative chip operating points are selected, and the corresponding optimal schedules are generated as candidate schedules. During the booting of each chip, one of the candidate schedules is bound to the chip according to the actual maximum clock frequencies of its cores, so as to maximize performance. A set of applications is designed to evaluate the proposed scheme. Experimental results show that the proposed scheme improves performance by 22.2% on average compared with the baseline schedule based on worst-case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme also improves performance by up to 12%.
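The boot-time binding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` record, the feasibility rule (a candidate is usable only if every core can run at least as fast as the operating point the schedule was generated for), the selection by smallest precomputed makespan, and all numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A precomputed schedule tied to one representative operating point (hypothetical)."""
    name: str
    op_point_mhz: Tuple[int, ...]  # per-core frequency the schedule was generated for
    makespan_us: float             # schedule length when run at that operating point


def is_feasible(cand: Candidate, measured_mhz: Sequence[int]) -> bool:
    # Assumed rule: every core must reach at least the frequency the schedule expects.
    return len(cand.op_point_mhz) == len(measured_mhz) and all(
        expected <= actual for expected, actual in zip(cand.op_point_mhz, measured_mhz)
    )


def bind_schedule(candidates: Sequence[Candidate], measured_mhz: Sequence[int]) -> Candidate:
    """Pick the feasible candidate with the shortest makespan for this chip."""
    feasible = [c for c in candidates if is_feasible(c, measured_mhz)]
    if not feasible:
        raise ValueError("no candidate schedule fits the measured core frequencies")
    return min(feasible, key=lambda c: c.makespan_us)


# Illustrative values only; they are not taken from the paper.
CANDIDATES = [
    Candidate("worst_case", (800, 800, 800, 800), 100.0),
    Candidate("mid", (900, 850, 800, 800), 88.0),
    Candidate("fast", (1000, 950, 900, 850), 80.0),
]

if __name__ == "__main__":
    chip = (950, 900, 820, 810)  # hypothetical per-core maximum frequencies measured at boot
    print(bind_schedule(CANDIDATES, chip).name)
```

Keeping a candidate generated for the worst-case operating point in the set means every chip that meets the worst-case timing has at least one feasible schedule, so faster chips only ever gain from the selection.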
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 61225008, 61373074, and 61373090), the National Basic Research Program (973) of China (No. 2014CB349304), the Specialized Research Fund for the Doctoral Program of Higher Education, the Ministry of Education of China (No. 20120002110033), and the Tsinghua University Initiative Scientific Research Program
ORCID: Zhi-xiang CHEN, http://orcid.org/0000-0001-7986-030X
About this article
Cite this article
Chen, Zx., Li, Zl., Cao, S. et al. Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity. Frontiers Inf Technol Electronic Eng 16, 1018–1033 (2015). https://doi.org/10.1631/FITEE.1500035
DOI: https://doi.org/10.1631/FITEE.1500035