Abstract
Multiple resource co-scheduling algorithms and pipelined execution models are becoming increasingly popular, as they better capture the heterogeneous nature of modern architectures. The problem of scheduling tasks composed of multiple stages tied to different resources goes under the name of “flow-shop scheduling”. This problem, studied since the ‘50s to optimize production plants, is known to be NP-hard in the general case. In this paper, we consider a specific instance of the flow-shop task model that captures the behavior of a two-resource (DMA-CPU) system. In this setting, we study the problem of selecting the optimal operating speed of the two resources with the goal of minimizing power usage while meeting real-time schedulability constraints. In particular, we derive an algorithm that finds the optimal speed of one resource while the speed of the other resource is kept constant. Then, we discuss how to extend the proposed approach to jointly optimize the speed of the two resources. In addition, applications to multiprocessor systems and energy minimization are considered. All the proposed algorithms run in polynomial time, hence they are suitable for online operation even in the presence of variable real-time workload.
Similar content being viewed by others
Notes
Reasoning in terms of clock period \(T_{ck}\) describes well the performance of CPUs; however, it is more appropriate to reason in terms of bandwidth when describing the performance of memory subsystems. Since a dualism exists between the two concepts, we will adopt the notation \(T_{ck}\) when referring to either resource type.
Computation is performed over data that has been preloaded into local memory, while memory operations do not involve computation. Thus computation time scales linearly with clock speed as long as CPU speed and local memory are tied to the same clock. Similarly, performance of memory-only operations scales linearly with the configured transfer bandwidth.
We recall that \(\hat{\sigma }^{t}_1, \ldots , \hat{\sigma }^{t}_n\) is the permutation of jobs corresponding to the minimum makespan when \(T_{ck} = t\).
For the last element of \(\mathcal {P}_s\), we consider the interval \([\mathcal {P}_{s, i}, +\infty )\).
Here, we refer to the complete list of changing points of \(F_C(t)\), i.e., including all changing points in the interval \(T_{ck} \in (0^+, +\infty )\). This means that, when running Algorithm 1, the function Reinit at line 22 should not be executed.
In this section, we reason in terms of clock speed instead of clock period for ease of understanding, and assume \(s \in (0, 1]\).
Note that the set \(\mathcal {P}^k_{s}\) constructed in this way is a superset of the actual schedule changing points of \(\mathcal {T}_k\), hence the slope of \(F^*_{C, k}(t)\) does not necessarily change for all points in \(\mathcal {P}^k_{s}\). As we will see next, this simplification makes our results directly applicable to the multiprocessor case with no modifications.
This value can be found by applying Algorithm 3 with no modifications.
ARM Cortex-A9 CPUs implement a clock cycle counter that is accessible on a dedicated per-core register (ARM Holdings 2016).
References
Alhammad A, Pellizzoni R (2014) Schedulability analysis of global memory-predictable scheduling. In: 14th international conference on embedded software (EMSOFT)
Alhammad A, Wasly S, Pellizzoni R (2015) Memory efficient global scheduling of real-time tasks. In: 21st IEEE real-time and embedded technology and applications symposium (RTAS)
ARM Holdings (2016) ARM Cortex-A9—technical reference manual. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/index.html
Barcelo N, Kling P, Nugent M, Pruhs K, Scquizzato M (2015) On the complexity of speed scaling. In: Mathematical foundations of computer science 2015. Springer, New York, pp 75–89
Baruah S (2014) Improved multiprocessor global schedulability analysis of sporadic dag task systems. In: 2014 26th Euromicro conference on real-time systems, pp 97–105
Benini L, De Micheli G (2012) Dynamic power management: design techniques and CAD tools. Springer, New York
Bonifaci V, Marchetti-Spaccamela A, Stiller S, Wiese A (2013) Feasibility analysis in the sporadic dag task model. In: 2013 25th Euromicro conference on real-time systems, pp 225–233
Chen B (1995) Analysis of classes of heuristics for scheduling a two-stage flow shop with parallel machines at one stage. J Oper Res Soc 46(2):234–244
Cox MG (1971) An algorithm for approximating convex functions by means by first degree splines. Comput J 14(3):272–275
De Vogeleer K, Memmi G, Jouvelot P, Coelho F (2014) The energy/frequency convexity rule: modeling and experimental validation on mobile devices. In: Parallel processing and applied mathematics (Lecture notes in computer science), vol 8384. Springer, New York, pp 793–803
Devadas V, Aydin H (2012) On the interplay of voltage/frequency scaling and device power management for frame-based real-time embedded applications. IEEE Trans Comput 61(1):31–44
Edelsbrunner H, Guibas LJ, Sharir M (1989) The upper envelope of piecewise linear functions: algorithms and applications. Discret Comput Geom 4(1):311–336
Egger B, Lee J, Shin H (2008) Scratchpad memory management in a multitasking environment. In: 8th ACM international conference on embedded software (EMSOFT)
Garey M, Johnson D, Sethi R (1976) The complexity of flowshop and jobshop scheduling. Math Oper Res 1(2):117–129
Hoogeveen J, Lenstra J, Veltman B (1996) Preemptive scheduling in a two-stage multiprocessor shop is NP-hard. Eur J Oper Res 89:172–175
Imamoto A, Tang B (2008) Optimal piecewise linear approximation of convex functions. In: World Congress on engineering and computer science (WCECS)
Ishihara T, Yasuura H (1998) Voltage scheduling problem for dynamically variable voltage processors. In: International symposium on power electronics and design (InLow)
Jejurikar R, Pereira C, Gupta R (2004) Leakage aware dynamic voltage scaling for real-time embedded systems. In: 41st design automation conference (DAC)
Jia Z, Li Y, Wang Y, Wang M, Shao Z (2015) Temperature-aware data allocation for embedded systems with cache and scratchpad memory. ACM Trans Embed Comput Syst 14(2):30
Johnson S (1954) Optimal two- and three-stage production schedules with setup times included. Naval Res Logist Q 1:61–68
Kaup F, Melnikowitsch S, Hausheer D (2014) Measuring and modeling the power consumption of openflow switches. In: 2014 10th international conference on network and service management (CNSM), pp 181–186
Melani A, Mancuso R, Cullina D, Caccamo M, Thiele L (2016) Speed optimization for tasks with two resources. In: 19th international conference on design, automation and test in Europe (DATE)
Melani A, Bertogna M, Bonifaci V, Marchetti-Spaccamela A, Buttazzo G (2015) Memory-processor co-scheduling in fixed priority systems. In: 23rd international conference on real-time networks and systems (RTNS)
Pellizzoni R, Betti E, Bak S, Yao G, Criswell J, Caccamo M, Kegley R (2011) A predictable execution model for COTS-based embedded systems. In: 17th IEEE real-time and embedded technology and applications symposium (RTAS)
Rabaey JM, Chandrakasan A, Nikolic B (2003) Digital integrated circuits, 2nd edn., Prentice Hall electronics and VLSI seriesPrentice Hall, Upper Saddle River
Schranzhofer A, Chen JJ, Thiele L (2010) Timing analysis for TDMA arbitration in resource sharing systems. In: 16th IEEE real-time and embedded technology and applications symposium (RTAS)
Schuurman P, Woeginger GJ (2000) A polynomial time approximation scheme for the two-stage multiprocessor flow shop problem. Theor Comput Sci 237(1–2):105–122
Serrano MA, Melani A, Bertogna M, Quinones E (2016) Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions. In: 2016 design, automation test in Europe conference exhibition (DATE), pp 1066–1071
Tabish R, Mancuso R, Wasly S, Alhammad A, Phatak SS, Pellizzoni R, Caccamo M (2016) A real-time scratchpad-centric OS for multi-core embedded systems. In: 2016 IEEE real-time and embedded technology and applications symposium (RTAS), pp 1–11
Vandewalle J (1975) On the calculation of the piecewise linear approximation to a discrete function. IEEE Trans Comput 8:843–846
Venkata S, Ahn I, Jeon D, Gupta A, Louie C, Garcia S, Belongie S, Taylor M (2009) SD-VBS: the San Diego vision benchmark suite. In: IEEE international symposium on workload characterization, 2009. IISWC 2009, pp 55–64
Vogelsang T (2010) Understanding the energy consumption of dynamic random access memories. In: 2010 43rd annual IEEE/ACM international symposium on microarchitecture (MICRO), pp 363–374
Wang Y, Shao Z, Chan HCB, Liu D, Guan Y (2014) Memory-aware task scheduling with communication overhead minimization for streaming applications on bus-based multiprocessor system-on-chips. IEEE Trans Parallel Distrib Syst 25(7):1797–1807
Wasly S, Pellizzoni R (2013) A dynamic scratchpad memory unit for predictable real-time embedded systems. In: 25th Euromicro conference on real-time systems (ECRTS)
Wasly S, Pellizzoni R (2014) Hiding memory latency using fixed priority scheduling. In: 20th IEEE real-time and embedded technology and applications symposium (RTAS)
Yao G, Pellizzoni R, Bak S, Betti E, Caccamo M (2011) Memory-centric scheduling for multicore hard real-time systems. In: Real-time systems. Springer, New York
Author information
Authors and Affiliations
Corresponding author
Additional information
The material presented in this paper is based upon work supported by the National Science Foundation (NSF) under Grant Numbers CNS-1219064 and CNS-1302563. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.
Rights and permissions
About this article
Cite this article
Melani, A., Mancuso, R., Cullina, D. et al. Optimizing resource speed for two-stage real-time tasks. Real-Time Syst 53, 82–120 (2017). https://doi.org/10.1007/s11241-016-9259-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11241-016-9259-y