Optimizing resource speed for two-stage real-time tasks | Real-Time Systems Skip to main content
Log in

Optimizing resource speed for two-stage real-time tasks

  • Published:
Real-Time Systems Aims and scope Submit manuscript

Abstract

Multiple resource co-scheduling algorithms and pipelined execution models are becoming increasingly popular, as they better capture the heterogeneous nature of modern architectures. The problem of scheduling tasks composed of multiple stages tied to different resources goes under the name of “flow-shop scheduling”. This problem, studied since the ‘50s to optimize production plants, is known to be NP-hard in the general case. In this paper, we consider a specific instance of the flow-shop task model that captures the behavior of a two-resource (DMA-CPU) system. In this setting, we study the problem of selecting the optimal operating speed of the two resources with the goal of minimizing power usage while meeting real-time schedulability constraints. In particular, we derive an algorithm that finds the optimal speed of one resource while the speed of the other resource is kept constant. Then, we discuss how to extend the proposed approach to jointly optimize the speed of the two resources. In addition, applications to multiprocessor systems and energy minimization are considered. All the proposed algorithms run in polynomial time, hence they are suitable for online operation even in the presence of variable real-time workload.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. Reasoning in terms of clock period \(T_{ck}\) describes well the performance of CPUs; however, it is more appropriate to reason in terms of bandwidth when describing the performance of memory subsystems. Since a dualism exists between the two concepts, we will adopt the notation \(T_{ck}\) when referring to either resource type.

  2. Computation is performed over data that has been preloaded into local memory, while memory operations do not involve computation. Thus computation time scales linearly with clock speed as long as CPU speed and local memory are tied to the same clock. Similarly, performance of memory-only operations scales linearly with the configured transfer bandwidth.

  3. See http://www.nvidia.com/object/tegra-k1-processor.html.

  4. See http://ark.intel.com/products/75123/Intel-Core-i7-4770K.

  5. See http://cache.freescale.com/files/32bit/doc/fact_sheet/MPC5777MFS.

  6. See http://cache.freescale.com/files/32bit/doc/prod_brief/P4080PB.

  7. We recall that \(\hat{\sigma }^{t}_1, \ldots , \hat{\sigma }^{t}_n\) is the permutation of jobs corresponding to the minimum makespan when \(T_{ck} = t\).

  8. For the last element of \(\mathcal {P}_s\), we consider the interval \([\mathcal {P}_{s, i}, +\infty )\).

  9. Here, we refer to the complete list of changing points of \(F_C(t)\), i.e., including all changing points in the interval \(T_{ck} \in (0^+, +\infty )\). This means that, when running Algorithm 1, the function Reinit at line 22 should not be executed.

  10. In this section, we reason in terms of clock speed instead of clock period for ease of understanding, and assume \(s \in (0, 1]\).

  11. Note that the set \(\mathcal {P}^k_{s}\) constructed in this way is a superset of the actual schedule changing points of \(\mathcal {T}_k\), hence the slope of \(F^*_{C, k}(t)\) does not necessarily change for all points in \(\mathcal {P}^k_{s}\). As we will see next, this simplification makes our results directly applicable to the multiprocessor case with no modifications.

  12. This value can be found by applying Algorithm 3 with no modifications.

  13. ARM Cortex-A9 CPUs implement a clock cycle counter that is accessible on a dedicated per-core register (ARM Holdings 2016).

References

  • Alhammad A, Pellizzoni R (2014) Schedulability analysis of global memory-predictable scheduling. In: 14th international conference on embedded software (EMSOFT)

  • Alhammad A, Wasly S, Pellizzoni R (2015) Memory efficient global scheduling of real-time tasks. In: 21st IEEE real-time and embedded technology and applications symposium (RTAS)

  • ARM Holdings (2016) ARM Cortex-A9—technical reference manual. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/index.html

  • Barcelo N, Kling P, Nugent M, Pruhs K, Scquizzato M (2015) On the complexity of speed scaling. In: Mathematical foundations of computer science 2015. Springer, New York, pp 75–89

  • Baruah S (2014) Improved multiprocessor global schedulability analysis of sporadic dag task systems. In: 2014 26th Euromicro conference on real-time systems, pp 97–105

  • Benini L, De Micheli G (2012) Dynamic power management: design techniques and CAD tools. Springer, New York

    MATH  Google Scholar 

  • Bonifaci V, Marchetti-Spaccamela A, Stiller S, Wiese A (2013) Feasibility analysis in the sporadic dag task model. In: 2013 25th Euromicro conference on real-time systems, pp 225–233

  • Chen B (1995) Analysis of classes of heuristics for scheduling a two-stage flow shop with parallel machines at one stage. J Oper Res Soc 46(2):234–244

    Article  MATH  Google Scholar 

  • Cox MG (1971) An algorithm for approximating convex functions by means by first degree splines. Comput J 14(3):272–275

    Article  MATH  Google Scholar 

  • De Vogeleer K, Memmi G, Jouvelot P, Coelho F (2014) The energy/frequency convexity rule: modeling and experimental validation on mobile devices. In: Parallel processing and applied mathematics (Lecture notes in computer science), vol 8384. Springer, New York, pp 793–803

  • Devadas V, Aydin H (2012) On the interplay of voltage/frequency scaling and device power management for frame-based real-time embedded applications. IEEE Trans Comput 61(1):31–44

    Article  MathSciNet  Google Scholar 

  • Edelsbrunner H, Guibas LJ, Sharir M (1989) The upper envelope of piecewise linear functions: algorithms and applications. Discret Comput Geom 4(1):311–336

    Article  MathSciNet  MATH  Google Scholar 

  • Egger B, Lee J, Shin H (2008) Scratchpad memory management in a multitasking environment. In: 8th ACM international conference on embedded software (EMSOFT)

  • Garey M, Johnson D, Sethi R (1976) The complexity of flowshop and jobshop scheduling. Math Oper Res 1(2):117–129

    Article  MathSciNet  MATH  Google Scholar 

  • Hoogeveen J, Lenstra J, Veltman B (1996) Preemptive scheduling in a two-stage multiprocessor shop is NP-hard. Eur J Oper Res 89:172–175

    Article  MATH  Google Scholar 

  • Imamoto A, Tang B (2008) Optimal piecewise linear approximation of convex functions. In: World Congress on engineering and computer science (WCECS)

  • Ishihara T, Yasuura H (1998) Voltage scheduling problem for dynamically variable voltage processors. In: International symposium on power electronics and design (InLow)

  • Jejurikar R, Pereira C, Gupta R (2004) Leakage aware dynamic voltage scaling for real-time embedded systems. In: 41st design automation conference (DAC)

  • Jia Z, Li Y, Wang Y, Wang M, Shao Z (2015) Temperature-aware data allocation for embedded systems with cache and scratchpad memory. ACM Trans Embed Comput Syst 14(2):30

    Article  Google Scholar 

  • Johnson S (1954) Optimal two- and three-stage production schedules with setup times included. Naval Res Logist Q 1:61–68

    Article  MATH  Google Scholar 

  • Kaup F, Melnikowitsch S, Hausheer D (2014) Measuring and modeling the power consumption of openflow switches. In: 2014 10th international conference on network and service management (CNSM), pp 181–186

  • Melani A, Mancuso R, Cullina D, Caccamo M, Thiele L (2016) Speed optimization for tasks with two resources. In: 19th international conference on design, automation and test in Europe (DATE)

  • Melani A, Bertogna M, Bonifaci V, Marchetti-Spaccamela A, Buttazzo G (2015) Memory-processor co-scheduling in fixed priority systems. In: 23rd international conference on real-time networks and systems (RTNS)

  • Pellizzoni R, Betti E, Bak S, Yao G, Criswell J, Caccamo M, Kegley R (2011) A predictable execution model for COTS-based embedded systems. In: 17th IEEE real-time and embedded technology and applications symposium (RTAS)

  • Rabaey JM, Chandrakasan A, Nikolic B (2003) Digital integrated circuits, 2nd edn., Prentice Hall electronics and VLSI seriesPrentice Hall, Upper Saddle River

    Google Scholar 

  • Schranzhofer A, Chen JJ, Thiele L (2010) Timing analysis for TDMA arbitration in resource sharing systems. In: 16th IEEE real-time and embedded technology and applications symposium (RTAS)

  • Schuurman P, Woeginger GJ (2000) A polynomial time approximation scheme for the two-stage multiprocessor flow shop problem. Theor Comput Sci 237(1–2):105–122

    Article  MathSciNet  MATH  Google Scholar 

  • Serrano MA, Melani A, Bertogna M, Quinones E (2016) Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions. In: 2016 design, automation test in Europe conference exhibition (DATE), pp 1066–1071

  • Tabish R, Mancuso R, Wasly S, Alhammad A, Phatak SS, Pellizzoni R, Caccamo M (2016) A real-time scratchpad-centric OS for multi-core embedded systems. In: 2016 IEEE real-time and embedded technology and applications symposium (RTAS), pp 1–11

  • Vandewalle J (1975) On the calculation of the piecewise linear approximation to a discrete function. IEEE Trans Comput 8:843–846

    Article  MATH  Google Scholar 

  • Venkata S, Ahn I, Jeon D, Gupta A, Louie C, Garcia S, Belongie S, Taylor M (2009) SD-VBS: the San Diego vision benchmark suite. In: IEEE international symposium on workload characterization, 2009. IISWC 2009, pp 55–64

  • Vogelsang T (2010) Understanding the energy consumption of dynamic random access memories. In: 2010 43rd annual IEEE/ACM international symposium on microarchitecture (MICRO), pp 363–374

  • Wang Y, Shao Z, Chan HCB, Liu D, Guan Y (2014) Memory-aware task scheduling with communication overhead minimization for streaming applications on bus-based multiprocessor system-on-chips. IEEE Trans Parallel Distrib Syst 25(7):1797–1807

    Article  Google Scholar 

  • Wasly S, Pellizzoni R (2013) A dynamic scratchpad memory unit for predictable real-time embedded systems. In: 25th Euromicro conference on real-time systems (ECRTS)

  • Wasly S, Pellizzoni R (2014) Hiding memory latency using fixed priority scheduling. In: 20th IEEE real-time and embedded technology and applications symposium (RTAS)

  • Yao G, Pellizzoni R, Bak S, Betti E, Caccamo M (2011) Memory-centric scheduling for multicore hard real-time systems. In: Real-time systems. Springer, New York

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Alessandra Melani.

Additional information

The material presented in this paper is based upon work supported by the National Science Foundation (NSF) under Grant Numbers CNS-1219064 and CNS-1302563. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Melani, A., Mancuso, R., Cullina, D. et al. Optimizing resource speed for two-stage real-time tasks. Real-Time Syst 53, 82–120 (2017). https://doi.org/10.1007/s11241-016-9259-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11241-016-9259-y

Keywords

Navigation