Abstract
In recent years, deploying and running data-intensive workflows on cloud platforms has become increasingly popular in many areas. Unlike computation-intensive applications, a data-intensive workflow typically involves bulk data transfers between different resource sites, which makes some traditional energy-efficiency optimization techniques difficult to apply to such workflows. In this paper, we first formulate a power model for data-intensive workflows that accounts for the power consumed by data transfers. Based on this power model, we introduce a novel metric called Shortest Path in terms of Energy Consumption and design an energy-efficient heuristic scheduling algorithm that aims to reduce the extra energy consumed because of delays in bulk data transfers. Extensive experiments and performance evaluations show that, compared with several existing algorithms, the proposed scheduling algorithm significantly reduces the overall energy consumption of running data-intensive workflows. The proposed algorithm is also more adaptive and robust when a cloud system faces intensive and unpredictable workloads.
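The abstract gives neither the power model nor the exact definition of the Shortest Path in terms of Energy Consumption metric. The sketch below is purely illustrative and is not the authors' algorithm. It assumes that each network link has a fixed energy cost per gigabyte moved. Under that assumption, the "energy shortest path" between two sites is the minimum-energy route found with Dijkstra's algorithm, and each task is placed greedily on the site that minimizes its compute energy plus the energy of moving its inputs there. The `Site` fields, link costs, task tuples and function names are all hypothetical. The paper's model may also include costs that this sketch ignores, such as idle power drawn while waiting for transfers.

```python
import heapq
from dataclasses import dataclass

# Hypothetical model: all names, fields and units are illustrative assumptions,
# not the notation or method of the paper.

@dataclass
class Site:
    name: str
    busy_power_w: float  # assumed power drawn while a task runs (watts)
    speed: float         # assumed work units processed per second

def energy_shortest_paths(links, source):
    """Dijkstra from `source`; edge weight = assumed joules to move 1 GB over that link."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, joules_per_gb in links.get(u, []):
            nd = d + joules_per_gb
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def task_energy(site, work, inputs, links, placement):
    """Compute energy on `site` plus energy to ship every input there.

    Each input is (location, size_gb); a location may be a site name or the
    name of an earlier task, whose output is assumed to stay where it ran.
    """
    compute_j = site.busy_power_w * (work / site.speed)
    transfer_j = 0.0
    for location, size_gb in inputs:
        origin = placement.get(location, location)  # resolve task output -> site
        per_gb = energy_shortest_paths(links, origin).get(site.name, float("inf"))
        transfer_j += per_gb * size_gb
    return compute_j + transfer_j

def greedy_schedule(tasks, sites, links):
    """Place tasks (given in topological order) on their lowest-energy site."""
    placement = {}
    for name, work, inputs in tasks:
        best = min(sites, key=lambda s: task_energy(s, work, inputs, links, placement))
        placement[name] = best.name
    return placement

if __name__ == "__main__":
    sites = [Site("A", 200.0, 10.0), Site("B", 150.0, 10.0), Site("C", 300.0, 20.0)]
    # Symmetric links; the direct A-C link is costly, so A -> B -> C is cheaper.
    links = {
        "A": [("B", 50.0), ("C", 400.0)],
        "B": [("A", 50.0), ("C", 100.0)],
        "C": [("A", 400.0), ("B", 100.0)],
    }
    tasks = [("t1", 100.0, [("A", 2.0)]), ("t2", 100.0, [("t1", 1.0)])]
    print(greedy_schedule(tasks, sites, links))  # {'t1': 'B', 't2': 'B'}
```

In this toy instance, placing `t1` on site B costs 1500 J of compute plus 100 J to pull 2 GB from A. That total beats both A (2000 J) and C (1500 J plus 300 J via the cheaper A to B to C route). `t2` then stays on B because its input already resides there. Rerunning Dijkstra for every input keeps the sketch short. A practical version would precompute all-pairs energy distances once.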
Acknowledgements
This work was supported by the Research project of Education Department of Hunan Province (No. 17K015).
Cite this article
Qu, X., Xiao, P. & Huang, L. Improving the energy efficiency and performance of data-intensive workflows in virtualized clouds. J Supercomput 74, 2935–2955 (2018). https://doi.org/10.1007/s11227-018-2344-3
DOI: https://doi.org/10.1007/s11227-018-2344-3