Abstract
Online scheduling plays a key role in big data stream computing environments, because the arrival rate of a high-velocity continuous data stream can fluctuate over time. This paper proposes E-Stream, an elastic online scheduling framework for big data streaming applications, with the following features. (1) It profiles the mathematical relationships between system response time, fairness among multiple applications, and the online features of high-velocity continuous streams. (2) It scales a data stream graph out or in by quantifying computation and communication costs and the vertex semantics with respect to the arrival rate of the data stream, and adjusts the degree of parallelism of the vertices in the graph accordingly. Subgraphs are then constructed so as to minimize the data dependencies among them. (3) It elastically schedules a single graph with a priority-based earliest-finish-time-first online scheduling strategy, and schedules multiple graphs with a max–min fairness strategy. (4) It is evaluated against the objectives of low system response time and acceptable application fairness in a real-world big data stream computing environment. Experimental results demonstrate that E-Stream provides better system response time and application fairness than the existing Storm framework.
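The abstract names a max–min fairness strategy for scheduling multiple graphs but gives no details of it. As a generic illustration only, and not E-Stream's actual algorithm, the textbook form of max–min fair sharing of a single divisible resource can be sketched as follows. The function name `max_min_fair_share`, the graph labels, and the notion of a single scalar "capacity" are all assumptions made for this example.

```python
def max_min_fair_share(capacity, demands):
    """Textbook max-min fair allocation of one divisible resource.

    capacity: total amount of the resource (hypothetical unit, e.g. executor slots).
    demands:  mapping from a graph label to the amount it requests.
    Returns a mapping from each label to the amount granted.
    """
    allocation = {}
    remaining = float(capacity)
    # Serve the smallest demands first: a small demand is met in full,
    # and whatever it leaves unused is shared among the larger ones.
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (name, demand) in enumerate(pending):
        # Equal split of what is left among the graphs not yet served.
        equal_share = remaining / (len(pending) - index)
        granted = min(float(demand), equal_share)
        allocation[name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    # Three hypothetical data stream graphs competing for 10 units.
    print(max_min_fair_share(10, {"graph_a": 2, "graph_b": 4, "graph_c": 8}))
```

In this sketch, a graph that asks for less than an equal share receives its full demand, and the surplus is redistributed to the graphs that ask for more, so no graph can gain without reducing the share of a graph that already has less.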
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No. 61602428, the Fundamental Research Funds for the Central Universities under Grant No. 2652015338, and the Melbourne-Chindia Cloud Computing (MC3) Research Network. We are grateful to Prof. Satish Srirama for his comments on improving the paper.
Cite this article
Sun, D., Yan, H., Gao, S. et al. Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams. J Supercomput 74, 615–636 (2018). https://doi.org/10.1007/s11227-017-2151-2
DOI: https://doi.org/10.1007/s11227-017-2151-2