Abstract
MapReduce is a programming model widely used in big data processing. Reduce task scheduling is a key issue that significantly affects MapReduce performance. Unfortunately, because reduce task scheduling is complicated, there is no widely accepted solution to this problem. The main approaches to optimizing reduce task scheduling emphasize either computation characteristics or data locality. Although a few studies have explored solutions based on theoretical modeling, their models are oversimplified. To optimize reduce task scheduling, we propose a method that models a node's computation and communication capabilities uniformly, based on a theoretical analysis of the reduce phase. The analysis integrates the costs that reduce tasks incur in fetching and processing intermediate data. With the proposed model, the optimal load balance of the reduce phase is derived and proved. Evaluations in different environments show that a scheduling method guided by this optimality principle significantly improves the load balance of the reduce phase.
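The abstract does not state the model itself, so the following is only an illustrative sketch of the general idea of folding fetch and processing costs into one per-node capability figure and balancing load against it. It assumes, hypothetically, that each node has a constant per-unit fetch cost and a constant per-unit processing cost, and that load is perfectly divisible; the names `Node`, `balanced_shares`, and `finish_times` are invented for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    fetch_cost: float    # assumed: seconds to fetch one unit of intermediate data
    process_cost: float  # assumed: seconds to process one unit of intermediate data

    @property
    def unit_cost(self) -> float:
        # Assumed unified capability measure: total seconds per unit of reduce input.
        return self.fetch_cost + self.process_cost


def balanced_shares(nodes, total_load):
    """Split total_load so that every node finishes at the same time.

    Under the linear-cost assumption, equal finish times require each
    node's share to be proportional to 1 / unit_cost.
    """
    rates = [1.0 / n.unit_cost for n in nodes]
    total_rate = sum(rates)
    return {n.name: total_load * r / total_rate for n, r in zip(nodes, rates)}


def finish_times(nodes, shares):
    """Return the time each node needs to fetch and process its share."""
    return {n.name: shares[n.name] * n.unit_cost for n in nodes}


if __name__ == "__main__":
    cluster = [Node("fast", 1.0, 1.0), Node("slow", 2.0, 2.0)]
    shares = balanced_shares(cluster, total_load=300.0)
    print(shares)
    print(finish_times(cluster, shares))
```

In this toy setting the slower node receives a proportionally smaller share, so both nodes complete together. A real scheduler would also have to handle indivisible partitions and costs that vary with data placement, which this sketch ignores.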
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zuo, C., Liao, Q., Gu, T., Li, T., Yang, Y. (2015). Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_17
DOI: https://doi.org/10.1007/978-3-319-28430-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science (R0)