Abstract
Large-scale data analysis increasingly relies on MapReduce and its open-source implementation, Hadoop. Hadoop is no longer used only for running single batch jobs; it has also been optimized to support the simultaneous execution of multiple jobs belonging to multiple concurrent users. Several schedulers (i.e., the FIFO, Fair, and Capacity schedulers) have been proposed to optimize the locality of task executions, but they do not consider failures, although evidence in the literature shows that faults do occur and can result in performance problems.
In this paper, we design a set of experiments to evaluate the performance of Hadoop under failure with several schedulers (i.e., we explore the conflict between job scheduling, locality-aware execution, and failures). Our results reveal several drawbacks of Hadoop’s current mechanism for prioritizing failed tasks. By trying to launch failed tasks as soon as possible regardless of locality, Hadoop significantly increases the execution time of jobs with failed tasks, for two reasons: (1) available resources might not be freed up as quickly as expected, and (2) failed tasks might be re-executed on machines that do not hold their input data, which introduces the extra cost of transferring data over the network, normally the scarcest resource in today’s data centers. Our preliminary study with Hadoop not only helps us understand the interplay between fault tolerance and job scheduling, but also offers useful insights into optimizing the current schedulers to be more efficient in case of failures.
This work was done while Tran Anh Phuong was an intern at Inria Rennes.
Acknowledgments
Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see http://www.grid5000.fr/).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ibrahim, S., Phuong, T.A., Antoniu, G. (2015). An Eye on the Elephant in the Wild: A Performance Evaluation of Hadoop’s Schedulers Under Failures. In: Pop, F., Potop-Butucaru, M. (eds) Adaptive Resource Management and Scheduling for Cloud Computing. ARMS-CC 2015. Lecture Notes in Computer Science, vol 9438. Springer, Cham. https://doi.org/10.1007/978-3-319-28448-4_11
DOI: https://doi.org/10.1007/978-3-319-28448-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28447-7
Online ISBN: 978-3-319-28448-4
eBook Packages: Computer Science, Computer Science (R0)