Abstract
Big data applications are resource- and energy-intensive. Cloud providers seek to make better use of virtualization technologies to meet evolving infrastructure needs and growing demand. Container-based virtualization is increasingly popular in the high-performance domain; this work evaluates the technology in the context of big data and cloud computing. It focuses on Hadoop as a big data application and evaluates the impact on performance and energy consumption. The objective is to understand the tradeoff between performance and energy efficiency depending on the virtualization technology. The contributions of this paper are as follows. Firstly, we evaluate container-based virtualization in the cloud using Hadoop as a big data application. Secondly, we compare traditional virtualization with the emerging container technology, analyzing the impact of coexisting virtual machines (or containers) on CPU, memory, hard disk throughput, and network bandwidth. Thirdly, we address the reduction of big data application deployment costs using the cloud. Fourthly, we provide the Hadoop community with an in-depth study of resource consumption depending on the deployment environment. Our evaluation shows that: (i) container (Docker) technology improves performance and saves energy compared to traditional virtualization technology; (ii) the performance of a Hadoop cluster based on containers is significantly better than that of a cluster based on traditional virtualization; (iii) the data replication rate influences job completion time; (iv) coexisting containers (or virtual machines) influence the energy consumption and completion time of applications.
Acknowledgements
This work was sponsored in part by the CYRES GROUP in France and the French National Research Agency under CIFRE grant n\(^\mathrm{o}\) 2012/1403.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jlassi, A., Martineau, P. (2017). Experimental Study on Performance and Energy Consumption of Hadoop in Cloud Environments. In: Helfert, M., Ferguson, D., Méndez Muñoz, V., Cardoso, J. (eds) Cloud Computing and Services Science. CLOSER 2016. Communications in Computer and Information Science, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-62594-2_13
DOI: https://doi.org/10.1007/978-3-319-62594-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62593-5
Online ISBN: 978-3-319-62594-2
eBook Packages: Computer Science, Computer Science (R0)