Abstract
Modern block-oriented distributed storage systems like Hadoop distributed file system have proliferated in this era of big data and cloud computing. These systems feature block-level replication in which their files are partitioned into equal-sized blocks and multiple copies for each block are then arbitrarily distributed across nodes for fault tolerance and data availability. However, many storage volumes are just wasted only for keeping block copies whose data may not be accessed frequently in the strategy. Therefore, distributed storage systems begin to adopt erasure codes. However, classical parity encoding scheme are hard to be directly applied to the distributed storage systems since block copies are arbitrarily placed across nodes in the systems. We present a novel technique, called DynaEC, to address the issues in modern block-oriented distributed storage systems. DynaEC provides a unique parity encoding algorithm that encodes data blocks arbitrarily distributed across machines to parities and then places the parities guaranteeing fault tolerance. Parity encoding in DynaEC is performed without any change of the original block placement policy in Hadoop distributed file system. This makes DynaEC work seamlessly with Hadoop distributed file system. Finally, during the encoding procedure each data node encodes each own data blocks, not requiring any information about other blocks located in other data nodes. As such, the encoding procedure in DynaEC is fully performed in parallel without any synchronization issue. With extensive experiments, we show that DynaEC saves storage volumes up to the theoretical limit while outperforming previous approaches by multiple orders of magnitude.
Similar content being viewed by others
References
Abad CL, Lu Y, Campbell RH (2011) Dare: adaptive data replication for efficient cluster scheduling. In: 2011 IEEE international conference on cluster computing (CLUSTER). IEEE, pp 159–168
André F, Kermarrec AM, Le Merrer E, Le Scouarnec N, Straub G, Van Kempen A (2014) Archiving cold data in warehouses with clustered network coding. In: Proceedings of the ninth European conference on computer systems. ACM, New York, p 21
Baker BS (1985) A new proof for the first-fit decreasing bin-packing algorithm. J Algorithms 6(1):49–70
Baker BS, Coffman EG Jr, Rivest RL (1980) Orthogonal packings in two dimensions. SIAM J Comput 9(4):846–855
Borthakur D (2007) The hadoop distributed file system: architecture and design. Hadoop Proj Website 11:21
Cheng Z, Luan Z, Meng Y, Xu Y, Qian D, Roy A, Zhang N, Guan G (2012) Erms: an elastic replication management system for hdfs. In: 2012 IEEE international conference on cluster computing workshops (CLUSTER WORKSHOPS). IEEE, pp 32–40
Dimakis AG, Godfrey PB, Wu Y, Wainwright MJ, Ramchandran K (2010) Network coding for distributed storage systems. IEEE Trans Inf Theory 56(9):4539–4551
Dimakis AG, Ramchandran K, Wu Y, Suh C (2011) A survey on network codes for distributed storage. Proc IEEE 99(3):476–489
Fan B, Tantisiriroj W, Xiao L, Gibson G (2009) Diskreduce: raid for data-intensive scalable computing. In: Proceedings of the 4th annual workshop on petascale data storage. ACM, New York, pp 6–10
Fan B, Tantisiriroj W, Xiao L, Gibson G (2011) Diskreduce: replication as a prelude to erasure coding in data-intensive scalable computing. In: SC high performance computing networking, storage and analysis, pp 6–8
Friedman R, Kantor Y, Kantor A (2014) Replicated erasure codes for storage and repair-traffic efficiency. In: 14th IEEE international conference on peer-to-peer computing (P2P). IEEE, pp 1–10
Ghemawat S, Gobioff H, Leung ST (2003) The google file system. In: ACM SIGOPS operating systems review, vol 37. ACM, New York, pp 29–43
Gibson G et al (2010) Hdfs-raid wiki. http://wiki.apache.org/hadoop/HDFS-RAID. Accessed Feb 2016
Hafner JL (2005) Weaver codes: highly fault tolerant erasure codes for storage systems. FAST 5:211–224
Hazan E, Safra S, Schwartz O (2006) On the complexity of approximating \(k\)-set packing. Comput Complex 15(1):20–39
Huang C, Simitci H, Xu Y, Ogus A, Calder B, Gopalan P, Li J, Yekhanin S et al (2012) Erasure coding in windows azure storage. In: USENIX ATC, vol 12
Hwang K, Jin H, Ho R (2000) Raid-x: a new distributed disk array for i/o-centric cluster computing. In: Proceedings of the ninth international symposium on high-performance distributed computing. IEEE, pp 279–286
Hwang K, Jin H, Ho RS (2002) Orthogonal striping and mirroring in distributed raid for i/o-centric cluster computing. IEEE Trans Parallel Distrib Syst 13(1):26–44
Jiang D, Ooi BC, Shi L, Wu S (2010) The performance of mapreduce: an in-depth study. Proc VLDB Endow 3(1–2):472–483
Kaushik RT, Abdelzaher T, Egashira R, Nahrstedt K (2011) Predictive data and energy management in greenhdfs. In: 2011 international on green computing conference and workshops (IGCC). IEEE, pp 1–9
Kaushik RT, Bhandarkar M (2010) Greenhdfs: towards an energy-conserving, storage-efficient, hybrid hadoop compute cluster. In: Proceedings of the USENIX annual technical conference, p 109
Khan O, Burns R, Plank J, Pierce W, Huang C (2012) Rethinking erasure codes for cloud file systems: minimizing i/o for recovery and degraded reads. In: Proc. of USENIX FAST
Lee KH, Lee YJ, Choi H, Chung YD, Moon B (2012) Parallel data processing with mapreduce: a survey. SIGMOD Rec 40(4):11–20
MacCormick J, Murphy N, Ramasubramanian V, Wieder U, Yang J, Zhou L (2009) Kinesis: a new approach to replica placement in distributed storage systems. ACM Trans Storage (TOS) 4(4):11
Muralidhar S, Lloyd W, Roy S, Hill C, Lin E, Liu W, Pan S, Shankar S, Sivakumar V, Tang L et al (2014) F4: Facebooks warm blob storage system. In: Proceedings of the 11th USENIX conference on operating systems design and implementation. USENIX Association, pp 383–398
Pamies-Juarez L, Datta A, Oggier F (2013) Rapidraid: pipelined erasure codes for fast data archival in distributed storage systems. In: INFOCOM, 2013 Proceedings IEEE. IEEE, pp 1294–1302
Pamies-Juarez L, Oggier F, Datta A (2013) Decentralized erasure coding for efficient data archival in distributed storage systems. In: Distributed computing and networking. Springer, New York, pp 42–56
Patterson DA, Gibson G, Katz RH (1988) A case for redundant arrays of inexpensive disks (RAID), vol 17. ACM, New York
Plank JS (2009) The raid-6 liber8tion code. Int J High Perform Comput Appl 23(3):242–251
Plank JS, Luo J, Schuman CD, Xu L, Wilcox-O’Hearn Z et al (2009) A performance evaluation and examination of open-source erasure coding libraries for storage. FAST 9:253–265
Plank JS, Simmerman S, Schuman CD (2008) Jerasure: a library in C/C++ facilitating erasure coding for storage applications-version 1.2. University of Tennessee, Tech. Rep. CS-08-627 23
Rashmi K, Shah NB, Gu D, Kuang H, Borthakur D, Ramchandran K (2014) A hitchhiker’s guide to fast and efficient data reconstruction in erasure-coded data centers. In: Proceedings of the 2014 ACM conference on SIGCOMM. ACM, New York, pp 331–342
Sathiamoorthy M, Asteris M, Papailiopoulos D, Dimakis AG, Vadali R, Chen S, Borthakur D (2013) Xoring elephants: novel erasure codes for big data. In: Proceedings of the 39th international conference on very large data bases. VLDB Endowment, pp 325–336
Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE, pp 1–10
Stonebraker M, Schloss GA (1990) Distributed raid-a new multiple copy algorithm. In: Proceedings of sixth international conference on data engineering. IEEE, pp 430–437
Wang J, Gong W, Varman P, Xie C (2012) Reducing storage overhead with small write bottleneck avoiding in cloud raid system. In: Proceedings of the 2012 ACM/IEEE 13th international conference on grid computing. IEEE Computer Society, pp 174–183
Weatherspoon H, Kubiatowicz JD (2002) Erasure coding vs. replication: a quantitative comparison. In: Peer-to-peer systems. Springer, New York, pp 328–337
Weil SA, Brandt SA, Miller EL, Long DD, Maltzahn C (2006) Ceph: a scalable, high-performance distributed file system. In: Proceedings of the 7th symposium on operating systems design and implementation. USENIX Association, pp 307–320
Welch B, Unangst M, Abbasi Z, Gibson GA, Mueller B, Small J, Zelenka J, Zhou B (2008) Scalable performance of the panasas parallel file system. FAST 8:1–17
Xia M, Saxena M, Blaum M, Pease DA (2015) A tale of two erasure codes in hdfs. In: 13th USENIX conference on file and storage technologies (FAST’15). USENIX Association, pp 213–226
Zaman S, Grosu D (2011) A distributed algorithm for the replica placement problem. IEEE Trans Parallel Distrib Syst 22(9):1455–1468
Zhang Z, Deshpande A, Ma X, Thereska E, Narayanan D (2010) Does erasure coding have a role to play in my data center. Microsoft research MSR-TR-2010 52
Acknowledgments
This work was partly supported by two grants (K-16-L03-C01, R0126-15-1082) funded by the ministry of science, ICT, and future planning, Korea.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ahn, HY., Lee, KH. & Lee, YJ. Dynamic erasure coding decision for modern block-oriented distributed storage systems. J Supercomput 72, 1312–1341 (2016). https://doi.org/10.1007/s11227-016-1661-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-016-1661-7