Abstract
Data overload continually threatens the reliability of storage systems. RAID-6 storage systems provide high reliability and flexible scalability, and RAID-6 scaling can quickly relieve a shortage of storage capacity. This paper therefore proposes Horizontal Data migration Scaling (HDS), an efficient RAID-6 scaling scheme for HDP Code. First, HDS migrates only a small amount of data from the old disks to the new disks to restore I/O load balance across all disks, old and new. Second, it optimizes the update order of anti-diagonal parity data to reduce the cost of parity updates. Using numerical results and real experimental data, this paper compares the performance of HDS with that of Round-Robin and Semi-RR. Compared with Round-Robin and Semi-RR, the results indicate that: (1) HDS reduces data migration by 59.9–83.3%; (2) HDS decreases the total cost of XOR operations by 36.84–71.43% and 66.04–76.92%, respectively; and (3) HDS reduces the total offline scaling time by 43.78–61.83% and 16.39–48.89%, respectively.
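To make the idea of "migrating only a small amount of data" concrete, here is a minimal sketch of the general lower bound that any load-balancing scaling scheme faces: when m new disks are added to n old disks and data must end up evenly spread, the new disks must receive m/(n+m) of all data, and all of it has to be moved there from the old disks. This is a simplified model that counts data blocks only and ignores parity placement; the function names are hypothetical, and the sketch is not the HDS algorithm itself.

```python
from fractions import Fraction


def min_migration_fraction(old_disks: int, new_disks: int) -> Fraction:
    """Lower bound on the fraction of existing data that must be moved so
    that every disk (old and new) holds an equal share after scaling.

    Simplified model: counts data blocks only, ignores parity layout.
    """
    if old_disks <= 0 or new_disks <= 0:
        raise ValueError("disk counts must be positive")
    # After balancing, each of the (old + new) disks holds 1/(old + new)
    # of the data, so the new disks together hold new/(old + new) of it.
    # The new disks start empty, so all of that share must be migrated.
    return Fraction(new_disks, old_disks + new_disks)


def min_blocks_to_migrate(old_disks: int, new_disks: int,
                          blocks_per_old_disk: int) -> Fraction:
    """Lower bound on the number of data blocks that must move, assuming
    every old disk initially holds the same number of data blocks."""
    total_blocks = old_disks * blocks_per_old_disk
    return total_blocks * min_migration_fraction(old_disks, new_disks)


if __name__ == "__main__":
    for n, m in [(4, 1), (6, 2), (5, 3)]:
        f = min_migration_fraction(n, m)
        print(f"{n} old + {m} new disks: move at least {f} "
              f"({float(f):.1%}) of the data")
```

For example, adding one disk to four old disks that each hold 100 data blocks requires moving at least 80 blocks, one fifth of the data; a scheme that approaches this bound avoids relocating blocks that could stay in place.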
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
Part of this work was presented at the 2019 International Conference on Communication and Information Processing (ICCIP 2019), and this manuscript has been substantially revised since then.
Funding
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61762075, 61671070, 61972364 and 61862055, and by the Provincial Natural Science Foundation of Qinghai under Grant No. 2020-ZJ-926. The authors also acknowledge support from the Natural Science Foundation of Beijing under Grant No. 4212020, the Defense-related Science and Technology Key Lab Fund project under Grant No. 6412006200404, the Qin Xin Talents Cultivation Program of Beijing Information Science and Technology University under Grant No. QXTCP B201908, and the Research Planning of Beijing Municipal Commission of Education under Grant No. KM202111232001.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Zhu Yuan and Muyuan Li. Xindong You and Xueqiang Lv guided the entire experimental process. The project originated with Ping Xie, who participated in and guided the whole work as the corresponding author. The first draft of the manuscript was written by Zhu Yuan, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from all individual participants included in the study.
Informed consent
All authors of this manuscript have given informed consent to all of the above content and statements.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yuan, Z., You, X., Lv, X. et al. HDS: optimizing data migration and parity update to realize RAID-6 scaling for HDP. Cluster Comput 24, 3815–3835 (2021). https://doi.org/10.1007/s10586-021-03379-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10586-021-03379-0