Abstract
The emergence of novel storage media relieves the pressure that massive data places on large-scale data centers. However, storage cost remains a challenge that cannot be ignored. As a trade-off between capacity and cost, RAID offers large capacity, low cost, high reliability, and flexible scaling, and it occupies a large share of the storage market. Today, RAID scaling is the most frequent operation in storage systems. Nevertheless, it still suffers from long scaling time and poor user experience. Therefore, we propose Nscale, an approach for N-Code-based RAID-6 scaling. Nscale shortens the total scaling time by optimizing the data migration process and reducing the amount of migrated data. Meanwhile, it ensures that data moves only in the horizontal direction and within the same parity chain. In addition, it preserves the diagonal parity chains as far as possible. Experimental results show that Nscale reduces data migration by 81.05–92.35% and shortens the total scaling time by 54.5–62.4% in off-line scaling. During and after the scaling process, Nscale also achieves excellent average user response time under different workloads, providing a favorable user experience.












Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by the Key Laboratory Foundation of IoT of Qinghai under Grant 2022-ZJ-Y21. Ping Xie and Zhu Yuan contributed equally to the work and should be regarded as co-first authors.
Funding
This work was funded by the Key Laboratory Foundation of IoT of Qinghai under Grant 2022-ZJ-Y21.
Ethics declarations
Conflict of interest
This manuscript falls within the scope of engineering and does not involve human or animal research. All authors of this manuscript have given their informed consent.
About this article
Cite this article
Xie, P., Yuan, Z. & Hu, Y. Nscale: an efficient RAID-6 online scaling via optimizing data migration. J Supercomput 79, 2383–2403 (2023). https://doi.org/10.1007/s11227-022-04752-5
DOI: https://doi.org/10.1007/s11227-022-04752-5