Abstract
Cloud computing is being widely adopted in industry because it provides more computation power and improves resource utilization. In cloud computing systems, many users execute various types of applications that produce a large amount of data. To handle this data, cloud computing systems provide various high-performance, large-scale clustered storage devices. With such large capacity, improving the performance of storage maintenance is an important issue, since large capacity can significantly increase the suspend time during maintenance. As a storage maintenance technique, checking for bad blocks, in which data can no longer be accessed, prevents I/O failures in applications. However, an existing bad block checker (e.g., badblocks in Linux) takes a long time even when storage devices provide parallelism (e.g., multiple disks, multi-channel SSDs), because it performs I/O and check operations in a serialized manner. To reduce the checking time, we propose an efficient and parallel bad block checker that exploits the parallelism of storage devices. Our scheme performs the I/O and check operations for bad blocks in parallel instead of serially. To do this, we first divide a series of check operations into independent tasks that can run in parallel. Second, we create a thread pool in which multiple workers fetch their tasks concurrently. Finally, we enable each checker to perform its own check and I/O operations in parallel. We implement and evaluate our checker on a 32-core machine with a disk array and an NVMe SSD. The experimental results show that our proposed bad block checker improves performance by up to 3.7\(\times\) on the disk array and 7.8\(\times\) on the NVMe SSD, compared with the existing bad block checker.
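The three steps named in the abstract (dividing the check into independent tasks, having pool workers fetch them concurrently, and letting each worker do its own I/O and check) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `split_ranges`, `check_range`, and `parallel_check`, the 4096-byte block size, and the choice of equal contiguous block ranges as the task unit are all assumptions made for illustration. The sketch covers only a read-only check and treats a failed or short read as a bad block. It also reads through the page cache, which a real checker on a device would likely need to bypass.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical default)


def split_ranges(total_blocks, num_tasks):
    """Divide [0, total_blocks) into contiguous, non-overlapping block ranges."""
    base, extra = divmod(total_blocks, num_tasks)
    ranges, start = [], 0
    for i in range(num_tasks):
        count = base + (1 if i < extra else 0)
        if count:
            ranges.append((start, start + count))
        start += count
    return ranges


def check_range(path, first, last, block_size=BLOCK_SIZE):
    """Read every block in [first, last); a failed or short read marks it bad."""
    bad = []
    fd = os.open(path, os.O_RDONLY)  # each task uses its own file descriptor
    try:
        for block in range(first, last):
            try:
                data = os.pread(fd, block_size, block * block_size)
            except OSError:
                bad.append(block)
                continue
            if len(data) != block_size:
                bad.append(block)
    finally:
        os.close(fd)
    return bad


def parallel_check(path, total_blocks, workers=4, block_size=BLOCK_SIZE):
    """Run the independent range checks on a thread pool and merge the results."""
    tasks = split_ranges(total_blocks, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(
            pool.map(lambda r: check_range(path, r[0], r[1], block_size), tasks)
        )
    return sorted(block for part in parts for block in part)
```

Because the ranges do not overlap and each task opens its own descriptor, the workers share no state and need no locking. The per-range lists are merged only after all tasks have finished.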
Notes
If the read operation for a block fails, the bad block checker considers the block a bad block.
Acknowledgements
This work was supported by the National Research Foundation of the Korea Government (MSIT) (2018R1C1B5085640, 2021R1C1C1010861). This work was also supported by Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) (Corresponding Author: Yongseok Son).
Cite this article
Han, J., Zhu, G., Lee, E. et al. Design and implementation of an efficient and parallel bad block checker for parallelism of storage devices. Cluster Comput 26, 2615–2627 (2023). https://doi.org/10.1007/s10586-021-03353-w
DOI: https://doi.org/10.1007/s10586-021-03353-w