Abstract
Client-side metadata prefetching is commonly used in wide area network (WAN) file systems because it effectively hides network latency. However, most existing prefetching approaches cannot meet the differing prefetching requirements of multiple workloads: they are usually optimized for one specific workload and have no effect, or even a harmful effect, on others. In this paper, we present a new self-tuning client-side metadata prefetching scheme that uses two prefetching strategies and dynamically adapts to workload changes. A directory-directed strategy prefetches the metadata of related files in the same directory, and a correlation-directed strategy prefetches the metadata of related files accessed across directories. A novel self-tuning mechanism efficiently switches between directory-directed and correlation-directed prefetching. Experimental results using real system traces show that our self-tuning client-side prefetching significantly improves the hit ratio of the client-side cache. In the multi-workload concurrency scenario, our approach improves the hit ratio over no prefetching, directory-directed prefetching, the variant probability graph algorithm, the variant apriori algorithm, and the variant semantic distance algorithm by up to 15.22%, 6.32%, 10.08%, 11.65%, and 10.73%, corresponding to reductions of 25.24%, 18.11%, 23.53%, 24.94%, and 24.19% in the average access time, respectively.
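To make the idea of switching between the two strategies concrete, the following is a minimal Python sketch. It is not the mechanism proposed in the paper, whose details the abstract does not give. It assumes a hypothetical selection rule: each strategy predicts the next access, a decayed score records how often each prediction was right, and the higher-scoring strategy chooses what to prefetch. The class name, the `namespace` mapping, `top_k`, and `decay` are all illustrative assumptions.

```python
import posixpath
from collections import Counter, defaultdict


class SelfTuningPrefetcher:
    """Toy selector between two prefetching strategies (illustrative only)."""

    def __init__(self, namespace, top_k=2, decay=0.9):
        self.namespace = namespace                # directory -> list of file paths
        self.top_k = top_k                        # successors kept per file (assumed)
        self.decay = decay                        # weight of older outcomes (assumed)
        self.successors = defaultdict(Counter)    # file -> counts of the next file
        self.prev = None
        self.predictions = {"directory": set(), "correlation": set()}
        self.score = {"directory": 0.0, "correlation": 0.0}

    def _directory_candidates(self, path):
        # Directory-directed: the other files in the same directory.
        siblings = self.namespace.get(posixpath.dirname(path), [])
        return {p for p in siblings if p != path}

    def _correlation_candidates(self, path):
        # Correlation-directed: files that most often followed this one.
        return {p for p, _ in self.successors[path].most_common(self.top_k)}

    def active_strategy(self):
        # Ties fall back to directory-directed prefetching.
        if self.score["correlation"] > self.score["directory"]:
            return "correlation"
        return "directory"

    def access(self, path):
        """Record one metadata access and return the set of paths to prefetch."""
        # Credit each strategy whose previous prediction covered this access.
        for name, predicted in self.predictions.items():
            self.score[name] = self.decay * self.score[name] + (path in predicted)
        # Learn the observed succession prev -> path.
        if self.prev is not None:
            self.successors[self.prev][path] += 1
        self.prev = path
        # Both strategies predict; only the active one drives prefetching.
        self.predictions = {
            "directory": self._directory_candidates(path),
            "correlation": self._correlation_candidates(path),
        }
        return self.predictions[self.active_strategy()]


if __name__ == "__main__":
    ns = {"/a": ["/a/x", "/a/y"], "/b": ["/b/z"]}
    p = SelfTuningPrefetcher(ns)
    for path in ["/a/x", "/b/z", "/a/x", "/b/z"]:
        to_prefetch = p.access(path)
    print(p.active_strategy(), to_prefetch)
```

In this sketch, a trace that alternates between two directories shifts the selector toward correlation-directed prefetching, while a scan within one directory keeps it on directory-directed prefetching. The decay factor lets older outcomes fade, so the choice can follow workload changes.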
Acknowledgements
This work was supported by the National Key R&D Program of China (Grant No. 2018YFB0203901), the National Natural Science Foundation of China (Grant No. 61772053), the Fund of the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2018ZX-10), and the Science Challenge Project (Grant No. TZ2016002).
Cite this article
Wei, B., Xiao, L., Song, Y. et al. A self-tuning client-side metadata prefetching scheme for wide area network file systems. Sci. China Inf. Sci. 65, 132101 (2022). https://doi.org/10.1007/s11432-019-2833-1
DOI: https://doi.org/10.1007/s11432-019-2833-1