{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,4]],"date-time":"2025-04-04T19:45:01Z","timestamp":1743795901786,"version":"3.37.3"},"reference-count":92,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2021A1515110080"],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Science Foundation for Young Scientists of China","award":["62202382"]},{"name":"Chinese National Key Research and Development Program","award":["2022YFB2702101"]},{"name":"Shaanxi Key Research and Development Program","award":["2021ZDLGY03-02, 2021ZDLGY03-08"]},{"name":"Major Research Plan of the National Natural Science Foundation of China","award":["92152301"]},{"name":"National Science Foundation of China for General Program","award":["62272394"]},{"name":"BITS Pilani-BBF\/BIT","award":["FY2022-23\/BCPS-123, GOA\/ACG\/2022-2023\/Oct\/11, and BPGC\/RIG\/2021-22\/06-2022\/02"]},{"name":"NSF","award":["CSR-2106634, CSR-2312785, CCF-1919113, OAC-2004751"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2024,8,31]]},"abstract":"The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place a high pressure on the infrastructure of container registries that store and distribute images and container storage systems on the Docker client side that manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images.
Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the deduplication cost.\n\nWe propose DupHunter, an end-to-end deduplication scheme that deduplicates layers for both Docker registries and container storage systems while maintaining a high image distribution speed and container I\/O performance. DupHunter is divided into three tiers: registry tier, middle tier, and client tier. Specifically, we first build a high-performance deduplication engine at the registry tier that not only natively deduplicates layers for space savings but also reduces layer restore overhead. Then, we use deduplication offloading at the middle tier to eliminate the redundant files from the client tier and avoid bringing deduplication overhead to the clients. To further reduce the data duplicates caused by CoWs and improve the container I\/O performance, we utilize a container-aware storage system at the client tier that reserves space for each container and arranges the placement of files and their modifications on the disk to preserve locality. Under real workloads, DupHunter reduces storage space by up to 6.9\u00d7 and reduces the GET layer latency up to 2.8\u00d7 compared to the state-of-the-art.
Moreover, DupHunter can improve the container I\/O performance by up to 93% for reads and 64% for writes.","DOI":"10.1145\/3643819","type":"journal-article","created":{"date-parts":[[2024,1,30]],"date-time":"2024-01-30T12:15:29Z","timestamp":1706616929000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["An End-to-end High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6059-1154","authenticated-orcid":false,"given":"Nannan","family":"Zhao","sequence":"first","affiliation":[{"name":"Northwestern Polytechnical University, Xi'an, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9434-7965","authenticated-orcid":false,"given":"Muhui","family":"Lin","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4732-2707","authenticated-orcid":false,"given":"Hadeel","family":"Albahar","sequence":"additional","affiliation":[{"name":"Kuwait University, Kuwait, Kuwait"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3694-5511","authenticated-orcid":false,"given":"Arnab K.","family":"Paul","sequence":"additional","affiliation":[{"name":"BITS Pilani - KK Birla Goa Campus, Zuarinagar, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9266-5304","authenticated-orcid":false,"given":"Zhijie","family":"Huan","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, Xi'an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7219-7856","authenticated-orcid":false,"given":"Subil","family":"Abraham","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6848-6387","authenticated-orcid":false,"given":"Keren","family":"Chen","sequence":"additional","affiliation":[{"name":"Virginia Tech,
Blacksburg, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1424-9977","authenticated-orcid":false,"given":"Vasily","family":"Tarasov","sequence":"additional","affiliation":[{"name":"IBM Research-Almaden, San Jose, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7897-5612","authenticated-orcid":false,"given":"Dimitrios","family":"Skourtis","sequence":"additional","affiliation":[{"name":"IBM Research - Almaden, San Jose, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4487-2436","authenticated-orcid":false,"given":"Ali","family":"Anwar","sequence":"additional","affiliation":[{"name":"University of Minnesota, Twin Cities, Minneapolis, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0871-7263","authenticated-orcid":false,"given":"Ali","family":"Butt","sequence":"additional","affiliation":[{"name":"Virginia Tech., Blacksburg, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,6]]},"reference":[{"issue":"5","key":"e_1_3_2_2_2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/1168917.1168860","article-title":"A comparison of software and hardware techniques for x86 virtualization","volume":"40","author":"Adams Keith","year":"2006","unstructured":"Keith Adams and Ole Agesen. 2006. A comparison of software and hardware techniques for x86 virtualization. ACM SIGOPS Operat. Syst. Rev. 40, 5 (2006), 2\u201313.","journal-title":"ACM SIGOPS Operat. Syst. Rev."},{"key":"e_1_3_2_3_2","unstructured":"Alfred Krohmer. 2023. Proposal: Deduplicated Storage and Transfer of Container Images. Retrieved from https:\/\/gist.github.com\/devkid\/5249ea4c88aab4c7bff1b34c955c1980"},{"key":"e_1_3_2_4_2","unstructured":"Aliyun Open Storage Service (Aliyun OSS). Retrieved from https:\/\/cn.aliyun.com\/product\/oss?spm=5176.683009.2.4.Wma3SL"},{"key":"e_1_3_2_5_2","unstructured":"Amazon. 2023. Amazon Elastic Container Registry. Retrieved from https:\/\/aws.amazon.com\/ecr\/"},{"key":"e_1_3_2_6_2","unstructured":"Amazon. 2023. Containers on AWS. 
Retrieved from https:\/\/aws.amazon.com\/containers\/services\/"},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST\u201918)","author":"Anwar Ali","year":"2018","unstructured":"Ali Anwar, Mohamed Mohamed, Vasily Tarasov, Michael Littley, Lukas Rupprecht, Yue Cheng, Nannan Zhao, Dimitrios Skourtis, Amit S. Warke, Heiko Ludwig, Dean Hildebrand, and Ali R. Butt. 2018. Improving docker registry design based on production workload analysis. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST\u201918)."},{"key":"e_1_3_2_8_2","first-page":"2034","volume-title":"Proceedings of the Congress on Evolutionary Computation (CEC\u201903)","volume":"3","author":"Bonino D.","year":"2003","unstructured":"D. Bonino, F. Corno, and G. Squillero. 2003. Dynamic prediction of Web requests. In Proceedings of the Congress on Evolutionary Computation (CEC\u201903), Vol. 3. 2034\u20132041. DOI:10.1109\/CEC.2003.1299923"},{"key":"e_1_3_2_9_2","unstructured":"Btrfs. 2023. Retrieved from https:\/\/btrfs.wiki.kernel.org\/index.php\/Deduplication"},{"key":"e_1_3_2_10_2","volume-title":"Cray User Group","author":"Canon Richard Shane","year":"2016","unstructured":"Richard Shane Canon and Doug Jacobsen. 2016. Shifter: Containers for HPC. In Cray User Group."},{"key":"e_1_3_2_11_2","first-page":"309","volume-title":"Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST\u201918)","author":"Cao Zhichao","year":"2018","unstructured":"Zhichao Cao, Hao Wen, Fenggang Wu, and David H. C. Du. 2018. ALACC: Accelerating restore performance of data deduplication systems using adaptive look-ahead window assisted chunk caching. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST\u201918). 309\u2013324."},{"key":"e_1_3_2_12_2","unstructured":"Ceph. 2023.
Retrieved from https:\/\/docs.ceph.com\/docs\/master\/dev\/deduplication\/"},{"volume-title":"Proceedings of the IEEE\/ACM 18th International Conference on Mining Software Repositories (MSR\u201917)","author":"Cito J\u00fcrgen","key":"e_1_3_2_13_2","unstructured":"J\u00fcrgen Cito, Gerald Schermann, John Erik Wittern, Philipp Leitner, Sali Zumberi, and Harald C. Gall. [n.d.]. An empirical analysis of the docker container ecosystem on GitHub. In Proceedings of the IEEE\/ACM 18th International Conference on Mining Software Repositories (MSR\u201917)."},{"key":"e_1_3_2_14_2","unstructured":"Cloud Native Computing Foundation Projects. 2023. Retrieved from https:\/\/www.cncf.io\/projects\/"},{"key":"e_1_3_2_15_2","unstructured":"Backup Compression and Deduplication. 2023. Retrieved from https:\/\/tinyurl.com\/vgvb7wu"},{"key":"e_1_3_2_16_2","unstructured":"Datadog. 2023. 8 Surprising Facts About Real Docker Adoption. Retrieved from https:\/\/www.datadoghq.com\/docker-adoption\/"},{"key":"e_1_3_2_17_2","unstructured":"Docker. 2023. Retrieved from https:\/\/www.docker.com\/"},{"key":"e_1_3_2_18_2","unstructured":"Docker Hub. 2023. Retrieved from https:\/\/hub.docker.com\/"},{"key":"e_1_3_2_19_2","unstructured":"Docker Inc. 2023. Docker Registry. Retrieved from https:\/\/github.com\/docker\/distribution"},{"key":"e_1_3_2_20_2","unstructured":"Docker Inc. 2023. Docker Registry HTTP API V2. Retrieved from https:\/\/github.com\/docker\/distribution\/blob\/master\/docs\/spec\/api.md"},{"key":"e_1_3_2_21_2","unstructured":"DockerSlim. 2023. Retrieved from https:\/\/dockersl.im"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCISE.2002.1046594"},{"key":"e_1_3_2_23_2","first-page":"115","volume-title":"Proceedings of the IEEE 41st International Conference on Distributed Computing Systems (ICDCS\u201921)","author":"Fan Hao","year":"2021","unstructured":"Hao Fan, Shengwei Bian, Song Wu, Song Jiang, Shadi Ibrahim, and Hai Jin. 2021.
Gear: Enable efficient container storage and deployment with a new image format. In Proceedings of the IEEE 41st International Conference on Distributed Computing Systems (ICDCS\u201921). 115\u2013125. DOI:10.1109\/ICDCS51616.2021.00020"},{"key":"e_1_3_2_24_2","unstructured":"Overlay file system. 2023. Retrieved from https:\/\/docs.kernel.org\/filesystems\/overlayfs.html"},{"key":"e_1_3_2_25_2","unstructured":"filefrag(8)\u2014Linux manual page. 2023. Retrieved from https:\/\/man7.org\/linux\/man-pages\/man8\/filefrag.8.html"},{"key":"e_1_3_2_26_2","unstructured":"fio(1) Linux man page. 2023. Retrieved from https:\/\/linux.die.net\/man\/1\/fio"},{"key":"e_1_3_2_27_2","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201914)","author":"Fu Min","year":"2014","unstructured":"Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Fangting Huang, and Qing Liu. 2014. Accelerating restore and garbage collection in deduplication-based backup systems via exploiting historical information. In Proceedings of the USENIX Annual Technical Conference (ATC\u201914)."},{"key":"e_1_3_2_28_2","volume-title":"Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST\u201915)","author":"Fu Min","year":"2015","unstructured":"Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Yucheng Zhang, and Yujuan Tan. 2015. Design tradeoffs for data deduplication performance in backup workloads. In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST\u201915)."},{"key":"e_1_3_2_29_2","volume-title":"Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER\u201911)","author":"Fu Yinjin","year":"2011","unstructured":"Yinjin Fu, Hong Jiang, Nong Xiao, Lei Tian, and Fang Liu. 2011. AA-Dedupe: An application-aware source deduplication approach for cloud backup services in the personal computing environment. 
In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER\u201911)."},{"key":"e_1_3_2_30_2","unstructured":"GNU Tar. 2023. Basic Tar Format. Retrieved from https:\/\/www.gnu.org\/software\/tar\/manual\/html_node\/Standard.html"},{"key":"e_1_3_2_31_2","unstructured":"Google. 2023. Google Container Registry. Retrieved from https:\/\/cloud.google.com\/container-registry\/"},{"key":"e_1_3_2_32_2","unstructured":"Google Inc. 2013. Google Compute Engine. Retrieved from https:\/\/cloud.google.com\/compute\/"},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the International Conference on Service-Oriented Computing (ICSOC\u201917)","author":"Gschwind Katharina","year":"2017","unstructured":"Katharina Gschwind, Constantin Adam, Sastry Duri, Shripad Nadgowda, and Maja Vukovic. 2017. Optimizing service delivery with minimal runtimes. In Proceedings of the International Conference on Service-Oriented Computing (ICSOC\u201917)."},{"key":"e_1_3_2_34_2","first-page":"325","volume-title":"Proceedings of the ACM Symposium on Cloud Computing","author":"Guo Fan","year":"2019","unstructured":"Fan Guo, Yongkun Li, Min Lv, Yinlong Xu, and John C. S. Lui. 2019. HP-mapper: A high performance storage driver for docker containers. In Proceedings of the ACM Symposium on Cloud Computing. 325\u2013336."},{"key":"e_1_3_2_35_2","volume-title":"Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST\u201916)","author":"Harter Tyler","year":"2016","unstructured":"Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. Slacker: Fast distribution with lazy docker containers. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST\u201916)."},{"key":"e_1_3_2_36_2","unstructured":"IBM Cloud Kubernetes Service. 2023. IBM Cloud Kubernetes Service. Retrieved from https:\/\/www.ibm.com\/cloud\/container-service"},{"key":"e_1_3_2_37_2","unstructured":"IBM Cloud Kubernetes Service.
2023. S3 Storage Driver. Retrieved from https:\/\/docs.docker.com\/registry\/storage-drivers\/s3\/"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the Middleware Industry Track Workshop","author":"Jayaram K. R.","year":"2011","unstructured":"K. R. Jayaram, Chunyi Peng, Zhe Zhang, Minkyong Kim, Han Chen, and Hui Lei. 2011. An empirical analysis of similarity in virtual machine images. In Proceedings of the Middleware Industry Track Workshop."},{"key":"e_1_3_2_39_2","unstructured":"jdupes. 2023. Retrieved from https:\/\/github.com\/jbruchon\/jdupes"},{"key":"e_1_3_2_40_2","unstructured":"JFrog Artifactory. 2023. Retrieved from https:\/\/jfrog.com\/artifactory\/"},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the International Systems and Storage Conference (SYSTOR\u201909)","author":"Jin Keren","year":"2009","unstructured":"Keren Jin and Ethan L. Miller. 2009. The effectiveness of deduplication on virtual machine disk images. In Proceedings of the International Systems and Storage Conference (SYSTOR\u201909)."},{"key":"e_1_3_2_42_2","first-page":"654","volume-title":"Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC\u201997)","author":"Karger David","year":"1997","unstructured":"David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC\u201997). Association for Computing Machinery, New York, NY, 654\u2013663. DOI:10.1145\/258533.258660"},{"key":"e_1_3_2_43_2","volume-title":"Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC\u201997)","author":"Karger David","year":"1997","unstructured":"David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997.
Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC\u201997)."},{"key":"e_1_3_2_44_2","volume-title":"Proceedings of the IEEE International Conference on Cloud Computing in Emerging Markets (CCEM\u201916)","author":"Kumar K.","year":"2016","unstructured":"K. Kumar and M. Kurhekar. 2016. Economically efficient virtualization over cloud using docker containers. In Proceedings of the IEEE International Conference on Cloud Computing in Emerging Markets (CCEM\u201916)."},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the Linux Symposium","volume":"1","author":"KV Aneesh Kumar","year":"2008","unstructured":"Aneesh Kumar KV, Mingming Cao, Jose R. Santos, and Andreas Dilger. 2008. Ext4 block and inode allocator improvements. In Proceedings of the Linux Symposium, Vol. 1."},{"key":"e_1_3_2_46_2","first-page":"727","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201920)","author":"Li Huiba","year":"2020","unstructured":"Huiba Li, Yifan Yuan, Rui Du, Kai Ma, Lanzheng Liu, and Windsor Hsu. 2020. DADI: Block-level image service for agile and elastic application deployment. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201920). 727\u2013740."},{"key":"e_1_3_2_47_2","volume-title":"Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Lillibridge Mark","year":"2013","unstructured":"Mark Lillibridge, Kave Eshghi, and Deepavali Bhagwat. 2013. Improving restore speed for backup systems that use inline chunk-based deduplication. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913)."},{"key":"e_1_3_2_48_2","volume-title":"Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)","author":"Lillibridge M.","year":"2009","unstructured":"M. Lillibridge, K. Eshghi, D. 
Bhagwat, V. Deolalikar, G. Trezise, and P. Camble. 2009. Sparse indexing: Large scale, inline deduplication using sampling and locality. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST\u201909)."},{"key":"e_1_3_2_49_2","volume-title":"Proceedings of the IEEE International Conference on Cloud Computing (CLOUD\u201919)","author":"Littley Michael","year":"2019","unstructured":"Michael Littley, Ali Anwar, Hannan Fayyaz, Zeshan Fayyaz, Vasily Tarasov, Lukas Rupprecht, Dimitrios Skourtis, Mohamed Mohamed, Heiko Ludwig, Yue Cheng, and Ali R. Butt. 2019. Bolt: Towards a scalable docker registry via hyperconvergence. In Proceedings of the IEEE International Conference on Cloud Computing (CLOUD\u201919)."},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the International Systems and Storage Conference (SYSTOR\u201912)","author":"Lu Maohua","year":"2012","unstructured":"Maohua Lu, David Chambliss, Joseph Glider, and Cornel Constantinescu. 2012. Insights for data reduction in primary storage: A practical analysis. In Proceedings of the International Systems and Storage Conference (SYSTOR\u201912)."},{"key":"e_1_3_2_51_2","first-page":"21","volume-title":"Proceedings of the Linux Symposium","volume":"2","author":"Mathur Avantika","year":"2007","unstructured":"Avantika Mathur, Mingming Cao, Suparna Bhattacharya, Andreas Dilger, Alex Tomas, and Laurent Vivier. 2007. The new ext4 filesystem: Current status and future plans. In Proceedings of the Linux Symposium, Vol. 2. Citeseer, 21\u201333."},{"key":"e_1_3_2_52_2","volume-title":"Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST\u201903)","author":"Megiddo Nimrod","year":"2003","unstructured":"Nimrod Megiddo and Dharmendra S. Modha. 2003. ARC: A self-tuning, low overhead replacement cache. 
In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST\u201903)."},{"key":"e_1_3_2_53_2","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)","author":"Meister Dirk","year":"2012","unstructured":"Dirk Meister, J\u00fcrgen Kaiser, Andre Brinkmann, Toni Cortes, Michael Kuhn, and Julian Kunkel. 2012. A study on data deduplication in HPC storage systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)."},{"key":"e_1_3_2_54_2","volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)","author":"Meister Dirk","year":"2012","unstructured":"Dirk Meister, J\u00fcrgen Kaiser, Andre Brinkmann, Toni Cortes, Michael Kuhn, and Julian Kunkel. 2012. A study on data deduplication in HPC storage systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)."},{"key":"e_1_3_2_55_2","unstructured":"Microsoft. 2023. Azure Container Registry. Retrieved from https:\/\/azure.microsoft.com\/en-us\/services\/container-registry\/"},{"key":"e_1_3_2_56_2","unstructured":"Microsoft Azure. 2023. Retrieved from https:\/\/azure.microsoft.com\/en-us\/"},{"key":"e_1_3_2_57_2","volume-title":"ACM SIGOPS Operating Systems Review","author":"Muthitacharoen Athicha","year":"2001","unstructured":"Athicha Muthitacharoen, Benjie Chen, and David Mazieres. 2001. A low-bandwidth network file system. In ACM SIGOPS Operating Systems Review, Vol. 35."},{"key":"e_1_3_2_58_2","first-page":"1063","volume-title":"Proceedings of the IEEE 38th International Conference on Distributed Computing Systems (ICDCS\u201918)","author":"Oh M.","year":"2018","unstructured":"M. Oh, S. Park, J. Yoon, S. Kim, K. Lee, S. Weil, H. Y. Yeom, and M. Jung. 2018. 
Design of global data deduplication for a scale-out distributed storage system. In Proceedings of the IEEE 38th International Conference on Distributed Computing Systems (ICDCS\u201918). 1063\u20131073."},{"issue":"2","key":"e_1_3_2_59_2","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1145\/170036.170081","article-title":"The LRU-K page replacement algorithm for database disk buffering","volume":"22","author":"O\u2019Neil Elizabeth J.","year":"1993","unstructured":"Elizabeth J. O\u2019Neil, Patrick E. O\u2019Neil, and Gerhard Weikum. 1993. The LRU-K page replacement algorithm for database disk buffering. ACM SIGMOD Rec. 22, 2 (1993), 297\u2013306.","journal-title":"ACM SIGMOD Rec."},{"key":"e_1_3_2_60_2","unstructured":"OpenStack Swift storage driver. 2023. OpenStack Swift storage driver. Retrieved from https:\/\/docs.docker.com\/registry\/storage-drivers\/swift\/"},{"issue":"1","key":"e_1_3_2_61_2","first-page":"11","article-title":"A survey and classification of storage deduplication systems","volume":"47","author":"Paulo Jo\u00e3o","year":"2014","unstructured":"Jo\u00e3o Paulo and Jos\u00e9 Pereira. 2014. A survey and classification of storage deduplication systems. ACM Comput. Surveys 47, 1 (2014), 11.","journal-title":"ACM Comput. Surveys"},{"key":"e_1_3_2_62_2","first-page":"95","volume-title":"Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913)","author":"Plank James S.","year":"2013","unstructured":"James S. Plank, Mario Blaum, and James L. Hafner. 2013. SD codes: Erasure codes designed for how storage systems really fail. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST\u201913). 95\u2013104."},{"key":"e_1_3_2_63_2","unstructured":"Moby project. 2023.
Retrieved from https:\/\/github.com\/moby\/moby"},{"key":"e_1_3_2_64_2","volume-title":"Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (FSE\u201917)","author":"Rastogi Vaibhav","year":"2017","unstructured":"Vaibhav Rastogi, Drew Davidson, Lorenzo De Carli, Somesh Jha, and Patrick McDaniel. 2017. Cimplifier: Automatically Debloating Containers. In Proceedings of the 11th Joint Meeting on Foundations of Software Engineering (FSE\u201917)."},{"key":"e_1_3_2_65_2","unstructured":"Redis. 2023. Retrieved from https:\/\/redis.io\/"},{"key":"e_1_3_2_66_2","unstructured":"Redis. 2023. SETNX. Retrieved from https:\/\/redis.io\/commands\/setnx"},{"issue":"2","key":"e_1_3_2_67_2","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1137\/0108018","article-title":"Polynomial codes over certain finite fields","volume":"8","author":"Reed Irving S.","year":"1960","unstructured":"Irving S. Reed and Gustave Solomon. 1960. Polynomial codes over certain finite fields. J. Soc. Industr. Appl. Math. 8, 2 (1960), 300\u2013304.","journal-title":"J. Soc. Industr. Appl. Math."},{"key":"e_1_3_2_68_2","volume-title":"Proceedings of the 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916)","author":"Shilane Philip","year":"2016","unstructured":"Philip Shilane, Ravi Chitloor, and Uday Kiran Jonnala. 2016. 99 Deduplication problems. In Proceedings of the 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916). USENIX Association, Denver, CO. Retrieved from https:\/\/www.usenix.org\/conference\/hotstorage16\/workshop-program\/presentation\/shilane"},{"key":"e_1_3_2_69_2","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201913)","author":"Shim H.","year":"2013","unstructured":"H. Shim, P. Shilane, and W. Hsu. 2013. Characterization of incremental data changes for efficient data protection. 
In Proceedings of the USENIX Annual Technical Conference (ATC\u201913)."},{"key":"e_1_3_2_70_2","volume-title":"Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201919)","author":"Skourtis Dimitris","year":"2019","unstructured":"Dimitris Skourtis, Lukas Rupprecht, Vasily Tarasov, and Nimrod Megiddo. 2019. Carving perfect layers out of docker images. In Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201919)."},{"key":"e_1_3_2_71_2","volume-title":"Proceedings of the 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916)","author":"Spillane Richard P.","year":"2016","unstructured":"Richard P. Spillane, Wenguang Wang, Luke Lu, Maxime Austruy, Rawlinson Rivera, and Christos Karamanolis. 2016. Exo-clones: Better container runtime image management across the clouds. In Proceedings of the 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage\u201916)."},{"key":"e_1_3_2_72_2","volume-title":"Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST\u201912)","author":"Srinivasan Kiran","year":"2012","unstructured":"Kiran Srinivasan, Timothy Bisson, Garth R. Goodson, and Kaladhar Voruganti. 2012. iDedup: Latency-aware, inline data deduplication for primary storage. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST\u201912)."},{"key":"e_1_3_2_73_2","unstructured":"Microsoft Azure Storage. 2023. Retrieved from https:\/\/azure.microsoft.com\/en-us\/services\/storage\/"},{"key":"e_1_3_2_74_2","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1145\/3419111.3421291","volume-title":"Proceedings of the 11th ACM Symposium on Cloud Computing","author":"Sun Yu","year":"2020","unstructured":"Yu Sun, Jiaxin Lei, Seunghee Shin, and Hui Lu. 2020. Baoverlay: A block-accessible overlay file system for fast and efficient container storage. In Proceedings of the 11th ACM Symposium on Cloud Computing. 
90\u2013104."},{"key":"e_1_3_2_75_2","volume-title":"Proceedings of the 32nd International Conference on Massive Storage Systems and Technology (MSST\u201916)","author":"Sun Zhen","year":"2016","unstructured":"Zhen Sun, Geoff Kuenning, Sonam Mandal, Philip Shilane, Vasily Tarasov, Nong Xiao, and Erez Zadok. 2016. A long-term user-centric analysis of deduplication patterns. In Proceedings of the 32nd International Conference on Massive Storage Systems and Technology (MSST\u201916)."},{"key":"e_1_3_2_76_2","volume-title":"Proceedings of the Ottawa Linux Symposium","author":"Tarasov Vasily","year":"2014","unstructured":"Vasily Tarasov, Deepak Jain, Geoff Kuenning, Sonam Mandal, Karthikeyani Palanisami, Philip Shilane, Sagar Trehan, and Erez Zadok. 2014. Dmdedup: Device mapper target for data deduplication. In Proceedings of the Ottawa Linux Symposium."},{"key":"e_1_3_2_77_2","volume-title":"Proceedings of the 2nd IEEE International Workshops on Foundations and Applications of Self* Systems (FAS*W\u201917)","author":"Tarasov V.","year":"2017","unstructured":"V. Tarasov, L. Rupprecht, D. Skourtis, A. Warke, D. Hildebrand, M. Mohamed, N. Mandagere, W. Li, R. Rangaswami, and M. Zhao. 2017. In search of the ideal storage configuration for docker containers. In Proceedings of the 2nd IEEE International Workshops on Foundations and Applications of Self* Systems (FAS*W\u201917)."},{"key":"e_1_3_2_78_2","first-page":"199","volume-title":"Proceedings of the IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W\u201917)","author":"Tarasov Vasily","year":"2017","unstructured":"Vasily Tarasov, Lukas Rupprecht, Dimitris Skourtis, Amit Warke, Dean Hildebrand, Mohamed Mohamed, Nagapramod Mandagere, Wenji Li, Raju Rangaswami, and Ming Zhao. 2017. In search of the ideal storage configuration for Docker containers. In Proceedings of the IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W\u201917). 
IEEE, 199\u2013206."},{"key":"e_1_3_2_79_2","volume-title":"Proceedings of the USENIX Annual Technical Conference (ATC\u201918)","author":"Thalheim J\u00f6rg","year":"2018","unstructured":"J\u00f6rg Thalheim, Pramod Bhatotia, Pedro Fonseca, and Baris Kasikci. 2018. Cntr: Lightweight OS containers. In Proceedings of the USENIX Annual Technical Conference (ATC\u201918)."},{"key":"e_1_3_2_80_2","unstructured":"James Turnbull. 2023. The Docker Book: Containerization Is the New Virtualization. Shroff Publishers Navi Mumbai India."},{"key":"e_1_3_2_81_2","first-page":"1","volume-title":"Proceedings of the IEEE International Systems Conference (SysCon\u201912)","author":"Upadhyay Amrita","year":"2012","unstructured":"Amrita Upadhyay, Pratibha R. Balihalli, Shashibhushan Ivaturi, and Shrisha Rao. 2012. Deduplication and compression techniques in cloud design. In Proceedings of the IEEE International Systems Conference (SysCon\u201912). IEEE, 1\u20136."},{"key":"e_1_3_2_82_2","unstructured":"Vdo. 2023. Retrieved from https:\/\/github.com\/dm-vdo\/vdo"},{"key":"e_1_3_2_83_2","volume-title":"Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST\u201912)","author":"Wallace Grant","year":"2012","unstructured":"Grant Wallace, Fred Douglis, Hangwei Qian, Philip Shilane, Stephen Smaldone, Mark Chamness, and Windsor Hsu. 2012. Characteristics of backup workloads in production systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST\u201912)."},{"key":"e_1_3_2_84_2","volume-title":"Microservices: Flexible Software Architecture","author":"Wolff Eberhard","year":"2016","unstructured":"Eberhard Wolff. 2016. Microservices: Flexible Software Architecture. 
Addison-Wesley Professional, Boston, MA."},{"key":"e_1_3_2_85_2","first-page":"18","volume-title":"Proceedings of the 18th ACM SIGPLAN\/SIGOPS International Conference on Virtual Execution Environments","author":"Wu Song","year":"2022","unstructured":"Song Wu, Zhuo Huang, Pengfei Chen, Hao Fan, Shadi Ibrahim, and Hai Jin. 2022. Container-aware I\/O stack: Bridging the gap between container storage drivers and solid state devices. In Proceedings of the 18th ACM SIGPLAN\/SIGOPS International Conference on Virtual Execution Environments. 18\u201330."},{"key":"e_1_3_2_86_2","first-page":"1","volume-title":"Proceedings of the 6th Asia-Pacific Workshop on Systems","author":"Wu Xingbo","year":"2015","unstructured":"Xingbo Wu, Wenguang Wang, and Song Jiang. 2015. Totalcow: Unleash the power of copy-on-write for thin-provisioned containers. In Proceedings of the 6th Asia-Pacific Workshop on Systems. 1\u20137."},{"key":"e_1_3_2_87_2","first-page":"24","volume-title":"Proceedings of the Symposium on Mass Storage Systems and Technologies (MSST\u201917)","volume":"3","author":"Xu Qiumin","year":"2017","unstructured":"Qiumin Xu, Manu Awasthi, Krishna T. Malladi, Janki Bhimani, Jingpei Yang, Murali Annavaram, and Ming Hsieh. 2017. Performance analysis of containerized applications on local and remote storage. In Proceedings of the Symposium on Mass Storage Systems and Technologies (MSST\u201917), Vol. 3. 24\u201328."},{"key":"e_1_3_2_88_2","unstructured":"ZFS. 2023. Retrieved from https:\/\/en.wikipedia.org\/wiki\/ZFS"},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1007\/978-3-030-64243-3_38","volume-title":"Proceedings of the 15th International Conference on Green, Pervasive, and Cloud Computing (GPC\u201920)","author":"Zhang Shiqiang","year":"2020","unstructured":"Shiqiang Zhang, Song Wu, Hao Fan, Deqing Zou, and Hai Jin. 2020. BED: A block-level deduplication-based container deployment framework. 
In Proceedings of the 15th International Conference on Green, Pervasive, and Cloud Computing (GPC\u201920). Springer, 504\u2013518."},{"key":"e_1_3_2_90_2","volume-title":"Proceedings of the Storage Developer Conference (SDC\u201916)","author":"Zhao Frank","year":"2016","unstructured":"Frank Zhao, Kevin Xu, and Randy Shain. 2016. Improving copy-on-write performance in container storage drivers. In Proceedings of the Storage Developer Conference (SDC\u201916)."},{"key":"e_1_3_2_91_2","volume-title":"Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER\u201919)","author":"Zhao Nannan","year":"2019","unstructured":"Nannan Zhao, Vasily Tarasov, Hadeel Albahar, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Amit S. Warke, Mohamed Mohamed, and Ali R. Butt. 2019. Large-scale analysis of the Docker Hub dataset. In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER\u201919)."},{"key":"e_1_3_2_92_2","volume-title":"Proceedings of the IEEE International Symposium on Workload Characterization (IISWC\u201913)","author":"Zhou Ruijin","year":"2013","unstructured":"Ruijin Zhou, Ming Liu, and Tao Li. 2013. Characterizing the efficiency of data deduplication for big data storage management. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC\u201913)."},{"key":"e_1_3_2_93_2","volume-title":"Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)","author":"Zhu Benjamin","year":"2008","unstructured":"Benjamin Zhu, Kai Li, and R. Hugo Patterson. 2008. Avoiding the disk bottleneck in the Data Domain deduplication file system. 
In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST\u201908)."}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643819","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T12:19:29Z","timestamp":1717676369000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643819"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,6]]},"references-count":92,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,8,31]]}},"alternative-id":["10.1145\/3643819"],"URL":"https:\/\/doi.org\/10.1145\/3643819","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2024,6,6]]},"assertion":[{"value":"2023-04-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-22","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}