Abstract
Existing data management tools have limitations, such as being restricted to specific file systems or lacking transparency to applications. In this paper, we present a new data management tool called AIP, which is implemented via the standard data management API and hence supports multiple file systems and makes data management operations transparent to applications. First, AIP provides centralized policy-based data management for controlling the placement of files in different storage tiers. Second, AIP uses differentiated collections of file states, together with a caching mechanism for those states, to improve the execution efficiency of data management policies. Third, AIP provides a resource arbitration mechanism for controlling the rate at which data management operations are initiated. Results from representative experiments demonstrate that AIP delivers high performance, introduces low management overhead, and scales well.
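To make the ideas of policy-based tier placement and rate-controlled operations concrete, the following is a minimal sketch in Python. It is not AIP's implementation: the abstract does not describe AIP's policy language, its file-state cache, or its arbitration algorithm. The names `FileState`, `select_migrations`, `max_idle_days`, and `budget_bytes` are hypothetical, the single rule "migrate files idle longer than a threshold" is an assumed example policy, and a per-round byte budget is used here only as a stand-in for resource arbitration.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical cached file state; the fields are illustrative assumptions."""
    path: str
    size_bytes: int
    days_since_access: int
    tier: str  # e.g. "fast" or "slow"


def select_migrations(files, max_idle_days, budget_bytes):
    """Choose fast-tier files to move to a slower tier in one policy round.

    Policy (assumed): a file qualifies if it has been idle for more than
    max_idle_days. Arbitration (assumed): the total size of the chosen
    files may not exceed budget_bytes for this round.
    """
    # Evaluate the policy against the collected file states, coldest first.
    candidates = sorted(
        (f for f in files
         if f.tier == "fast" and f.days_since_access > max_idle_days),
        key=lambda f: f.days_since_access,
        reverse=True,
    )
    chosen, used = [], 0
    for f in candidates:
        if used + f.size_bytes > budget_bytes:
            continue  # does not fit in this round's budget; try smaller files
        chosen.append(f)
        used += f.size_bytes
    return chosen
```

Under these assumptions, a scheduler would call `select_migrations` once per round with the current file states and issue migrations only for the returned files, so the amount of data movement started in each round stays bounded.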
Cite this article
Zhang, G., Qiu, J., Shu, J. et al. AIP: a tool for flexible and transparent data management. Sci. China Inf. Sci. 56, 1–11 (2013). https://doi.org/10.1007/s11432-011-4466-6
DOI: https://doi.org/10.1007/s11432-011-4466-6