Abstract
Recent growth in Internet usage and in the volume of data has increased the demand for large-capacity storage solutions. Although Cloud storage has yielded new ways of storing, accessing and managing data, there is still a need for an inexpensive, effective and efficient storage solution especially suited to big data management and analysis. In this paper, we extend our previous work and present an in-depth analysis of the key features of future big data storage services for both unstructured and semi-structured data, and discuss how such services should be constructed and deployed. We also explain how different technologies can be combined to provide a single, highly scalable, efficient and performance-aware big data storage system. We focus in particular on data de-duplication for enterprises and private organisations. This research is particularly valuable for inexperienced solution providers such as universities and research organisations, as it will allow them to swiftly set up their own big data storage services.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Li, Y., Guo, L., Guo, Y. (2013). An Efficient and Performance-Aware Big Data Storage System. In: Ivanov, I.I., van Sinderen, M., Leymann, F., Shan, T. (eds) Cloud Computing and Services Science. CLOSER 2012. Communications in Computer and Information Science, vol 367. Springer, Cham. https://doi.org/10.1007/978-3-319-04519-1_7
DOI: https://doi.org/10.1007/978-3-319-04519-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04518-4
Online ISBN: 978-3-319-04519-1
eBook Packages: Computer Science, Computer Science (R0)