Abstract
In the current era of the Internet of Things (IoT), a massive number of sensors are used in our daily lives. They are in our homes, workplaces, streets, cars, and even on our bodies; examples include home appliances, wearable devices, and medical sensors. These sensors generate huge amounts of dynamic, heterogeneous, and unstructured data that need special handling beyond the capabilities of conventional relational databases. Thus, identifying a suitable data management platform to store and query these data is necessary. IoT data have distinctive characteristics: they come from various sensors in a wide range of formats, arrive at high velocity, and require high-throughput processing with low latency. NoSQL data stores are commonly used to provide flexibility and availability for big data handling. Despite their popularity and efficiency in processing various types of big data, no single study offers guidance on how NoSQL data stores behave with IoT datasets. In particular, there is a lack of comprehensive studies about which NoSQL data store performs best with respect to the two scalability aspects (scale-up and scale-out) in a distributed and parallel processing environment. This paper benchmarks commonly used NoSQL data stores (MongoDB, Cassandra, and HBase) and compares their performance on a real industrial IoT dataset, focusing on throughput, latency, and run time.
Acknowledgements
This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (Ministry of Science and ICT) (NRF-2017R1A2B2012337).
Cite this article
Hendawi, A., Gupta, J., Liu, J. et al. Benchmarking large-scale data management for Internet of Things. J Supercomput 75, 8207–8230 (2019). https://doi.org/10.1007/s11227-019-02984-6
DOI: https://doi.org/10.1007/s11227-019-02984-6