Benchmarking large-scale data management for Internet of Things | The Journal of Supercomputing
Benchmarking large-scale data management for Internet of Things

The Journal of Supercomputing

Abstract

In the current era of the Internet of Things (IoT), a massive number of sensors are used in our daily lives. Sensors are everywhere around us: in our homes, workplaces, streets, and cars, and even on our bodies. Examples include home appliances, wearable devices, and medical sensors. These sensors generate huge amounts of dynamic, heterogeneous, and unstructured data that need special handling beyond the capabilities of conventional relational databases. Thus, it is necessary to identify a suitable data management platform to store and query these data. Despite the popularity of NoSQL data stores and their efficiency in processing various types of big data, there is no single guided study of how they behave with IoT datasets. IoT data have distinguishing characteristics: they come from various sensors in a wide range of formats, arrive at high velocity, and require high-throughput processing with low latency. NoSQL data stores are commonly used to provide flexibility and availability for big data handling. However, there is a lack of comprehensive studies about which NoSQL data store performs best with respect to the two scalability aspects (scale-up and scale-out) in a distributed and parallel processing environment. This paper benchmarks three commonly used NoSQL data stores (MongoDB, Cassandra, and HBase) and compares their performance on a real industrial IoT dataset. The comparison focuses on the throughput, latency, and run time of the evaluated data stores.
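
The three metrics named in the abstract can be made concrete with a small sketch. This is not the authors' benchmark harness: the function names `summarize` and `benchmark`, the nearest-rank 95th percentile, and the in-memory dictionary standing in for a data store are all assumptions made for illustration only.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(latencies_s: List[float], run_time_s: float) -> Dict[str, float]:
    """Derive run time, throughput, and latency figures from raw timings."""
    if not latencies_s or run_time_s <= 0:
        raise ValueError("need at least one operation and a positive run time")
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), 1-based.
    rank = max(1, -(-95 * len(ordered) // 100))
    return {
        "run_time_s": run_time_s,
        # Throughput: completed operations per second of wall-clock time.
        "throughput_ops_per_s": len(ordered) / run_time_s,
        "mean_latency_ms": statistics.fmean(ordered) * 1000.0,
        "p95_latency_ms": ordered[rank - 1] * 1000.0,
    }


def benchmark(operation: Callable[[int], None], n_ops: int) -> Dict[str, float]:
    """Time n_ops sequential calls of operation(i) and summarize them."""
    latencies: List[float] = []
    start = time.perf_counter()
    for i in range(n_ops):
        t0 = time.perf_counter()
        operation(i)
        latencies.append(time.perf_counter() - t0)
    return summarize(latencies, time.perf_counter() - start)


if __name__ == "__main__":
    # Hypothetical stand-in for a data store: a plain dict, NOT a real
    # MongoDB, Cassandra, or HBase client.
    store: Dict[str, Dict[str, float]] = {}

    def insert(i: int) -> None:
        store[f"sensor-{i}"] = {"reading": i * 0.5}

    print(benchmark(insert, 10_000))
```

In a real comparison the `insert` function would wrap a client call to the store under test, and the same workload would be repeated across node counts and hardware sizes to observe scale-out and scale-up behavior.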

Acknowledgements

This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (Ministry of Science and ICT) (NRF-2017R1A2B2012337).

Author information

Corresponding author

Correspondence to Kyung-Sup Kwak.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hendawi, A., Gupta, J., Liu, J. et al. Benchmarking large-scale data management for Internet of Things. J Supercomput 75, 8207–8230 (2019). https://doi.org/10.1007/s11227-019-02984-6

  • DOI: https://doi.org/10.1007/s11227-019-02984-6
