Abstract
This paper examines the data quality issues that arise when statistical data gathering is supported by Big Data technology. A case study of gathering statistical data on job offers is used to compare data quality issues across two methods of data gathering: traditional statistical surveys and Big Data technology. The case study shows that using Big Data technology raises many barriers related to data quality; these barriers are identified and described in the paper. An important part of the paper is the list of issues that must be tackled to improve data quality in repositories populated by Big Data technology. The proposed solution can be integrated with an organization's existing systems, such as a data warehouse.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Maślankowski, J. (2014). Data Quality Issues Concerning Statistical Data Gathering Supported by Big Data Technology. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures, and Structures. BDAS 2014. Communications in Computer and Information Science, vol 424. Springer, Cham. https://doi.org/10.1007/978-3-319-06932-6_10
DOI: https://doi.org/10.1007/978-3-319-06932-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06931-9
Online ISBN: 978-3-319-06932-6
eBook Packages: Computer Science, Computer Science (R0)