{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,13]],"date-time":"2025-04-13T00:12:11Z","timestamp":1744503131748,"version":"3.37.3"},"reference-count":37,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,4,26]],"date-time":"2020-04-26T00:00:00Z","timestamp":1587859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"The quality assurance of publication data in collaborative knowledge bases and in current research information systems (CRIS) becomes more and more relevant by the use of freely available spatial information in different application scenarios. When integrating this data into CRIS, it is necessary to be able to recognize and assess their quality. Only then is it possible to compile a result from the available data that fulfills its purpose for the user, namely to deliver reliable data and information. This paper discussed the quality problems of source metadata in Wikipedia and CRIS. Based on real data from over 40 million Wikipedia articles in various languages, we performed preliminary quality analysis of the metadata of scientific publications using a data quality tool. So far, no data quality measurements have been programmed with Python to assess the quality of metadata from scientific publications in Wikipedia and CRIS. With this in mind, we programmed the methods and algorithms as code, but presented it in the form of pseudocode in this paper to measure the quality related to objective data quality dimensions such as completeness, correctness, consistency, and timeliness. 
This was prepared as a macro service so that the users can use the measurement results with the program code to make a statement about their scientific publications metadata so that the management can rely on high-quality data when making decisions.","DOI":"10.3390\/a13050107","type":"journal-article","created":{"date-parts":[[2020,4,27]],"date-time":"2020-04-27T08:15:29Z","timestamp":1587975329000},"page":"107","source":"Crossref","is-referenced-by-count":6,"title":["How to Inspect and Measure Data Quality about Scientific Publications: Use Case of Wikipedia and CRIS Databases"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5225-389X","authenticated-orcid":false,"given":"Otmane","family":"Azeroual","sequence":"first","affiliation":[{"name":"German Centre for Higher Education Research and Science Studies (DZHW), 10117 Berlin, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0163-5492","authenticated-orcid":false,"given":"W\u0142odzimierz","family":"Lewoniewski","sequence":"additional","affiliation":[{"name":"Department of Information Systems, Pozna\u0144 University of Economics and Business, 61-875 Pozna\u0144, Poland"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1080\/02681102.2019.1596654","article-title":"The role of information and communication technologies in socioeconomic development: Towards a multi-dimensional framework","volume":"25","author":"Roztocki","year":"2019","journal-title":"Inform. Tech. Dev."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1257\/aer.p20161058","article-title":"International Data on Measuring Management Practices","volume":"106","author":"Bloom","year":"2016","journal-title":"Am. Econ. 
Rev."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1271","DOI":"10.1007\/s11192-018-2735-5","article-title":"Data measurement in research information systems: Metrics for the evaluation of data quality","volume":"115","author":"Azeroual","year":"2018","journal-title":"Scientometrics"},{"key":"ref_4","first-page":"262","article-title":"Data Integration under Integrity Constraints","volume":"Volume 2348","author":"Pidduck","year":"2002","journal-title":"Advanced Information Systems Engineering. CAiSE 2002"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.ijinfomgt.2018.02.007","article-title":"Analyzing data quality issues in research information systems via data profiling","volume":"41","author":"Azeroual","year":"2018","journal-title":"Int. J. Inform. Manag."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lewoniewski, W., W\u0119cel, K., and Abramowicz, W. (2017). Relative quality and popularity evaluation of multilingual Wikipedia articles. Informatics, 4.","DOI":"10.20944\/preprints201709.0130.v1"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.nedt.2010.05.004","article-title":"Wikipedia as an evidence source for nursing and healthcare students","volume":"31","author":"Haigh","year":"2011","journal-title":"Nurse Educ. Today"},{"key":"ref_8","first-page":"561","article-title":"Analysis of references across Wikipedia languages","volume":"Volume 756","year":"2017","journal-title":"Information and Software Technologies. ICIST 2017"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nielsen, F.\u00c5. (2007). Scientific citations in Wikipedia. 
arXiv, Available online: https:\/\/arxiv.org\/pdf\/0705.2106.pdf.","DOI":"10.5210\/fm.v12i8.1997"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1007\/978-3-540-73257-0_49","article-title":"The hidden order of Wikipedia","volume":"Volume 4564","author":"Schuler","year":"2007","journal-title":"Online Communities and Social Computing. OCSC 2007"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"715","DOI":"10.1002\/asi.21304","article-title":"Improving Wikipedia\u2019s credibility: References and citations in a sample of history articles","volume":"61","author":"Luyt","year":"2010","journal-title":"J. Am. Soc. Inf. Sci. Tec."},{"key":"ref_12","unstructured":"English Wikipedia (2019, November 15). Wikipedia: Verifiability. Available online: https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Verifiability."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2003","DOI":"10.1002\/asi.23309","article-title":"Do \u201caltmetrics\u201d correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective","volume":"66","author":"Costas","year":"2015","journal-title":"J. Am. Soc. Inf. Sci. Tec."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"63","DOI":"10.3163\/1536-5050.103.1.019","article-title":"PlumX","volume":"103","author":"Champieux","year":"2015","journal-title":"J. Med. Libr. Assoc."},{"key":"ref_15","first-page":"139","article-title":"Application of SEO metrics to determine the quality of Wikipedia articles and their sources","volume":"Volume 920","year":"2018","journal-title":"Information and Software Technologies. ICIST 2018"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"91","DOI":"10.3103\/S0005105518020073","article-title":"Library Sites as Seen through the Lens of Web Analytics","volume":"52","author":"Redkina","year":"2018","journal-title":"Automat. Doc. Math. 
Ling."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ford, H., Sen, S., Musicant, D.R., and Miller, N. (2013, January 5\u20137). Getting to the source: Where does Wikipedia get its information from?. Proceedings of the 9th International Symposium on oPen Collaboration, Hong Kong, China.","DOI":"10.1145\/2491055.2491064"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2116","DOI":"10.1002\/asi.23687","article-title":"Amplifying the impact of open access: Wikipedia and the diffusion of science","volume":"68","author":"Teplitskiy","year":"2017","journal-title":"J. Am. Soc. Inf. Sci. Tec."},{"key":"ref_19","first-page":"374","article-title":"Exploring the use of social media to measure journal article impact","volume":"2011","author":"Evans","year":"2011","journal-title":"AMIA Annu. Symp. Proc."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shuai, X., Jiang, Z., Liu, X., and Bollen, J. (2013, January 22\u201326). A comparative study of academic and Wikipedia ranking. Proceedings of the 13th ACM\/IEEE-CS Joint Conference on Digital libraries, Indianapolis, IN, USA.","DOI":"10.1145\/2467696.2467746"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"e11429","DOI":"10.2196\/11429","article-title":"The Most Influential Medical Journals According to Wikipedia: Quantitative Analysis","volume":"21","author":"Jemielniak","year":"2019","journal-title":"J. Med. Internet. Res."},{"key":"ref_22","unstructured":"English Wikipedia (2019, November 15). Help: Citation Tools. Available online: https:\/\/en.wikipedia.org\/wiki\/Help:Citation_tools."},{"key":"ref_23","first-page":"619","article-title":"Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia","volume":"Volume 339","author":"Abramowicz","year":"2018","journal-title":"Business Information Systems Workshops. BIS 2018"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Warncke-Wang, M., Cosley, D., and Riedl, J. 
(2013, January 5\u20137). Tell me more: An actionable quality model for Wikipedia. Proceedings of the 9th International Symposium on Open Collaboration, Hong Kong, China.","DOI":"10.1145\/2491055.2491063"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lewoniewski, W., W\u0119cel, K., and Abramowicz, W. (2016, January 13\u201316). Quality and importance of Wikipedia articles in different languages. Proceedings of the International Conference on Information and Software Technologies, Druskininkai, Lithuania.","DOI":"10.1007\/978-3-319-46254-7_50"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lewoniewski, W., W\u0119cel, K., and Abramowicz, W. (2019). Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics. Computers, 8.","DOI":"10.20944\/preprints201905.0144.v1"},{"key":"ref_27","first-page":"82","article-title":"Improving the data quality in the research information systems","volume":"15","author":"Azeroual","year":"2017","journal-title":"Int. J. Comput. Sci. Inf. Secur."},{"key":"ref_28","first-page":"12","article-title":"Data quality measures and data cleansing for research information systems","volume":"16","author":"Azeroual","year":"2018","journal-title":"J. Digit. Inform. Manag."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Azeroual, O., Saake, G., and Abuosba, M. (2019). ETL Best Practices for Data Quality Checks in RIS Databases. Informatics, 6.","DOI":"10.3390\/informatics6010010"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Azeroual, O., and Sch\u00f6pfel, J. (2019). Quality issues of CRIS data: An exploratory investigation with universities from twelve countries. 
Publications, 7.","DOI":"10.3390\/publications7010014"},{"key":"ref_31","first-page":"337","article-title":"Quality of Research Information in RIS Databases: A Multidimensional Approach","volume":"Volume 353","author":"Abramowicz","year":"2019","journal-title":"Business Information Systems. BIS 2019"},{"key":"ref_32","unstructured":"Crossref (2019, November 23). Main Page. Available online: https:\/\/www.crossref.org\/."},{"key":"ref_33","unstructured":"English Wikipedia (2019, December 02). Template: Cite Book. Available online: https:\/\/en.wikipedia.org\/wiki\/Template:Cite_book."},{"key":"ref_34","unstructured":"German Wikipedia (2019, December 02). Vorlage: Literatur. Available online: https:\/\/de.wikipedia.org\/wiki\/Vorlage:Literatur."},{"key":"ref_35","unstructured":"Data.Lewoniewski.info (2019, November 15). The Most Popular Parameters in Wikipedia Citation Templates Related to Scientific Publications. Available online: http:\/\/data.lewoniewski.info\/bis2020\/."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond accuracy: What data quality means to data consumers","volume":"12","author":"Wang","year":"1996","journal-title":"J. Manag. Inform. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Batini, C., and Scannapieco, M. (2016). 
Data and Information Quality: Dimensions, Principles and Techniques, Springer.","DOI":"10.1007\/978-3-319-24106-7"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/5\/107\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,27]],"date-time":"2024-06-27T07:55:58Z","timestamp":1719474958000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/5\/107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,26]]},"references-count":37,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["a13050107"],"URL":"https:\/\/doi.org\/10.3390\/a13050107","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2020,4,26]]}}}