{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,3]],"date-time":"2024-09-03T17:45:58Z","timestamp":1725385558558},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,8,25]],"date-time":"2022-08-25T00:00:00Z","timestamp":1661385600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"Bookkeeping data free of fraud and errors are a cornerstone of legitimate business operations. The highly complex and laborious work of financial auditors calls for finding new solutions and algorithms to ensure the correctness of financial statements. Both supervised and unsupervised machine learning (ML) techniques nowadays are being successfully applied to detect fraud and anomalies in data. In accounting, it is a long-established problem to detect financial misstatements deemed anomalous in general ledger (GL) data. Currently, widely used techniques such as random sampling and manual assessment of bookkeeping rules become challenging and unreliable due to increasing data volumes and unknown fraudulent patterns. To address the sampling risk and financial audit inefficiency, we applied seven supervised ML techniques inclusive of deep learning and two unsupervised ML techniques such as isolation forest and autoencoders. We trained and evaluated our models on a real-life GL dataset and used data vectorization to resolve journal entry size variability. The evaluation results showed that the best trained supervised and unsupervised models have high potential in detecting predefined anomaly types as well as in efficiently sampling data to discern higher-risk journal entries. 
Based on our findings, we discussed possible practical implications of the resulting solutions in the accounting and auditing contexts.","DOI":"10.3390\/systems10050130","type":"journal-article","created":{"date-parts":[[2022,8,31]],"date-time":"2022-08-31T06:09:36Z","timestamp":1661926176000},"page":"130","source":"Crossref","is-referenced-by-count":20,"title":["Detecting Anomalies in Financial Data Using Machine Learning Algorithms"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-7212-9573","authenticated-orcid":false,"given":"Alexander","family":"Bakumenko","sequence":"first","affiliation":[{"name":"Department of Computer Science, Electrical and Space Engineering, Lule\u00e5 University of Technology, SE 971 87 Lule\u00e5, Sweden"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-4250-4752","authenticated-orcid":false,"given":"Ahmed","family":"Elragal","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Electrical and Space Engineering, Lule\u00e5 University of Technology, SE 971 87 Lule\u00e5, Sweden"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Baesens, B., Van Vlasselaer, V., and Verbeke, W. (2015). Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection, Wiley.","DOI":"10.1002\/9781119146841"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zemankova, A. (2019, January 8\u201310). Artificial Intelligence in Audit and Accounting: Development, Current Trends, Opportunities and Threats-Literature Review. 
Proceedings of the 2019 International Conference on Control, Artificial Intelligence, Robotics & Optimization (ICCAIRO), Athens, Greece.","DOI":"10.1109\/ICCAIRO47923.2019.00031"},{"key":"ref_3","first-page":"1","article-title":"Unsupervised anomaly detection for internal auditing: Literature review and research agenda","volume":"21","author":"Nonnenmacher","year":"2021","journal-title":"Int. J. Digit. Account. Res."},{"key":"ref_4","unstructured":"IFAC (2022, April 18). International Standards on Auditing 240, The Auditor\u2019s Responsibilities Relating to Fraud in an Audit of Financial Statements. Available online: https:\/\/www.ifac.org\/system\/files\/downloads\/a012-2010-iaasb-handbook-isa-240.pdf."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Singleton, T.W., and Singleton, A.J. (2010). Fraud Auditing and Forensic Accounting, Wiley. [4th ed.].","DOI":"10.1002\/9781118269183"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.accinf.2016.12.004","article-title":"Data mining applications in accounting: A review of the literature and organizing framework","volume":"24","author":"Amani","year":"2017","journal-title":"Int. J. Account. Inf. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lahann, J., Scheid, M., and Fettke, P. (2019, January 15\u201317). Utilizing Machine Learning Techniques to Reveal VAT Compliance Violations in Accounting Data. Proceedings of the 2019 IEEE 21st Conference on Business Informatics (CBI), Moscow, Russia.","DOI":"10.1109\/CBI.2019.00008"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Becirovic, S., Zunic, E., and Donko, D. (2020, January 18\u201320). A Case Study of Cluster-based and Histogram-based Multivariate Anomaly Detection Approach in General Ledgers. 
Proceedings of the 2020 19th International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina.","DOI":"10.1109\/INFOTEH48170.2020.9066333"},{"key":"ref_9","unstructured":"EY (2022, April 22). How an AI Application Can Help Auditors Detect Fraud. Available online: https:\/\/www.ey.com\/en_gl\/better-begins-with-you\/how-an-ai-application-can-help-auditors-detect-fraud."},{"key":"ref_10","unstructured":"PwC (2022, April 22). GL.ai, PwC\u2019s Anomaly Detection for the General Ledger. Available online: https:\/\/www.pwc.com\/m1\/en\/events\/socpa-2020\/documents\/gl-ai-brochure.pdf."},{"key":"ref_11","unstructured":"Schreyer, M., Sattarov, T., Schulze, C., Reimer, B., and Borth, D. (2019, January 5). Detection of Accounting Anomalies in the Latent Space using Adversarial Autoencoder Neural Networks. Proceedings of the 2nd KDD Workshop on Anomaly Detection in Finance, Anchorage, AK, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Schultz, M., and Tropmann-Frick, M. (2020, January 7\u201310). Autoencoder Neural Networks versus External Auditors: Detecting Unusual Journal Entries in Financial Statement Audits. Proceedings of the 53rd Hawaii International Conference on System Sciences, Maui, HI, USA.","DOI":"10.24251\/HICSS.2020.666"},{"key":"ref_13","first-page":"55","article-title":"Journal entries with deep learning model","volume":"6","author":"Budimir","year":"2018","journal-title":"Int. J. Adv. Comput. Eng. Netw. IJACEN"},{"key":"ref_14","first-page":"19","article-title":"Types of machine learning algorithms. New advances in machine learning","volume":"3","author":"Ayodele","year":"2010","journal-title":"New Adv. Mach. Learn."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1007\/s42979-021-00592-x","article-title":"Machine Learning: Algorithms, Real-World Applications and Research Directions","volume":"2","author":"Sarker","year":"2021","journal-title":"SN Comput. 
Sci."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"e267","DOI":"10.7717\/peerj-cs.267","article-title":"Adaptations of data mining methodologies: A systematic literature review","volume":"6","author":"Plotnikova","year":"2020","journal-title":"PeerJ Comput. Sci."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Foroughi, F., and Luksch, P. (2018). Data Science Methodology for Cybersecurity Projects. Comput. Sci. Inf. Technol., 01\u201314.","DOI":"10.5121\/csit.2018.80401"},{"key":"ref_18","unstructured":"Azevedo, A., and Santos, M. (2008, January 24\u201326). KDD, SEMMA and CRISP-DM: A parallel overview. Proceedings of the IADIS European Conference on Data Mining, Amsterdam, The Netherlands."},{"key":"ref_19","unstructured":"Microsoft (2022, May 23). What Is the Team Data Science Process? Available online: https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/data-science-process\/overview."},{"key":"ref_20","unstructured":"BAS (2022, April 12). General Information about the Accounting Plan. Available online: https:\/\/www.bas.se\/english\/general-information-about-the-accounting-plan."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.procs.2019.12.111","article-title":"Data dimensional reduction and principal components analysis","volume":"163","author":"Salem","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_22","unstructured":"Databricks (2022, April 26). How (Not) to Tune Your Model with Hyperopt. Available online: https:\/\/databricks.com\/blog\/2021\/04\/15\/how-not-to-tune-your-model-with-hyperopt.html."},{"key":"ref_23","unstructured":"Gholamy, A., Kreinovich, V., and Kosheleva, O. (2022, April 19). Why 70\/30 or 80\/20 Relation between Training and Testing Sets: A Pedagogical Explanation. 
Available online: https:\/\/scholarworks.utep.edu\/cs_techrep\/1209."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1080\/00220670209598786","article-title":"An Introduction to Logistic Regression Analysis and Reporting","volume":"96","author":"Peng","year":"2002","journal-title":"J. Educ. Res."},{"key":"ref_25","first-page":"249","article-title":"Support Vector Machines: Theory and Applications","volume":"2049","author":"Evgeniou","year":"2001","journal-title":"Mach. Learn. Its Appl. Adv. Lect."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"20","DOI":"10.38094\/jastt20165","article-title":"Classification Based on Decision Tree Algorithm for Machine Learning","volume":"2","author":"Jijo","year":"2021","journal-title":"J. Appl. Sci. Technol. Trends"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_28","unstructured":"Cunningham, P., and Delany, S.J. (2020). k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples). arXiv."},{"key":"ref_29","unstructured":"Rish, I. (2001, January 4\u201310). An Empirical Study of the Na\u00efve Bayes Classifier. Proceedings of the IJCAI 2001 Work Empir Methods Artif Intell, Seattle, WA, USA."},{"key":"ref_30","first-page":"13","article-title":"Artificial Neural Network Systems","volume":"21","author":"Dastres","year":"2021","journal-title":"Int. J. Imaging Robot."},{"key":"ref_31","unstructured":"Liu, F.T., Ting, K.M., and Zhou, Z.H. (2019, January 8\u201311). Isolation Forest. 
Proceedings of the ICDM \u201908, Eighth IEEE International Conference on Data Mining, Beijing, China."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"48","DOI":"10.4236\/jcc.2021.98004","article-title":"Improved Isolation Forest Algorithm for Anomaly Test Data Detection","volume":"9","author":"Xu","year":"2021","journal-title":"J. Comput. Commun."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439950","article-title":"Deep Learning for Anomaly Detection: A Review","volume":"54","author":"Pang","year":"2020","journal-title":"ACM Comput. Surv."},{"key":"ref_34","unstructured":"Bank, D., Koenigstein, N., and Giryes, R. (2020). Autoencoders. arXiv."}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/10\/5\/130\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,4]],"date-time":"2024-08-04T12:38:46Z","timestamp":1722775126000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/10\/5\/130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,25]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["systems10050130"],"URL":"https:\/\/doi.org\/10.3390\/systems10050130","relation":{},"ISSN":["2079-8954"],"issn-type":[{"type":"electronic","value":"2079-8954"}],"subject":[],"published":{"date-parts":[[2022,8,25]]}}}