Abstract
Health insurance fraud accounts for 3–10% of total medical expenditures every year. If fraudulent activity is allowed to grow, it will cause irreversible damage to the medical system. However, medical data are large and complex, and such volumes are difficult to process with traditional statistical methods. Machine learning algorithms have therefore become one of the important solutions. An open question is whether a given learning method remains stable and gives appropriate answers when faced with different data. Many related studies have focused on medical insurance fraud and its assessment, but few have attempted to discover the important factors of medical fraud and to find the optimal machine learning method. Therefore, this study used two unpublished datasets, which might yield novel knowledge, and four machine learning methods, namely Support Vector Machines (SVM), Decision Trees (DT), Random Forest (RF), and Multilayer Perceptron (MLP), to find the method that most effectively detects medical fraud. From the DT results, we also extracted 19 crucial characteristics of medical insurance fraud and grouped them into four categories: medical service providers, applied insurance claim amounts, Healthcare Common Procedure Coding System (HCPCS), and beneficiary. The experimental results could provide valuable suggestions for insurance management in establishing an automatic audit mechanism to eliminate medical fraud.
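As a rough illustration of how a decision tree can surface important fraud factors, the sketch below ranks features by the largest Gini impurity reduction achievable with a single threshold split. This is a simplified stand-in, not the procedure used in this study: it scores one split per feature rather than a full tree, and the feature names, the toy records, and the labels are all invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values, labels):
    """Largest Gini impurity reduction over all thresholds on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate thresholds: rows with value <= t go left, the rest go right.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(rows, labels, names):
    """Return (feature name, gain) pairs sorted by descending gain."""
    gains = {
        name: best_split_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical feature names and toy records, invented for illustration only.
FEATURES = ["claim_amount", "n_procedure_codes", "beneficiary_age"]
ROWS = [
    (9500, 12, 70),
    (8700, 3, 45),
    (9100, 10, 62),
    (1200, 2, 68),
    (1500, 11, 51),
    (900, 4, 73),
    (1100, 3, 49),
    (8900, 9, 66),
]
LABELS = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = flagged as fraud (hypothetical)

ranking = rank_features(ROWS, LABELS, FEATURES)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the claim amount separates the two classes perfectly, so it ranks first. A full decision tree applies the same impurity criterion recursively and accumulates the reductions per feature across all of its splits.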
Data Availability
Data available on request.
Acknowledgements
This work was supported in part by the National Science and Technology Council, Taiwan (Grant No. MOST 111-2410-H-324-006).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Nalluri, V., Chang, JR., Chen, LS. et al. Building prediction models and discovering important factors of health insurance fraud using machine learning methods. J Ambient Intell Human Comput 14, 9607–9619 (2023). https://doi.org/10.1007/s12652-023-04633-6
DOI: https://doi.org/10.1007/s12652-023-04633-6