Abstract
Healthcare organizations have recently become engrossed in digitizing the health insurance system. Despite the undeniable benefits of digitization, the risk of providers exaggerating a claim or fabricating one entirely is increasing. Provider profiling aids in detecting outlier false claims by measuring the performance of providers and the outcomes of healthcare, and it has therefore become an active research topic in health insurance. However, most existing provider profiling approaches suffer from intermediate results caused by class overlap. Another problem encountered in developing or validating an automated fraud detection model is the availability of labeled data: manual labeling of huge volumes of claims data by medical experts is not always feasible. Hence, it is essential to automate this part of the fraud detection process, which has received little attention from researchers developing healthcare fraud detection models. One existing approach automates the labeling of health insurance claims by using the provider's unique identification number as the reference for a one-to-one mapping with real-world fraudulent claims. However, that approach suffers from missing values in providers' identification numbers, which leads to poor performance in healthcare fraud detection models. In this study, we propose a Weighted MultiTree approach to mitigate the aforementioned problems of provider profiling and labeling. A MultiTree is a DAG construction in which each node is reachable from any other node without ambiguity. Hence, our proposed approach performs provider profiling without intermediate results and at a lower construction cost. Labeling claims using the set of unique provider details obtained from the MultiTree enhances the detection accuracy of fraudulent claims.
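To make the "without ambiguity" property concrete, the following is a minimal sketch of a check for the general multitree condition: the graph is a DAG and there is at most one directed path between any ordered pair of nodes. It is not the weighted construction proposed in the paper; the function name `is_multitree` and the node labels in the usage lines are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_multitree(edges):
    """Return True if the directed graph given as (parent, child) pairs is a
    DAG with at most one directed path between any ordered pair of nodes.

    Illustrative check only; this is an assumption-based sketch, not the
    Weighted MultiTree algorithm from the paper.
    """
    children = defaultdict(set)  # sets collapse duplicate edges
    nodes = set()
    for parent, child in edges:
        children[parent].add(child)
        nodes.update((parent, child))

    for source in nodes:
        seen = set()
        stack = list(children[source])
        while stack:
            node = stack.pop()
            # Coming back to the source means a cycle; reaching any node a
            # second time means two distinct paths (or a cycle) from source.
            if node == source or node in seen:
                return False
            seen.add(node)
            stack.extend(children[node])
    return True


# Hypothetical labels: a provider node may have several parents as long as
# no pair of nodes is connected by more than one path.
unambiguous = [("specialty", "provider_A"), ("state", "provider_A"),
               ("provider_A", "claim_1")]
diamond = [("root", "x"), ("root", "y"), ("x", "z"), ("y", "z")]

print(is_multitree(unambiguous))  # True
print(is_multitree(diamond))      # False: two paths from "root" to "z"
```

The check runs one traversal per node, which is adequate for small illustrative graphs; a construction over full claims data would likely need a more economical strategy.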







Availability of data and material
The data used for this paper are available in the public domain.
Code availability
Custom code was developed for this study.
Funding
This work is partially supported by the Scheme for Promotion of Academic and Research Collaboration (SPARC), sponsored by the Ministry of Human Resource Development, Government of India, under the project titled Digital Health Records Storage and Analysis for Healthcare Provisioning of Global Patients: An India-Australia Initiative (1406).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Settipalli, L., Gangadharan, G.R. Provider profiling and labeling of fraudulent health insurance claims using Weighted MultiTree. J Ambient Intell Human Comput 14, 3487–3508 (2023). https://doi.org/10.1007/s12652-021-03481-6
DOI: https://doi.org/10.1007/s12652-021-03481-6