Review
JAMA Dermatol. 2021 Nov 1;157(11):1362-1369. doi: 10.1001/jamadermatol.2021.3129.

Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review

Roxana Daneshjou et al. JAMA Dermatol. 2021.

Abstract

Importance: Clinical artificial intelligence (AI) algorithms have the potential to improve clinical care, but fair, generalizable algorithms depend on the clinical data on which they are trained and tested.

Objective: To assess whether data sets used for training diagnostic AI algorithms addressing skin disease are adequately described and to identify potential sources of bias in these data sets.

Data sources: In this scoping review, PubMed was used to search for peer-reviewed research articles published between January 1, 2015, and November 1, 2020, with the following paired search terms: deep learning and dermatology, artificial intelligence and dermatology, deep learning and dermatologist, and artificial intelligence and dermatologist.
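
To make the search protocol concrete, the snippet below is a minimal sketch of how the four paired queries and the date window described above could be reproduced against PubMed with Biopython's Entrez module. The query strings and dates come from this abstract; the contact email, the retmax cap, and the exact query formatting are illustrative assumptions, and any results would still require the screening described under Study selection.

    from Bio import Entrez

    Entrez.email = "you@example.org"  # hypothetical address; NCBI requires a contact email

    # The four paired search terms listed under Data sources.
    PAIRED_TERMS = [
        "deep learning AND dermatology",
        "artificial intelligence AND dermatology",
        "deep learning AND dermatologist",
        "artificial intelligence AND dermatologist",
    ]

    pmids = set()
    for term in PAIRED_TERMS:
        handle = Entrez.esearch(
            db="pubmed",
            term=term,
            datetype="pdat",       # filter on publication date
            mindate="2015/01/01",  # search window stated in the abstract
            maxdate="2020/11/01",
            retmax=10000,          # assumed cap; raise if results are truncated
        )
        record = Entrez.read(handle)
        handle.close()
        pmids.update(record["IdList"])

    print(f"{len(pmids)} unique candidate articles before screening")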

Study selection: Studies that developed a new deep learning algorithm or tested an existing one for triage, diagnosis, or monitoring using clinical or dermoscopic images of skin disease were selected, and 2 investigators independently reviewed the articles to verify that they met selection criteria.

Consensus process: Data set audit criteria were determined by consensus of all authors after reviewing existing literature to highlight data set transparency and sources of bias.

Results: A total of 70 unique studies were included. Among these studies, 1,065,291 images were used to develop or test AI algorithms, of which only 257,372 (24.2%) were publicly available. Only 14 studies (20.0%) included descriptions of patient ethnicity or race in at least 1 data set used. Only 7 studies (10.0%) included any information about skin tone in at least 1 data set used. Thirty-six of the 56 studies developing new AI algorithms for cutaneous malignant neoplasms (64.3%) met the gold standard criteria for disease labeling. Public data sets were cited more often than private data sets, suggesting that public data sets contribute more to new development and benchmarks.
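
As a quick sanity check on the proportions above, the following standalone Python snippet recomputes each percentage from the raw counts reported in this abstract (no external data or libraries are assumed):

    # Counts taken directly from the Results paragraph above.
    total_images, public_images = 1_065_291, 257_372
    total_studies, race_studies, tone_studies = 70, 14, 7
    developing_studies, gold_standard_studies = 56, 36

    print(f"publicly available images: {public_images / total_images:.1%}")               # 24.2%
    print(f"race/ethnicity described:  {race_studies / total_studies:.1%}")               # 20.0%
    print(f"skin tone described:       {tone_studies / total_studies:.1%}")               # 10.0%
    print(f"gold-standard labeling:    {gold_standard_studies / developing_studies:.1%}") # 64.3%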

Conclusions and relevance: This scoping review identified 3 issues in data sets that are used to develop and test clinical AI algorithms for skin disease that should be addressed before clinical translation: (1) sparsity of data set characterization and lack of transparency, (2) nonstandard and unverified disease labels, and (3) inability to fully assess patient diversity used for algorithm development and testing.


Figures

Figure. Overview of Data Sets and Studies. Squares represent studies; circles, data sets; and arrows, use of a data set. The number of images in a given data set is represented by the size of the circle. Private data sets are often connected to only 1 study, whereas public data sets help generate multiple studies. A mapping of the corresponding data sets and studies is provided in the eFigure in the Supplement.


