We describe how the Obuchowski-Rockette (OR) method of analysis for multi-reader diagnostic studies can be used to estimate the variability of latent reader-performance outcomes, such as the area under the ROC curve (AUC). For a specific reader, the latent (or true) reader-performance outcome can be thought of conceptually as the estimate that would result if the reader were to read a very large number of cases. We note that, for the sample sizes used in typical diagnostic studies, the latent reader-performance outcome is equal to the observed outcome minus measurement error. An often-cited study that assesses the variability of various reader-performance outcomes, including the AUC, is the study by Craig Beam et al., “Variability in the Interpretation of Screening Mammograms by US Radiologists,” published in 1996. However, a problem with this type of study is that the variability estimates include measurement error. Thus this approach overestimates latent reader variability and gives variability estimates that depend on the case sample size. The proposed method overcomes these problems. We illustrate the proposed method for 29 radiologists in Jordan, each of whom read 60 chest computed tomography (CT) scans. Using the OR method, we estimated the middle 95% range for latent AUC values to be 0.07; i.e., we estimate that 95% of radiologists differ by less than 0.07 in their ability to successfully discriminate between a pair of diseased and non-diseased cases. In contrast, the estimate of the middle 95% range for the observed AUCs was 0.18. Thus conventional methods of describing reader variability can greatly overstate the variability of the true abilities of the readers.
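
The reason observed AUCs spread more widely than latent AUCs can be illustrated with a simple variance decomposition. The sketch below is not the OR procedure itself: it assumes, for illustration only, that each reader's measurement error is independent of the latent value and of other readers' errors, and that AUCs are approximately normally distributed. The function names and the numeric inputs are hypothetical and are not taken from the study.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def middle_95_range(sd):
    """Width of the central 95% interval of a normal distribution with standard deviation sd."""
    return 2 * Z_975 * sd


def latent_sd(observed_var, error_var):
    """Between-reader SD after removing measurement-error variance.

    Assumes observed_var = latent_var + error_var (independent errors).
    The difference is floored at zero because a variance cannot be negative.
    """
    return math.sqrt(max(observed_var - error_var, 0.0))


# Hypothetical inputs, chosen only to show the direction of the effect
observed_sd = 0.05  # SD of observed AUCs across readers
error_sd = 0.04     # SD of measurement error due to the finite case sample

lat_sd = latent_sd(observed_sd ** 2, error_sd ** 2)
print("observed 95% range:", round(middle_95_range(observed_sd), 3))
print("latent 95% range:  ", round(middle_95_range(lat_sd), 3))
```

Under these assumptions the latent range is always narrower than the observed range, and the gap grows as the case sample shrinks and measurement error increases.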