Abstract
The incorporation of automated computational tools has great potential to positively influence the field of pathology. However, pathologists and regulatory agencies are reluctant to trust the output of complex models such as Convolutional Neural Networks (CNNs), which are usually implemented as black-box tools. Increasing the interpretability of quantitative analyses is a critical line of research for increasing the adoption of modern Machine Learning (ML) pipelines in clinical environments. Toward that goal, we present HistoLens, a Graphical User Interface (GUI) designed to facilitate quantitative assessment of datasets of annotated histological compartments. Additionally, we introduce hand-engineered feature visualizations that highlight the regions within each structure that contribute to particular feature values. These feature visualizations can then be paired with feature hierarchy determinations to view which regions within an image are significant to a particular sub-group within the dataset. As a use case, we analyzed a dataset of old and young mouse kidney sections with glomeruli annotated. We highlight functional components within HistoLens that allow non-computational experts to efficiently navigate a new dataset and ease the transition to downstream computational analyses.
1. INTRODUCTION
Artificial Intelligence (AI) and Machine Learning (ML) have proven to be powerful tools in a wide variety of applications. Their utility is especially pronounced in scenarios involving vast amounts of heterogeneous data that classical models struggle to capture. High-complexity models such as Convolutional Neural Networks (CNNs) have demonstrated impressive performance in tasks ranging from image classification to sequence modeling. In the growing field of digital pathology, researchers have even developed fully-automated models that rival human pathologists in speed and performance [1–3]. Despite these advances, such models have been slow to enter clinical environments. One explanation for the reluctance of pathologists and regulatory agencies to adopt AI-based technologies is a lack of interpretability, which prevents medical professionals from understanding what a model's decision is based on. Without justification for their decisions, how can doctors be expected to trust a model to assign diagnoses and treatments to their patients?
Beyond their lack of transparency, most modern CNNs have been shown to be vulnerable to biased training data as well as imperceptible noise that can degrade the capability of image filters to recognize key diagnostic criteria [4–6]. Without post-hoc methods such as Grad-CAM, the task of diagnosing biased networks would be nearly impossible [7]. Using Grad-CAM, computational researchers can trace the classification decisions made by a network back through the convolutional filters of a CNN to highlight the regions of the input images that are most influential. While this provides valuable insight into what a given network is learning to detect, it does not explain why that area is informative for a particular classification. Learning quantitative distinctions between healthy and diseased samples is where traditional methods involving the extraction of hand-engineered features excel. Hand-engineered features include measures of shape, color, and texture of specific components within a given image. Compared to features extracted by an ML model, hand-engineered features require more effort to design and calculate because they depend on an additional segmentation step. The use of immunohistochemical staining can greatly reduce the difficulty of segmenting specific cells and tissues in histology images. Designing models that use these features to render classification decisions allows both pathologists and computational experts to interpret the results, and helps formulate biological hypotheses that examine the influence of specific cells and morphological changes [8–10].
Individual hand-engineered features may be relatively simple to explain; however, physiological changes resulting from disease are rarely captured by a single feature. Interpreting the influence of combinations of hand-engineered features on diagnostic decisions can be performed via linear models or logistic regression, but this does not translate to how a pathologist makes observations in images. To address this shortcoming and increase the utility of quantitative analyses in digital pathology, we developed HistoLens. Through the many functions in HistoLens tailored to Whole Slide Images (WSIs), pathologists are provided with novel ways to visualize their data, determine the impact of specific compartments on classification decisions, and integrate their prior knowledge into downstream analyses to better inform ML experiments. The following sections illustrate the usage of HistoLens on glomeruli annotated in mouse WSIs stained with Periodic Acid-Schiff (PAS).
2. METHODS
File Inputs:
Upon opening the main HistoLens interface, the only enabled button is the “File Inputs” button. Pressing it opens the File Input window, from which the user selects the kind of experiment to start. In the upper button group, the user chooses whether they are starting from a directory of pre-extracted images, from WSIs and annotation files, or from a HistoLens Experiment File. Choosing WSIs and annotation files and then selecting the directory containing the WSIs and XML annotation files (generated after annotating structures in Aperio ImageScope) takes the user to the interactive compartment segmentation window. A HistoLens Experiment File is generated after an experiment is run and contains file paths to all required components.
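Although HistoLens handles annotation parsing internally, the structure of these XML files is simple enough to illustrate. The Python sketch below assumes the standard Annotations/Annotation/Regions/Region/Vertices/Vertex layout used by Aperio ImageScope; the helper name is hypothetical:

```python
import xml.etree.ElementTree as ET

def read_imagescope_regions(xml_path):
    """Read annotated structure boundaries from an Aperio ImageScope XML file.

    Returns a list of regions, each a list of (x, y) vertex coordinates.
    Assumes the common Annotations/Annotation/Regions/Region/Vertices layout.
    """
    tree = ET.parse(xml_path)
    regions = []
    for region in tree.getroot().iter("Region"):
        vertices = [(float(v.get("X")), float(v.get("Y")))
                    for v in region.iter("Vertex")]
        regions.append(vertices)
    return regions

# The returned boundaries can then be used to crop each annotated
# structure out of the WSI for downstream processing, e.g.:
# regions = read_imagescope_regions("slide_annotations.xml")
```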
Interactive Compartment Segmentation:
Segmentation can be highly computationally intensive. However, constraining the task according to some assumptions about the contents of histology WSIs simplifies the procedure to a small set of parameters per sub-compartment. The user selects whether to use colorspace transforms, color deconvolution [11], or a custom segmentation script (Fig. 1). After that, the user only has to select a channel from the transformed image and then adjust the parameters controlling threshold level, minimum size, and segmentation hierarchy level. The segmentation hierarchy controls the order in which compartments are segmented from the available pixels in a given structure. A hierarchy level of one means that a compartment is segmented first, from all available pixels within the structure boundaries. The second level is segmented from all pixels except those claimed by the first level, and the last level contains all remaining pixels. After selecting the parameter combination that yields the best segmentation, the user can check its performance on other images and then click the “Done” button to proceed to feature extraction.
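As a rough illustration of this scheme (HistoLens itself is interactive, and its implementation may differ), the following Python sketch applies an HSV colorspace transform and thresholds each sub-compartment in hierarchy order, with each level drawing only from pixels not claimed by earlier levels; the parameter values and helper name are illustrative:

```python
from skimage.color import rgb2hsv
from skimage.morphology import remove_small_objects

def hierarchical_segmentation(rgb_image, structure_mask, level_params):
    """Threshold sub-compartments in hierarchy order.

    level_params: one (channel, threshold, min_size) tuple per hierarchy
    level; each level is segmented only from pixels not yet claimed by
    an earlier level. The last compartment receives all leftover pixels.
    """
    hsv = rgb2hsv(rgb_image)
    available = structure_mask.astype(bool)  # pixels inside the annotation
    masks = []
    for channel, threshold, min_size in level_params:
        mask = (hsv[..., channel] > threshold) & available
        mask = remove_small_objects(mask, min_size=min_size)
        masks.append(mask)
        available &= ~mask            # later levels exclude claimed pixels
    masks.append(available)           # final hierarchy level: remaining pixels
    return masks

# Illustrative call: nuclei first, then PAS+ mesangium; lumen gets the rest.
# nuclei, pas, lumen = hierarchical_segmentation(
#     glom_rgb, glom_mask, [(1, 0.5, 30), (1, 0.25, 20)])
```

The key property of the hierarchy is that the compartment masks are mutually exclusive, so every pixel within the structure boundary is assigned to exactly one sub-compartment.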
Hand-Engineered Feature Set:
The feature visualizations introduced in this work are based on a large set of 315 glomerular image features quantifying color, shape, texture, and relative size of sub-compartments, including nuclei, luminal spaces, and PAS+ (mesangial) areas, designed by Ginley et al. [12, 13]. An additional set of 19 glomerular boundary morphology and nuclear distribution features is also included, along with the associated visualizations. Nuclear distribution features include graph-based measures derived from Minimum Spanning Trees (MSTs) and Voronoi diagrams in which the centroids of segmented nuclei serve as nodes [14, 15]. During feature extraction, HistoLens iterates through each slide and calculates feature values for each structure in parallel. An in-window progress bar and table track progress, and once all slides in the slide folder are complete, the user is given the option to calculate feature ranks according to the provided labels.
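To give a concrete sense of the graph-based nuclear distribution features, the sketch below computes summary statistics over MST edge lengths with nuclear centroids as nodes; the exact feature definitions in HistoLens may differ, and the helper and feature names here are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edge_statistics(centroids):
    """Summary statistics of MST edge lengths, with nuclear centroids
    as graph nodes.

    centroids: (n, 2) array of (x, y) nuclear centroid coordinates.
    """
    dists = squareform(pdist(np.asarray(centroids)))  # dense pairwise distances
    mst = minimum_spanning_tree(dists)                # sparse (n, n) result
    edges = mst.data                                  # the n - 1 edge lengths
    return {"mst_mean_edge": edges.mean(),
            "mst_std_edge": edges.std(),
            "mst_max_edge": edges.max()}
```

Voronoi-based features could be derived analogously from a tessellation of the same centroids (e.g. with scipy.spatial.Voronoi).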
Feature Ranking:
After features are extracted, the user is given the option to perform feature ranking according to feature importance in a classification task. Feature ranking is performed using the combined outputs of several feature selection algorithms: Minimum Redundancy Maximum Relevance (MRMR), Chi-Square tests, and ReliefF (Fig. 2). These algorithms were selected because they consider both the individual and the combined influence of each hand-engineered feature on slide-level classification. Handling of labels in HistoLens is flexible depending on user preference; labels can be single or multiple slide-level labels, or structure-level labels for more specific sub-classifications. The rank outputs of these algorithms are summed so that the most informative features for each type of classification have the lowest combined rank sum.
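A minimal sketch of this rank-sum aggregation, assuming each selector returns a per-feature importance score (scikit-learn provides a chi-square scorer; MRMR and ReliefF implementations are available in third-party packages such as pymrmr and skrebate):

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import chi2

def combined_rank_sum(score_lists):
    """Sum per-algorithm ranks; the most informative features end up
    with the lowest combined rank sum.

    score_lists: one 1-D array of per-feature importance scores per
    selection algorithm (higher score = more informative).
    """
    # rankdata ranks ascending, so negate: the best feature gets rank 1
    ranks = [rankdata(-np.asarray(scores)) for scores in score_lists]
    return np.sum(ranks, axis=0)

# Example with a chi-square scorer (chi2 requires non-negative inputs)
# plus stand-in score arrays for MRMR and ReliefF:
# chi2_scores, _ = chi2(features - features.min(axis=0), labels)
# rank_sums = combined_rank_sum([chi2_scores, mrmr_scores, relieff_scores])
# best_first = np.argsort(rank_sums)
```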
Generation of Feature Visualizations:
Feature visualizations generated in HistoLens fall into four groups depending on the properties of the specific feature being visualized. Similarity maps are binary masks identifying pixels within the glomerular boundary whose values lie within a user-defined range of the feature value describing the entire image. For example, the similarity map for the feature “Luminal Mean Blue Value” would be a binary mask containing all pixels within the luminal space of the glomerulus whose blue intensity is within a user-defined percentage (default of 10%) of the mean blue intensity of all pixels in the luminal space. Pixels outside this range appear as black (zero grayscale intensity).

Deviation maps, by contrast, assign weights to pixels according to how different they are from the feature value for the entire image. These cover any features that measure the standard deviation of a particular characteristic within a compartment. Pixels within these compartments undergo a folded normal transformation, wherein the absolute value of the Gaussian-transformed (z-scored) pixel intensity replaces each pixel's original RGB intensity. The resulting visualization equally weights pixels that are equally different from the value for the entire image.

Object specific maps, unlike the previous two types, do not capture individual pixels but instead visualize the characteristics of connected objects within a compartment. These can include individual nuclei, continuous regions of mesangium, or distinct luminal spaces such as capillary lumens and Bowman's space. For these maps, inclusion of an object in the final visualization can follow similarity criteria, where a binary mask includes all objects whose feature values are similar to the total image feature, or deviation criteria, where all pixels of an object are weighted according to their deviation from the total image feature.

Region specific maps, like object specific maps, also work with groups of pixels; however, they are used for visualizing texture features and are therefore not restricted to connected objects. A 10×10 sliding window is moved over the image, and Gray Level Co-occurrence Matrices (GLCMs) are used to calculate the correlation, contrast, energy, and homogeneity within the window's area. Regions within a user-defined similarity range are included in the final binary mask for a given texture feature.
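The first two map types reduce to simple per-pixel operations. The sketch below (illustrative helper names; a single color channel and a boolean compartment mask are assumed) implements a similarity map with the 10% default tolerance and the folded-normal deviation map described above:

```python
import numpy as np

def similarity_map(channel, compartment_mask, tolerance=0.10):
    """Binary mask of compartment pixels within +/- tolerance of the
    compartment-wide mean (cf. 'Luminal Mean Blue Value')."""
    values = channel[compartment_mask]
    mean = values.mean()
    mask = np.zeros_like(channel, dtype=bool)
    mask[compartment_mask] = np.abs(values - mean) <= tolerance * mean
    return mask

def deviation_map(channel, compartment_mask):
    """Folded-normal weighting: each compartment pixel becomes the
    absolute value of its z-scored intensity, so pixels equally far
    above or below the mean receive equal weight."""
    values = channel[compartment_mask]
    z = (values - values.mean()) / values.std()
    out = np.zeros_like(channel, dtype=float)
    out[compartment_mask] = np.abs(z)
    return out
```

Region specific maps could be sketched analogously by computing GLCM properties inside each window position (e.g. with skimage.feature.graycomatrix and graycoprops) and thresholding on similarity to the whole-image texture value.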
3. RESULTS
In the figures below, a use case of HistoLens is presented. In this case, the user is interested in comparing distinctive glomerular characteristics between Old and Young mice. Initially, the feature selected for visualization and plotting is the total glomerular area (Fig. 3). The user can change which feature to inspect by selecting the compartment of interest, the type of feature, and then the specific name of the feature, followed by the “View Specific” button. To view more general categories of features, such as all color properties of the mesangium or all luminal space features, the user can press either the “View Compartment Feature” or the “View Feature Types” button with their desired compartment and feature type selected. Combining feature visualizations from multiple categories is facilitated by the “Custom List” list box, which lets the user hand-select features from all 334 provided. To view the feature visualizations ranked according to each feature's importance in a classification task, the user can select any of the label categories provided in the feature ranking window. The resulting combined visualization weights each feature visualization by its relative importance in assigning that image to the selected label category and maps the result to the Jet colorspace to improve visual contrast. Viewing the image in this way allows the user to ask: which areas in this image would suggest that this particular structure comes from this group of patients? To further refine the feature visualization process when viewing large combinations of features, the user can move a rectangular Region of Interest (ROI) over a specific area in the image and see which feature visualizations are present in that area and at what magnitude. In this way, HistoLens not only shows which areas in an image are informative for a given classification, but also describes what about those regions is informative in terms of quantitative features.
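A sketch of how such a combined visualization could be assembled (the weighting and normalization details in HistoLens may differ; matplotlib's jet colormap stands in for the Jet colorspace mapping, and the helper name is illustrative):

```python
import numpy as np
from matplotlib import cm

def combined_visualization(feature_maps, importances):
    """Weight each feature visualization by relative importance, sum,
    normalize to [0, 1], and map to the Jet colorspace.

    feature_maps: list of 2-D arrays, one per ranked feature.
    importances:  matching per-feature importance weights.
    """
    weights = np.asarray(importances, dtype=float)
    weights = weights / weights.sum()          # relative importance
    combined = sum(w * m for w, m in zip(weights, feature_maps))
    combined = combined / (combined.max() + 1e-12)
    return cm.jet(combined)[..., :3]           # RGB heatmap for display
```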
HistoLens also includes the ability to select representative samples based on feature values for direct comparison. In Figure 4, we select the mesangial color features and click the “View Feature Types” button to plot the first and second Principal Components (PC1 and PC2) in the Feature Distribution Plot. The Select Regions tool allows the user to select specific areas in the plot from which to draw samples for side-by-side comparison. At this point, the user can use the provided notepads to record specific observations for each image, or copy a single note to all images within a specified region. In addition to written notes, the user can make manual annotations of specific regions of interest within one or both images in the Add Annotations tab (Fig. 5). This human-derived information can then be used in more complex computational studies to detect, segment, or otherwise characterize etiologic structures.
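The PC1/PC2 projection behind the Feature Distribution Plot can be sketched in a few lines (standardizing features before PCA is an assumed preprocessing step; the helper name is illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def project_to_pcs(feature_matrix):
    """Project structures (rows) onto the first two principal
    components of the selected feature subset."""
    scaled = StandardScaler().fit_transform(feature_matrix)
    return PCA(n_components=2).fit_transform(scaled)  # columns: PC1, PC2
```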
4. CONCLUSIONS
HistoLens represents a crucial step in the development of explainable AI systems for digital pathology. Through HistoLens, we are able to use hand-engineered quantitative descriptors of structures extracted from histology WSIs to refine the search for biologically relevant abnormalities. The hand-engineered feature visualizations presented in this work not only provide context for a given feature but also establish an entirely new way to interpret the results of ML-based studies. Similar to how a human observer would justify their own observations, HistoLens provides the user not only with the image regions that would lead a classifier to make a particular decision, but also with a detailed description of the characteristics of those regions that drive that decision.
Currently, our lab is working to develop tools that engage pathologists in a cooperative rather than competitive manner through the use of explainable ML pipelines. We will expand the generalizability of HistoLens to include multi-compartment studies and integrate deep learning networks to model complex biological hypotheses.
ACKNOWLEDGEMENT
This project was supported by NIH-NIDDK grant R01 DK114485 (PS), NIH-OD grant R01 DK114485 03S1 (PS), a glue grant (PS) from the NIH-NIDDK Kidney Precision Medicine Project grant U2C DK114886 (Contact: Dr. Jonathan Himmelfarb), a multi-disciplinary small team grant RSG201047.2 (PS) from the State University of New York, a pilot grant (PS) from the University at Buffalo's Clinical and Translational Science Institute (CTSI) grant 3UL1TR00141206 S1 (Contact: Dr. Timothy Murphy), a DiaComp Pilot & Feasibility Project 21AU4180 (PS) with support from NIDDK Diabetic Complications Consortium grants U24 DK076169 and U24 DK115255 (Contact: Dr. Richard A. McIndoe), and NIH-OD grant U54 HL145608 (PS).
REFERENCES
1. Fenstermaker M, et al., Development and validation of a deep-learning model to assist with renal cell carcinoma histopathologic interpretation. Urology, 2020. 144: p. 152–157.
2. Janowczyk A and Madabhushi A, Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics, 2016. 7.
3. Jayapandian CP, et al., Development and evaluation of deep learning–based segmentation of histologic structures in the kidney cortex with multiple histologic stains. Kidney International, 2021. 99(1): p. 86–101.
4. Papernot N, et al., The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P). 2016. IEEE.
5. Akhtar N and Mian A, Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 2018. 6: p. 14410–14430.
6. Madry A, et al., Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
7. Selvaraju RR, et al., Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 2017.
8. McQuin C, et al., CellProfiler 3.0: Next-generation image processing for biology. PLoS Biology, 2018. 16(7): p. e2005970.
9. Diao JA, et al., Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nature Communications, 2021. 12(1): p. 1–15.
10. Yu K-H, et al., Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nature Communications, 2016. 7(1): p. 1–10.
11. Ruifrok AC and Johnston DA, Quantification of histochemical staining by color deconvolution. Analytical and Quantitative Cytology and Histology, 2001. 23(4): p. 291–299.
12. Ginley B, et al., Fully automated classification of glomerular lesions in lupus nephritis. SPIE Medical Imaging, Vol. 11320. 2020: SPIE.
13. Ginley B, et al., Computational segmentation and classification of diabetic glomerulosclerosis. Journal of the American Society of Nephrology, 2019. 30(10): p. 1953–1967.
14. Prim RC, Shortest connection networks and some generalizations. The Bell System Technical Journal, 1957. 36(6): p. 1389–1401.
15. Voronoi G, Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Deuxième mémoire. Recherches sur les parallélloèdres primitifs. Journal für die reine und angewandte Mathematik, 1908. 1908(134): p. 198–287.