Artificial Intelligence Improves the Accuracy in Histologic Classification of Breast Lesions

António Polónia, MD, PhD; Sofia Campelos, MD; Ana Ribeiro, MD; Ierece Aymore, MD; Daniel Pinto, MD; Magdalena Biskup-Fruzynska, MD; Ricardo Santana Veiga, MD; Rita Canas-Marques, MD; Guilherme Aresta, MEng; Teresa Araújo, MEng; Aurélio Campilho, PhD; Scotty Kwok, MSc; Paulo Aguiar, PhD; Catarina Eloy, MD, PhD


Am J Clin Pathol. 2021;155(4):527-536. 



Materials and Methods

Characteristics of the training set and algorithms

The training set of part A consisted of 400 microscopic photographs of 2048 × 1536 pixels, with a pixel scale of 0.42 μm/pixel. The images were classified as normal, benign, in situ carcinoma, and invasive carcinoma, distributed evenly (100 per class). The training set of part B consisted of 10 WSIs, with a pixel scale of 0.47 μm/pixel, containing 247 irregularly shaped (freehand) ROIs annotated with the same labels as part A. Patches were extracted from each photograph in part A using a patch size of 1495 × 1495 pixels and a stride of 99 pixels. These patches (5600 in total) were then resized to 299 × 299 pixels.
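The sliding-window extraction described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name and the use of NumPy are assumptions, and the resizing step to 299 × 299 pixels is noted only in a comment.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Slide a square window of side `patch_size` over `image` with the
    given stride, returning (top-left coordinate, crop) pairs.
    In the paper's pipeline, patch_size=1495 and stride=99, and each
    crop would afterwards be resized to 299x299 pixels."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(((y, x), image[y:y + patch_size, x:x + patch_size]))
    return patches

# Toy example on a small synthetic "image" (real inputs are 2048x1536)
img = np.zeros((10, 12), dtype=np.uint8)
patches = extract_patches(img, patch_size=6, stride=2)
```

With a 10 × 12 toy image, a 6-pixel window, and a stride of 2, the window fits at 3 vertical and 4 horizontal positions, yielding 12 patches.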

The conversion of WSIs to patches began with foreground extraction, using the colour characteristics of the H&E stain to threshold tissue regions. Each WSI was downsampled and converted from the RGB to the CIE L*a*b* colour space. All pixels with intensity higher than 10% of the mean intensity of the a* channel became the foreground. Patches were extracted from the WSIs at the same scale as in part A. Patches with less than 5% foreground pixels were considered empty and discarded. Finally, the ground truth annotations of the WSIs were converted into patch-wise class labels.
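The thresholding and empty-patch filtering can be sketched as below. The RGB-to-L*a*b* conversion itself is assumed to be done by an image library and is omitted; the threshold factor follows a literal reading of the paper's criterion ("higher than 10% of the mean intensity of the a* channel"), which may differ from the authors' exact implementation. Function and parameter names are illustrative.

```python
import numpy as np

def foreground_mask(a_channel, factor=0.1):
    """Mark as foreground every pixel of the a* channel whose intensity
    exceeds `factor` times the channel mean (literal reading of the
    paper's 10% criterion). `a_channel` is a 2D array from a
    downsampled WSI already converted to CIE L*a*b*."""
    return a_channel > factor * a_channel.mean()

def keep_patch(mask_patch, min_foreground=0.05):
    """Keep a patch only if at least 5% of its pixels are foreground;
    patches below that fraction are considered empty and discarded."""
    return mask_patch.mean() >= min_foreground

# Toy a* channel: a bright tissue-like block on a dark background
a = np.zeros((20, 20))
a[5:15, 5:15] = 50.0
mask = foreground_mask(a)
```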

The classifier uses a two-stage approach to take advantage of both the photographs and the WSIs of the training data. In the first stage, a 4-class Inception-ResNet-v2, pre-trained on ImageNet, was fine-tuned using the patches from part A. In the second stage, this classifier was used to predict the patches extracted from the WSIs. The difficulty of each patch was quantified by comparing the prediction with the ground truth, calculated as the absolute class distance between the ground truth class and the predicted class, multiplied by the predicted probability. The top incorrect predictions (evenly sampled from each of the 4 classes) were selected as "hard" examples. The classifier was then retrained using the 5600 patches from part A and 5900 "hard" patches from part B, using the same CNN architecture and hyperparameters.
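The difficulty score used to mine "hard" examples can be written out directly from the definition above: the absolute distance between the ground truth and predicted class indices, weighted by the probability the network assigned to its (incorrect) prediction. The function name and the 4-class probability-vector interface are assumptions for illustration.

```python
import numpy as np

def patch_difficulty(probs, gt_class):
    """Difficulty of one patch: |ground-truth class - predicted class|
    multiplied by the predicted probability. `probs` is the 4-class
    softmax output; classes are ordered normal < benign < in situ <
    invasive, so the distance reflects clinical severity of the error."""
    pred_class = int(np.argmax(probs))
    return abs(gt_class - pred_class) * float(probs[pred_class])

# A normal patch (class 0) confidently misclassified as invasive (class 3)
score = patch_difficulty(np.array([0.1, 0.1, 0.1, 0.7]), gt_class=0)
```

A confident, clinically distant error (normal predicted as invasive carcinoma with probability 0.7) scores 3 × 0.7 = 2.1, so such patches rank at the top and are preferentially selected for retraining.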

For inference, patch-wise results were aggregated into image-level (algorithm A) and WSI heatmap (algorithm B) predictions. Specifically, algorithm A averages the patch-wise predictions to produce a final label for each photograph. Algorithm B averages the predictions of overlapping patches to produce a local hard label. These values are used to construct a classification heatmap based on the patches' coordinates and the network's stride. The resulting map was normalized to values between 0 (more likely to be normal) and 1 (more likely to be invasive carcinoma). The optimal thresholds for normal, benign, in situ carcinoma, and invasive carcinoma were chosen empirically (0, 0.35, 0.7, and 0.75, respectively).
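One plausible reading of the empirical thresholds is that a normalized heatmap value is assigned the highest class whose threshold it reaches. The sketch below encodes that interpretation; the function name, label strings, and tie-breaking behaviour are assumptions, not taken from the paper.

```python
def classify_heatmap_value(value,
                           thresholds=(0.0, 0.35, 0.7, 0.75),
                           labels=("normal", "benign",
                                   "in situ carcinoma", "invasive carcinoma")):
    """Map a normalized heatmap value in [0, 1] to a class label by
    taking the highest class whose empirical threshold the value meets
    (0, 0.35, 0.7, and 0.75 for the four classes, respectively)."""
    label = labels[0]
    for t, name in zip(thresholds, labels):
        if value >= t:
            label = name
    return label
```

Under this reading, a pixel value of 0.5 falls between the benign and in situ thresholds and is labelled benign, while anything at or above 0.75 is labelled invasive carcinoma.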