In Lung Cancer, Training Computers to Be Pathologists

Alexander M. Castellino, PhD

August 16, 2016

Computerized image-processing technology is emerging as a field that can improve on histopathologic assessment across several malignancies.

Now a report published online August 16 in Nature Communications indicates that computers can be trained to analyze hematoxylin and eosin (H&E)–stained whole-slide histopathologic images from patients with lung cancer with a higher degree of accuracy than trained pathologists.

They do so by extracting information from approximately 10,000 image features, such as cell size, shape, distribution of pixel intensity in the cells and nuclei, and texture of cells and nuclei.

This "machine-learning" approach was very accurate in distinguishing squamous cell lung cancer from lung adenocarcinoma and was also able to predict prognosis (longer-term vs shorter-term survival) with greater than 85% accuracy.

"This objective approach can complement the subjective analysis provided by pathologists," lead author Kun-Hsing Yu, MD, from the biomedical informatics program and the Department of Genetics, Stanford University, California, told Medscape Medical News. Dr Yu conceived, designed, and performed the analyses reported in this study.

"Two highly skilled pathologists assessing the same slide will agree only about 60 percent of the time. This approach replaces this subjectivity with sophisticated, quantitative measurements that we feel are likely to improve patient outcomes," Michael Snyder, PhD, one of the senior authors on the study and professor and chair of genetics, said in a Stanford University press release.

"This is an important study and shows that tumor cell morphology still has considerable relevance, complementing gene signature, if one has a computer-assisted algorithm to help in understanding the significance of the morphological changes," Thomas M. Wheeler, MD, W.L. Moody, Jr. Professor and Chair, Department of Pathology & Immunology, Baylor College of Medicine, Houston, Texas, told Medscape Medical News. Dr Wheeler was not involved in the study and was approached for comment.

Training and Test Data Sets

The Stanford researchers obtained 2186 histopathologic images from 1017 patients (515 with lung adenocarcinoma, 502 with lung squamous cell carcinoma) from The Cancer Genome Atlas (TCGA) project. The images included carcinomas as well as adjacent benign tissue. The TCGA cohort was randomly partitioned into a training set and a test set.

In addition, 294 tissue microarray images (1 image each for 227 lung adenocarcinomas and 67 lung squamous cell carcinomas) from the Stanford Tissue Microarray (TMA) database served as an additional test set to confirm the results obtained from the training set.

Training the Computer and Confirming the Observations

In the machine-learning approach, the computer was first trained to be a competent pathologist. For this purpose, histopathology images, pathology reports, and clinical information from the TCGA training set were made available to the software for processing.

All images were "tiled." The software used was trained to select the 10 densest tiles per image for further analysis. The researchers built a "fully automated image-segmentation pipeline" to extract objective morphologic information from thousands of images.
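The tiling and tile-selection step can be sketched in a few lines of Python. This is only an illustration, not the researchers' pipeline: the article does not say how density was measured, so the sketch assumes that density means the fraction of pixels darker than a near-white background. The function names, the threshold of 200, and the synthetic "slide" are all hypothetical.

```python
import numpy as np

def tile_image(image, tile_size):
    """Split a 2-D grayscale image into non-overlapping square tiles.

    Returns a list of ((row, col), tile) pairs. Partial tiles at the
    right and bottom edges are dropped.
    """
    tiles = []
    height, width = image.shape
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tiles.append(((row, col), image[row:row + tile_size, col:col + tile_size]))
    return tiles

def tile_density(tile, background_threshold=200):
    """Assumed density measure: fraction of pixels darker than the background."""
    return float(np.mean(tile < background_threshold))

def select_densest_tiles(image, tile_size, k=10, background_threshold=200):
    """Return the k tiles with the highest assumed density, densest first."""
    tiles = tile_image(image, tile_size)
    ranked = sorted(
        tiles,
        key=lambda item: tile_density(item[1], background_threshold),
        reverse=True,
    )
    return ranked[:k]

# Synthetic example: a white 400 x 400 "slide" with a dark block of "tissue".
rng = np.random.default_rng(0)
slide = np.full((400, 400), 255, dtype=np.uint8)
slide[100:300, 100:300] = rng.integers(0, 150, size=(200, 200), dtype=np.uint8)

top_tiles = select_densest_tiles(slide, tile_size=50, k=10)
```

In this toy example, every selected tile falls inside the dark block, because the empty background tiles have a density of zero.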

The computer analyzed more than 5 million histopathologic image tiles for each of the lung adenocarcinomas and lung squamous cell carcinomas from the TCGA cohort.

The analyses included extraction of information for 9879 quantitative features (eg, cell size, shape, distribution of pixel intensity in the cells and nuclei, texture of cells and nuclei) for each image tile.
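To give a sense of what a quantitative feature looks like, the sketch below computes a handful of intensity and texture summaries for a single grayscale tile. It is a simplified stand-in: the study's features were derived from segmented cells and nuclei, whereas this example works on raw pixels only. The feature names, the darkness threshold of 100, and the gradient-based texture measure are assumptions chosen for illustration.

```python
import numpy as np

def tile_features(tile, dark_threshold=100):
    """Compute a few illustrative per-tile features from pixel intensities.

    dark_threshold is a hypothetical cutoff below which a pixel is counted
    as darkly stained; it is not a value taken from the study.
    """
    tile = tile.astype(np.float64)
    # Intensity gradients along rows and columns, used as a crude texture proxy.
    grad_rows, grad_cols = np.gradient(tile)
    return {
        "intensity_mean": float(tile.mean()),
        "intensity_std": float(tile.std()),
        "intensity_p25": float(np.percentile(tile, 25)),
        "intensity_median": float(np.percentile(tile, 50)),
        "intensity_p75": float(np.percentile(tile, 75)),
        "dark_fraction": float(np.mean(tile < dark_threshold)),
        "texture_gradient": float(np.mean(np.hypot(grad_rows, grad_cols))),
    }

# A perfectly uniform tile has no spread, no dark pixels, and no texture.
flat_tile = np.full((32, 32), 180, dtype=np.uint8)
features = tile_features(flat_tile)
```

Each tile is reduced to a fixed-length set of numbers like these, which is the form a classifier needs as input.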

Using several "classifiers," the researchers first checked whether the image features were relevant by determining if they could distinguish malignancy from normal adjacent tissue.

"This basically means that we used several different machine-learning methods to distinguish (ie, to classify) images with different properties (eg, tumor vs nontumor, adenocarcinoma vs squamous cell carcinoma)," Dr Yu said.

The researchers determined that 80 quantitative features across the classifiers were able to achieve an average area under the receiver-operating characteristic curve (AUC) of greater than 0.85. In short, the classifiers separated adenocarcinoma from benign tissue, and squamous cell carcinoma from benign tissue, with an AUC above 0.85, on a scale where 0.5 corresponds to chance and 1.0 to perfect discrimination.
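The area under the receiver-operating characteristic curve can be read as the probability that a randomly chosen tumor sample receives a higher classifier score than a randomly chosen benign sample, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are invented for illustration and are not data from the study.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (eg, tumor), 0 for the negative class.
    scores: classifier outputs, where higher means more likely positive.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = positives[:, None] - negatives[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Made-up example: three tumor tiles and three benign tiles.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 tumor/benign pairs are ranked correctly
```

Because it is based on ranking rather than on a fixed decision cutoff, this figure is not the same as the percentage of samples classified correctly.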

Determining Prognostic Significance

To validate the relevance of the quantitative features, the researchers used their classifiers to distinguish adenocarcinomas from squamous cell carcinomas in the TCGA and TMA data sets; 240 quantitative features rose to the top in their best classifiers.

When the quantitative features were used to check for prognostic significance, the TCGA and TMA data sets showed that patients with stage I adenocarcinomas had longer survival than those with stage II, III, or IV.

The quantitative features were not able to tease out stage-dependent survival differences for patients with squamous cell carcinomas. Determination of prognostic significance based on tumor grade was also not possible for either tumor type.

However, the quantitative features were able to differentiate longer-term from shorter-term survivors. In the classifiers used, 60 image features were of prognostic significance in adenocarcinomas and 15 in squamous cell carcinomas.

Histopathologic differences were not obvious from standard histopathologic visual inspection of tumors of the same grade with different survival outcomes. However, the software models the researchers used were able to distinguish shorter-term survivors from longer-term survivors based on a finite number of quantitative image features.

"It is…difficult for human evaluators to predict survival outcomes based purely on the H&E stained microscopic slides," the study authors write.

"The differences in tumour cell morphology between the two histopathology images were not easily identified by visual inspection, but could be distinguished based on our quantitative image features. These quantitative features proved to be useful in predicting survival outcomes," the researchers write.

The Value of the Study

In lung cancer, differentiating adenocarcinomas from squamous cell carcinoma is important for clinical practice, Dr Yu pointed out. "This objective approach can complement the subjective analysis performed by pathologists," he added.

Dr Wheeler agreed. "The pathologist still has to determine if a person has cancer or not," he said. "This study will not replace differential diagnosis function undertaken by pathologists," he added.

Dr Wheeler also pointed out that of greater value in patient management is a predictive (rather than prognostic) study, which allows one to determine the type of treatment to which a patient may respond. "However, since all patients in the data sets were treated with standard therapy, the prognostic significance shown by the researchers will not change current therapeutic approaches," he added.

"In clinical practice, where each patient represents an n = 1, the prognostic value of this study has less relevance," he pointed out.

"However, the study shows that morphology is still very powerful if mined correctly," Dr Wheeler said. "Training machines to notice differences that may not be visualized with the naked eye will help pathologists immensely," he added.

"By predicting tumor type and survival outcomes, we can provide decision support for physicians and contribute to personalized approach to cancer management," Dr Yu said.

"We launched this study because we wanted to begin marrying imaging to our ‘omics’ studies [cancer genomics, transcriptomics, proteomics] to better understand cancer processes at a molecular level," Dr Snyder said in the Stanford press release. "This brings cancer pathology into the 21st century," he added.

The authors have disclosed no relevant financial relationships. Dr Wheeler has a relationship with PathXL Inc and DNA SeqAlliance Inc; he also serves on the Advisory Board of Medscape Pathology & Lab Medicine.

Nat Commun. Published online August 16, 2016.
