Data Sharing to Improve AI Used in Breast-Imaging Research

Louise Gagnon

August 16, 2021

A large dataset of digital breast tomosynthesis (DBT) images should help advance the artificial intelligence (AI) algorithms used for breast cancer imaging, researchers report.

The curated dataset, which consists of 22,032 DBT volumes associated with 5,610 studies from 5,060 patients, was published online in JAMA Network Open. The studies were divided into four types: normal studies (91.4%), actionable studies that required additional imaging but no biopsy (5.0%), benign biopsied studies (2.0%), and studies that detected cancer (1.6%).

The researchers also developed a deep-learning model to detect architectural distortions and masses, and evaluated it on a test set of 460 studies from 418 patients with cancer. Their algorithm reached a breast-based sensitivity of 65% at two false positives per DBT volume.
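A figure such as "sensitivity at two false positives per DBT volume" describes one operating point of a detection model. Conceptually, the model's confidence threshold is lowered until it produces, on average, a fixed number of false-positive marks per volume, and sensitivity is read off at that threshold. The Python sketch below illustrates this idea under simplifying assumptions: a breast counts as detected if any true-positive mark above the threshold falls on it, and false positives are averaged over all volumes. The `Detection` class, the `sensitivity_at_fp_rate` function, and the identifiers are hypothetical. This is not the researchers' evaluation code, and the study may have computed the metric differently.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    score: float            # model confidence for this mark (hypothetical field)
    breast_id: str          # hypothetical ID of the breast the mark was placed on
    is_true_positive: bool  # True if the mark hits an annotated lesion


def sensitivity_at_fp_rate(detections, positive_breasts, num_volumes, fp_per_volume=2.0):
    """Breast-based sensitivity at the lowest threshold whose mean number of
    false positives per volume does not exceed fp_per_volume.

    positive_breasts must be a non-empty set of IDs of breasts with a lesion.
    """
    best = 0.0
    # Lowering the threshold can only add marks, so false positives never
    # decrease; stop at the first threshold that exceeds the FP budget.
    for t in sorted({d.score for d in detections}, reverse=True):
        kept = [d for d in detections if d.score >= t]
        false_positives = sum(1 for d in kept if not d.is_true_positive)
        if false_positives / num_volumes > fp_per_volume:
            break
        detected = {d.breast_id for d in kept if d.is_true_positive}
        best = len(detected & positive_breasts) / len(positive_breasts)
    return best


if __name__ == "__main__":
    marks = [
        Detection(0.9, "b1", True),
        Detection(0.8, "x", False),
        Detection(0.7, "b2", True),
        Detection(0.6, "y", False),
        Detection(0.5, "z", False),
        Detection(0.4, "b3", True),
    ]
    print(sensitivity_at_fp_rate(marks, {"b1", "b2", "b3", "b4"}, num_volumes=1))
```

Because adding volumes raises the total false-positive budget, the same set of marks can yield a higher sensitivity when it is spread over more volumes. This is one reason the operating point must always be reported together with the false-positive rate.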

"The main focus of this publication is on the dataset, rather than on a specific hypothesis," said principal researcher Maciej A. Mazurowski, PhD, scientific director of the Duke Center for Artificial Intelligence in Radiology in Durham, North Carolina.

"We have publicly shared a large dataset of digital breast tomosynthesis images, which are sometimes referred to as 3D mammograms, for more than 5000 patients. There are two purposes for sharing data like these. One is to improve research and development of machine-learning algorithms. You can train models with these data. The other reason, maybe even more important, is to provide a benchmark to test algorithms," he told Medscape Medical News.

The large-scale sharing of data is a key step toward transparency in science, said Mazurowski. "It is about making sure results can be easily reproduced and setting benchmarks."

The dataset includes masses and architectural distortions that were annotated by two experienced radiologists, but does not include annotations for calcifications and/or microcalcifications.

This lack of calcifications is a limitation of the study, said Jean Seely, MD, professor of radiology at the University of Ottawa in Ontario, Canada, who is president of the Canadian Society of Breast Imaging and regional lead for the Ontario Breast Screening Program.

"About 45% of invasive breast cancers are diagnosed based on calcifications," she explained.

Still, although the sensitivity of the AI algorithm was not high (65%, compared with an average sensitivity of 85% for 2D mammography), the researchers should be commended for releasing such a large dataset, said Seely.

"The fact that they have made it publicly available is very, very useful," she said, adding that the dataset can be leveraged in future breast-imaging research.

Although DBT is much better than conventional 2D mammography at identifying breast cancers, DBT exams take about 30% more time to read.

"There's a lot of work being done in artificial intelligence in breast imaging to not only improve the workflow for breast radiologists, but also to help with the diagnosis and detection," she noted. "Anything that helps improve the confidence and the accuracy of the radiologist is really what we're aiming for right now."

The size and the content of this dataset will contribute to breast-imaging research, said Jaron Chong, MD, from the Department of Medical Imaging at Western University in London, Ontario, Canada, who is chair of the AI Standing Committee at the Canadian Association of Radiologists.

"The contribution could be valuable in the long term because DBT is a rare dataset in comparison to conventional 2D mammography," said Chong. "Most existing datasets have focused on two-dimensional imaging. We might see more research papers reference this dataset in the future, iterating and improving upon this article's algorithm performance."

Mazurowski reports serving as an advisor to Gradient Health. Seely is an unpaid principal investigator for the Ottawa site of the Tomosynthesis Mammographic Imaging Screening Trial (TMIST). Chong has disclosed no relevant financial relationships.

JAMA Netw Open. Published online August 16, 2021.
