The Algorithm Will See You Now: Artificial Intelligence May Transform Retinal Screening

William C. Ou; Charles C. Wykoff, MD, PhD


September 18, 2017

The AI Explosion

The pursuit of artificial intelligence (AI), the machine simulation of human cognitive behavior, began in earnest in the 1950s. Advances in the 21st century, particularly in machine learning (ie, teaching computers to learn from and make predictions on data without being explicitly programmed), have brought AI into the current cultural spotlight. At the forefront of these developments is the explosion of deep learning (DL).[1]

DL has its basis in artificial neural networks (ANNs), structures of interconnected nodes that mimic the synaptic connections of the human nervous system. By adjusting the strengths (weights) of connections between nodes based on training data, ANNs can "learn" to perform specific tasks, such as image recognition.
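
As a rough illustration of what "adjusting the weights" means, the sketch below trains a single artificial node to reproduce the logical OR of two inputs. Everything in it (the function names, the toy data, the learning rate, and the number of passes) is a hypothetical choice made for illustration; it is not drawn from any system discussed in this article, and real image-recognition networks contain millions of weights.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Output of one node: weighted sum of its inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Nudge the weights after each example so the output moves toward the target."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            # Each weight changes in proportion to its input and to the error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Toy training data (hypothetical): the logical OR of two binary inputs.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_neuron(OR_SAMPLES)
for inputs, target in OR_SAMPLES:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

After training, the node's output falls below 0.5 for the input (0, 0) and above 0.5 for the other three, even though no rule for OR was ever written into the program.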

ANNs are organized into layers that feed information into each other: an input layer, an output layer, and one or more hidden layers in between. Having several hidden layers (hence, "deep" learning) facilitates multiple levels of feature abstraction, such as building from pixels to edges to shapes and so on. Although training DL networks is computationally intensive, technological advances such as the integration of graphics processing units have made it much more feasible.
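
The role of a hidden layer can be seen in a deliberately tiny example. The sketch below is an analogy only, with hand-picked (hypothetical) weights rather than learned ones: two hidden nodes each compute a simple intermediate feature of the inputs, and the output node combines those features into a result (exclusive OR) that no single node could produce on its own. In an image network, the analogous intermediate features would be edges and shapes.

```python
def step(x):
    """Fire (1) if the weighted input is positive, otherwise stay silent (0)."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: each row of weights defines one node."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights (assumed for illustration, not learned from data).
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]
HIDDEN_BIASES = [-0.5, -1.5]   # first node detects "either input on", second "both on"
OUTPUT_WEIGHTS = [[1, -1]]
OUTPUT_BIASES = [-0.5]         # fires for "either on" but not "both on"


def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, forward(pair))
```

The output is 1 only when exactly one input is on. Stacking more hidden layers extends the same idea, with each layer building on the features computed by the one before it.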

Historically, medicine has been no stranger to AI research, with one early example being automated interpretation of ECGs.[2] Radiology, in particular, is amenable to incorporation of computer systems, with computer-aided diagnosis systems currently commercially available in such areas as mammography and detection of lung nodules on chest radiographs.[3] In ophthalmology, computer-aided diagnosis has been investigated for glaucoma, age-related macular degeneration, and diabetic retinopathy.[4,5] In the anterior segment, AI and data-driven approaches have led to the Hill-RBF calculator for intraocular lens power.[6]

Following general trends, DL is becoming a hot topic in medical AI research, with the number of publications applying this method to medical image analysis skyrocketing since 2015.[7,8,9] In dermatology, for example, a DL algorithm was shown to have performance on par with that of dermatologists for classification of skin lesions.[10] This surge in interest has been accompanied by the emergence of such companies as Zebra Medical Vision[11] and Lunit,[12] focused on using DL to analyze medical images.

Going "Deeper" for Diabetic Retinopathy Screening

Diabetes mellitus (DM) can be considered a global epidemic, with 8.5% of adults affected worldwide.[13] Chronic hyperglycemia in DM commonly causes microvascular damage to the retina, which can result in exudation of fluid into the retina (diabetic macular edema) and pathologic angiogenesis (proliferative diabetic retinopathy). Although antiangiogenic agents and corticosteroids are available as treatment, diabetic macular edema remains a leading cause of vision loss among working-age adults.[14]

All individuals with DM are at risk for diabetic retinopathy (DR). Accordingly, the American Academy of Ophthalmology recommends ophthalmic examinations beginning 5 years after diagnosis of type 1 DM or at the time of diagnosis of type 2 DM, with follow-up examinations at least annually for both groups.[14] Of note, however, only approximately 60% of individuals with DM currently receive annual screenings for DR.[14]

Particular interest has been devoted to digital fundus photography as a modality for DR screening. Fundus photography offers several potential advantages over dilated ophthalmoscopy, including higher sensitivity[15,16,17] and the ability to grade images remotely. Mydriatic 7-field color fundus photography, as defined in the Early Treatment Diabetic Retinopathy Study, remains the gold standard for DR grading in clinical trials.[18] However, application of this protocol to DR screening is not feasible because it is time- and labor-intensive and requires mydriasis.

Alternatives, such as single-field 45° photography (mydriatic or nonmydriatic), have been investigated,[17,19,20,21] but the optimal screening method remains undetermined.[22]

Regardless, all photographic screening methods currently require manual grading of images. In the United States, universal adoption of photographic DR screening would require evaluation of an estimated 32 million images annually.[23]

Several reports have already been published on DL for DR detection in fundus photos, boasting performance superior to that of previous detection methods.[24,25,26] The highest-profile of these, published in JAMA in 2016 by Gulshan and colleagues,[24] describes results achieved with a DL algorithm trained using 128,175 ophthalmologist-graded 45° images to identify referable DR (moderate or worse DR or referable DME—hard exudates within 1 disc diameter of the macula).

The algorithm was validated using two independent data sets (EyePACS-1 [n = 9963] and Messidor-2 [n = 1748]) and achieved remarkably high sensitivity and specificity, with the majority opinion of an ophthalmologist panel as a reference standard. At a high-specificity operating point, sensitivity was 90.3% and 87.0% and specificity was 98.1% and 98.5% for EyePACS-1 and Messidor-2, respectively; likewise, at a high-sensitivity operating point, sensitivity was 97.5% and 96.1% and specificity was 93.4% and 93.9% for the two data sets.
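
The trade-off between the two operating points follows from how such an algorithm is used: it assigns each image a score, and a threshold on that score determines which images are flagged as referable. The sketch below shows the arithmetic with ten made-up scores and reference labels; the data, thresholds, and function name are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are flagged as disease."""
    tp = fn = tn = fp = 0
    for score, has_disease in zip(scores, labels):
        flagged = score >= threshold
        if has_disease and flagged:
            tp += 1   # disease present, correctly flagged
        elif has_disease:
            fn += 1   # disease present, missed
        elif flagged:
            fp += 1   # no disease, flagged anyway
        else:
            tn += 1   # no disease, correctly passed
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical algorithm scores and reference-standard labels (1 = referable DR).
SCORES = [0.05, 0.10, 0.20, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90, 0.95]
LABELS = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

high_sensitivity = sensitivity_specificity(SCORES, LABELS, threshold=0.40)
high_specificity = sensitivity_specificity(SCORES, LABELS, threshold=0.58)
print(high_sensitivity)   # (1.0, 0.8)
print(high_specificity)   # (0.8, 1.0)
```

Lowering the threshold catches every diseased eye in this toy set at the cost of one false alarm; raising it eliminates the false alarm but misses one case. The choice between operating points is therefore a clinical and economic decision rather than a property of the algorithm itself.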

Weighing Advantages and Limitations

The advantages of applying automated algorithms to DR screening include consistency of grading, cost-effectiveness, and a processing capacity of 260 million images per day.[24,27] Nonetheless, the work of Gulshan and colleagues is not without limitations. As pointed out by Wong and Bressler,[28] the reference standard used—the majority opinion of ophthalmologists—may not be fully congruous with the clinical trial standard of centralized reading center grading.

In addition, even if the algorithm can replicate human graders, it cannot overcome physical limitations, such as inability to acquire photographs in some patients, poor image quality, or limited view provided by single-field photography. Such advances as ultrawide field imaging may help to diminish the impact of some of these issues.[29]

Beyond limitations specific to ophthalmology, there are also challenges inherent to the incorporation of AI into clinical practice. DL algorithms are a so-called "black box": It is unclear exactly what an algorithm has learned or what features it is considering.[28] These features may be different from those that are currently used to grade DR (eg, microaneurysms, neovascularization).

Other questions remain, including liability and the evolution of the physician's role as AI penetrates deeper into the delivery of daily clinical care.

Although AI and DL have enormous potential to transform patient care for the better, patients, providers, and regulators need to remain mindful of the potential challenges and engaged in the ongoing transformation of medical care delivery.
