What Potential Does AI Offer for Endocrinology?

Nancy A. Melville

September 28, 2023

While artificial intelligence (AI) appears to be on its way to transforming all fields of medicine, its potential benefits in endocrinology, with its substantial complexity, may be uniquely important. However, hurdles encountered with the latest iterations of AI chatbots underscore the need to proceed with caution.

"In contrast to other medical fields, endocrinology is not connected to a single organ structure; rather, it is a complicated biological system of hormones and metabolites, [intertwined with] various receptors, signaling pathways and intricate feedback mechanisms," explain the authors of a recent article on the issue in Nature Reviews Endocrinology.

With interconnections that are "often beyond the comprehension and reasoning capabilities of the human brain, AI [is anticipated] to be exceptionally well-suited to tackle this remarkable heterogeneity and complexity," they assert.

Since the first regulatory approvals for AI-based technology were granted in 2015, endocrinology has already been revolutionized by AI-based tools, most notably AI biosensors for continuous glucose monitoring systems, which alert patients to their glucose levels, and automated insulin-delivery systems.

AI-based machine learning has also improved the detection and analysis of thyroid nodules and potential malignancies, with algorithms analyzing radiological images in greater depth than individual specialists can.

Likewise, the benefits of AI in imaging extend to osteoporosis.

"Imaging certainly is one of the most promising fields, including (but not limited to) conventional radiography, computed tomography, and magnetic resonance tomography," explained Hans Peter Dimai, MD, a professor of medicine and endocrinology at the Medical University of Graz, in Graz, Austria, and the past president of the Austrian Bone and Mineral Society.

"A typical indication is fracture detection, not in terms of replacing expert radiologists or orthopedists but rather in terms of supporting those who are in specialist training," he told Medscape Medical News.

"Particularly the underdiagnosis of vertebral fractures has been an issue in past decades, with dramatic implications for the individual, since the first vertebral fracture would multiply the risk for any future fracture, and therefore requires immediate action from a physician's side."  

The areas expected to further benefit from AI continue to grow as systems evolve, with advances being reported across a variety of endocrinologic conditions. These include:

  • Papillary Thyroid Cancer (PTC): Central lymph node metastasis is predictive of tumor recurrence and overall survival in PTC. However, few tests can diagnose this metastasis with high accuracy. In a study published earlier this year, researchers report high diagnostic sensitivity and specificity for a prediction model built on a convolutional neural network (CNN), a type of deep learning algorithm. The model, developed using genetic mutations and clinicopathological factors, showed high prediction efficacy, with validation in subclinical as well as clinical metastasis groups, suggesting broad applicability.

  • Adrenal Tumors: Adrenal incidentalomas, or masses discovered incidentally during abdominal imaging performed for other reasons, can pose a perplexing clinical challenge, and their discovery is increasing as imaging technology advances. An AI-based machine learning approach using CT is being developed to differentiate between subclinical pheochromocytoma and lipid-poor adenomas. As reported in a 2022 study, the prediction model's scoring system uses traditional radiological features on CT images to offer a noninvasive method that assists in diagnosis and in providing personalized care for people with adrenal tumors.

  • Osteoporosis: Bone Mineral Density (BMD): In the diagnosis of osteoporosis, the measurement of BMD using dual-energy x-ray absorptiometry (DEXA) is the gold standard. However, the availability of DEXA devices is inadequate in many countries, leaving an unmet need for alternative approaches. One AI-based algorithm shows promising diagnostic accuracy compared with DEXA, potentially providing a low-cost screening alternative for the early diagnosis of osteoporosis.

  • Osteoporosis: Fracture Risk Assessment Tool (FRAX): In fracture risk assessment and prevention, the free FRAX tool, available online, is the gold standard and is recommended in nearly all osteoporosis guidelines. However, several studies of AI-based tools show some benefit over FRAX, including one approach that uses longitudinal data with conventional spine radiographs and shows predictive accuracy exceeding that of FRAX.

  • Osteoporosis: Treatment: For the often challenging process of treatment decision-making in osteoporosis, AI-based software, developed using data from more than 15,000 osteoporosis patients followed over 10 years, shows high accuracy in predicting response to treatment in terms of BMD increase, as described in another study. "Our results show that it is feasible to use a combination of electronic medical records (EMR)-derived information to develop a machine-learning algorithm to predict a BMD response following osteoporosis treatment," the authors report. "This alternative approach can aid physicians to select an optimal therapeutic regimen in order to maximize a patient-specific treatment outcome."

Chatbot Wrinkles

Large language models (LLMs) such as ChatGPT offer the potential to understand and generate text much as humans do. Although controversial, they could likewise be compelling.

However, such systems can be vastly more complex than earlier AI-based tools, and some studies are illustrating the kinds of stumbling blocks that need to be overcome.

For instance, in a study published in May, researchers explored the potential of ChatGPT 4.0 to synthesize clinical guidelines for diabetic ketoacidosis from three different sources to reflect the latest evidence and local context.

Such efforts are important but can be very resource-intensive when conducted without AI assistance.

The study's results showed that although ChatGPT was able to generate a comprehensive table comparing the guidelines, there were multiple recurrent errors of misreporting and nonreporting, as well as inconsistencies, "rendering the results unreliable," the authors write.

"Although ChatGPT demonstrates the potential for the synthesis of clinical guidelines, the presence of multiple recurrent errors and inconsistencies underscores the need for expert human intervention and validation," the authors conclude.

Likewise, other research evaluating ChatGPT's answers to questions about vitreoretinal diseases, including diabetic retinopathy, showed disappointing results: the chatbot provided completely accurate responses to only 8 (15.4%) of 52 questions, and some responses contained inappropriate or potentially harmful medical advice.

"For example, in response to 'How do you get rid of epiretinal membrane?', the platform described vitrectomy but also included incorrect options of injection therapy and laser therapy," the authors report.

"The study highlights the limitations of using ChatGPT for the adaptation of clinical guidelines without expert human intervention," they conclude.

And in research published last month that looked at the ability of ChatGPT to interpret guidelines, in this case 26 diagnosis descriptions from the National Comprehensive Cancer Network (NCCN), results showed that as many as one third of the treatments recommended by the chatbot were at least partially nonconcordant with information stated in the NCCN guidelines, with recommendations varying based on how the question about treatment was presented.

"Clinicians should advise patients that LLM chatbots are not a reliable source of treatment information," the authors conclude.

Diversity Concerns

Among the most prominent concerns about chatbot inaccuracy has been the known lack of racial and ethnic diversity in large databases utilized in developing AI systems, potentially resulting in critical flaws in the information the systems produce.

In an editorial published with the NCCN guideline study, Atul Butte, MD, PhD, from the University of California, San Francisco, noted that the shortcomings should be weighed against the potential benefits.

"There is no doubt that AI and LLMs are not yet perfect, and they carry biases that will need to be addressed," Butte writes.

"These algorithms will need to be carefully monitored as they are brought into health systems, [but] this does not alter the potential of how they can improve care for both the haves and have-nots of health care."

Commenting to Medscape Medical News, Butte elaborated that once the system flaws are resolved, a key benefit will be the broader application of top standards of care to more patients who may have limited resources.

"It is a privilege to get the very best level of care from the very best centers, but that privilege is not distributable to all right now," Butte told Medscape Medical News.

"The real potential of LLMs and AI will be their ability to be trained from the patient, clinical, and outcomes data from the very best centers, and then used to deliver the best care through digital tools to all patients, especially to those without access to the best care or [those with] limited resources," he said.

Further commenting on the issue of potential bias with chatbots, Matthew Li, MD, from the University of Alberta in Edmonton, Canada, said that awareness of the nature of the problem, and of the need for diversity in the data used to train and test AI systems, appears to be improving.

"Thanks to much research on this topic in recent years, I think most AI researchers in medicine are at least aware of these challenges now, which was not the case only a few years ago," he told Medscape Medical News.

Across specialties, "the careful deployment of AI tools accounting for issues regarding AI model generalization, biases, and performance drift will be critical for ensuring safe and fair patient care," Li noted.

On a broader level, there is an ongoing general concern about the potential for over-reliance on the technology by clinicians. For example, a recent study showed that radiologists across all experience levels reading mammograms were prone to automation bias when supported by an AI-based system.

"Concerns regarding over-reliance on AI remain," said Li, who co-authored a study published in June on the issue.

"Ongoing research into and monitoring of the impact of AI systems as they are developed and deployed will be important to ensure safe patient care moving forward," he said.

Ultimately, the clinical benefit of AI systems to patients should be the bottom line, Dimai added.

"In my opinion, the clinical relevance, ie, the benefit for patients and/or physicians of a to-be-developed AI tool, must be clearly proven before its development starts and first clinical studies are carried out," he said.

"This is not always the case," Dimai said. "In other words, innovation per se should not be the only rationale and driving force for the development of such tools."

Li, an associate editor for the journal Radiology: Artificial Intelligence, reports no relevant financial relationships. Butte's disclosures are detailed in his editorial. Dimai is a member of the Key Medical Advisor Team of Image Biopsy Lab.
