Artificial Intelligence for the Orthopaedic Surgeon

An Overview of Potential Benefits, Limitations, and Clinical Applications

Eric C. Makhni, MD, MBA; Sonya Makhni, MD, MBA; Prem N. Ramkumar, MD, MBA


J Am Acad Orthop Surg. 2021;29(6):235-243. 

Role of Artificial Intelligence in United States Healthcare: Potential and Limitations

Modern healthcare is primed for positive transformation by AI.[5] Despite having the highest healthcare expenditure per capita among developed countries, the United States consistently ranks poorly in key quality metrics such as average life expectancy, maternal and infant mortality, and health equity.[11,12] Innovation in the form of AI offers exciting potential both for improving healthcare outcomes and for reducing the inefficiencies that currently plague modern medicine. A second contributing factor is the recent generation of tremendous volumes of data, known as "Big Data." Between high-resolution medical imaging, granular electronic medical record (EMR) data, genome sequencing, and numerous diagnostic testing capabilities, each patient encounter generates far more data than can be effectively analyzed with human processing or standard statistical methods. One study of EMRs found that a single patient's health record was associated with an average of approximately 32,000 data elements.[13] As another example, the genome of a single individual requires 125 gigabytes of data storage.[5] In an age of information overload and evidence-based decision-making, the physician is tasked with integrating this overwhelming amount of data and synthesizing a clinical decision, a seemingly impossible task given limited time and context.[5] Judicious use of AI and its predictive abilities represents one solution for delivering high-value care in the setting of this overwhelming degree of Big Data.

It is important to note the inherent limitations of these technologies. As with all data analysis, the quality of the output and its conclusions depends heavily on the quality of the input data. Therefore, just as in clinical research efforts, applying ML algorithms to low-quality databases is unlikely to yield meaningful and accurate results. Examples of low-quality input data include data sets with large amounts of missing information, poorly organized data sets (which can introduce error during analysis), low-volume databases too small to support meaningful conclusions, and databases that are accessible but inaccurate. Additional opportunities for external validation of predictive models exist, such as through the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting standards.[14]

A second concern regarding data inputs lies in the relevance of the data. Many large databases in orthopaedics draw on administrative and claims-based data, which can be susceptible to discrepancies 25% of the time.[15] This concern affects traditional clinical research and AI-driven research alike. Moreover, although claims-based data are important, ample evidence exists that they do not represent the outcomes most relevant and meaningful to patients,[16] particularly regarding patient satisfaction.[17] Instead, they represent data that are relatively easy to extract and aggregate from EMRs. Inputs such as patient-reported outcome measures (PROMs) and social determinants of health may be far more relevant in predicting clinical outcomes than claims-based data, which were never intended to serve as primary inputs for such ML algorithms.

Finally, despite the relatively autonomous nature of analysis through ML algorithms, a potential for bias still exists. This bias may arise from the algorithm used to analyze the data or from the data itself (eg, skewed data sets). For example, in one study, Obermeyer et al[18] identified racial bias in the "ground truth," which led to the conclusion that Black patients were more medically complex and costly to the system than White patients, when in reality the model had failed to account for access to care. Similarly, when Amazon (Seattle, Washington, United States) attempted to build an AI-based tool to aid in recruiting new talent, the algorithm selected against women because the training data were drawn primarily from a male-dominated pool of past applications.[5]
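The access confound described above can be sketched numerically. The snippet below is a hypothetical, deliberately simplified illustration (all numbers are invented and do not come from the cited study): when observed healthcare cost is used as a proxy label for medical need, a group with reduced access to care appears "healthier" than it truly is, so a cost-trained model systematically understates its need.

```python
# Hypothetical proxy-label bias sketch (all numbers invented).
# Each entry is (true_need, access_factor); observed cost is
# need * access, so reduced access deflates observed cost.
group_a = [(10, 1.0), (20, 1.0), (30, 1.0)]   # full access to care
group_b = [(10, 0.5), (20, 0.5), (30, 0.5)]   # reduced access to care

def mean_need(group):
    return sum(need for need, _ in group) / len(group)

def mean_cost(group):
    return sum(need * access for need, access in group) / len(group)

# True need is identical, yet observed cost differs: a model trained
# on cost as its "ground truth" would rank group A as sicker.
print(mean_need(group_a), mean_need(group_b))   # equal true need
print(mean_cost(group_a), mean_cost(group_b))   # unequal proxy label
```

The point of the toy is only that the bias lives in the label, not in the learning algorithm: any model fit faithfully to the cost label will reproduce the access gap.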

Reporting guidelines, however, are not comprehensive and do not encompass the importance of model validation. As technically challenging as it may be to build and train an accurately predictive model, clinical application requires external validation on an outside cohort, as was done by Ramkumar et al with primary hip and knee replacement patients using both institutional and National Inpatient Sample data.[19,20] Without external validation against institutional data, a model built from the National Inpatient Sample database could have generated many false predictions if applied prematurely, given the inherent weaknesses of this administrative database. Thus, external validation across multiple data sets matters beyond simply reporting performance metrics.
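The value of external validation can be shown with a deliberately simple sketch (all data synthetic and hypothetical, not drawn from the cited studies): a one-feature threshold "model" is tuned on an internal cohort, where it looks perfect, then re-evaluated on an external cohort, where its accuracy drops.

```python
# Synthetic cohorts of (age, prolonged_stay) pairs -- invented numbers.
internal = [(55, 0), (60, 0), (62, 0), (70, 1), (75, 1), (80, 1)]
external = [(58, 0), (65, 1), (68, 0), (72, 1), (78, 1), (85, 1)]

def accuracy(threshold, cohort):
    """Fraction of patients whose label matches 'age >= threshold'."""
    correct = sum((age >= threshold) == bool(label) for age, label in cohort)
    return correct / len(cohort)

# "Training": pick the age cutoff that maximizes internal accuracy.
best = max(range(50, 90), key=lambda t: accuracy(t, internal))

internal_acc = accuracy(best, internal)   # perfect on the data it was tuned on
external_acc = accuracy(best, external)   # degrades on the outside cohort
print(best, internal_acc, external_acc)
```

Reporting only `internal_acc` would overstate the model's readiness; the external figure is the one that speaks to generalizability, which is the point of validating against an outside cohort.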

A commonly critiqued limitation is the "black box" nature of AI-based algorithms, which suggests that the inner workings of a model's decision-making, or the "rationale" behind particular inferences, can never be known. However, several methods exist to determine the weight or importance of the data inputted into an algorithm. As one example, Shapley Additive Explanations (SHAP) summary plots show the relative importance and direction of each modeling variable used to generate a prediction across a data set. For image processing, as with classifying implants from plain radiographs, heat maps (Figure 1) can be developed to identify the aspects of an image that trigger a specific classification. These maps are created through layering techniques that, when retroactively analyzed, help make the "black box" more transparent.
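For readers curious about the mechanics behind SHAP-style attributions, the sketch below computes exact Shapley values for a toy three-feature risk score (the model, coefficients, and patient values are all invented for illustration; production tools such as the `shap` library approximate this far more efficiently). Each feature's attribution is its marginal effect on the prediction, averaged over every order in which the features could be "revealed."

```python
from itertools import permutations

def model(features):
    """Toy linear risk score; missing features fall back to a
    population-average baseline. Purely illustrative, not clinical."""
    baseline = {"age": 60, "bmi": 28, "asa": 2}
    x = {**baseline, **features}
    return 0.01 * x["age"] + 0.02 * x["bmi"] + 0.10 * x["asa"]

def shapley_values(model, patient):
    """Exact Shapley values: average each feature's marginal
    contribution over all orderings of feature revelation."""
    keys = list(patient)
    phi = {k: 0.0 for k in keys}
    orderings = list(permutations(keys))
    for order in orderings:
        revealed = {}
        prev = model(revealed)
        for k in order:
            revealed[k] = patient[k]
            cur = model(revealed)
            phi[k] += cur - prev    # marginal contribution of k
            prev = cur
    return {k: v / len(orderings) for k, v in phi.items()}

patient = {"age": 75, "bmi": 35, "asa": 3}
phi = shapley_values(model, patient)
print(phi)
# Efficiency property: attributions sum to the gap between this
# patient's prediction and the baseline prediction.
print(sum(phi.values()), model(patient) - model({}))
```

A SHAP summary plot is essentially these per-feature attributions computed for every patient in a data set and displayed together, showing each variable's typical importance and direction of effect.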

Figure 1.

Heat map illustrates unique stem features from pixel processing and analysis that contributed to implant classification.