Is It Possible to Predict How Long a Surgery Will Last?

Alex Macario, MD, MBA


July 14, 2010

Predicting the duration of a surgical case is a bit like predicting the duration of a sports competition. For example, although you might know the average duration of a professional basketball game, it is impossible to know, to the minute, how long the next game will last. And just like a basketball game that is tied after 48 minutes, a surgical case can go into overtime if unexpected findings force a change in the surgical procedure that requires extra time.

On the other hand, a surgical case duration can be shorter than usual. This might occur if bleeding is unusually light, or when all the necessary equipment, supplies, and human activity are perfectly synchronized, so no downtime occurs during the operation. Continuing with the sports analogy, this might compare to the referees not calling any fouls and the coaches not asking for game-lengthening time outs.

The Random Nature of Case Duration

Patients and operating room (OR) personnel should accept the truth that surgical case durations are stochastic (or random, from the Greek word Στόχος for "aim" or "guess"), a term that indicates that the next state is determined both by the process's predictable actions and by a random element.

This is in sharp contrast to our preference to think deterministically, that is, to believe that with enough information we could foresee the future and thereby estimate case duration to the nearest minute.

Case duration is defined as the time from "wheels in" (when the patient is brought into the room) to "wheels out" (when the patient exits the room). Thus, case length can be affected by nonsurgical factors, such as the time needed for delivery of anesthesia before incision, the time needed to place a catheter in the bladder, or the wake-up time from anesthesia after the incision is closed. Because such nonoperative factors are a small fraction of the entire case duration and are constant within one type of surgery, studies that look at surgery time usually equate case time with surgery time.

Predicting Case Duration

Predicting case duration with greater accuracy would increase patient and surgeon satisfaction by reducing waiting time relative to the scheduled start time as communicated to the patient. Scheduling cases correctly could also help reduce the amount of time that the cases in an OR list run past their scheduled finish times (often referred to as "overutilized OR time"), which is necessary to maximize OR efficiency.

It isn't necessarily how fast or on-time a surgeon or room is but the variability of the case durations that influences OR staffing. There are fast surgeons who are predictably fast and slow surgeons who are predictably slow. Some surgeons, fast or slow, stay on schedule and others do not. Both situations allow the appropriate number of nurses and their shift durations to be scheduled to match the cases in an OR. This is optimal from an economic standpoint. On the other hand, if the fast surgeon is slow on a couple of cases or the slow surgeon is even slower, the whole OR day will go over time. This can strain both personnel assigned to that OR and other resources, such as radiography equipment, needed, for example, for a different case in another OR.

Cases with easier-to-predict durations include standardized surgeries or specialties that operate on the body surface or extremities, such as hysterectomy, hernia repair, or cystoscopy. In contrast, difficult-to-predict cases are the more complex, nonstandard surgeries, such as cancer surgeries or major intra-abdominal procedures. The longer the surgery, the lower the accuracy in estimating case duration.

The higher the proportion of "easy-to-predict" cases in an OR (such as an outpatient surgery center doing straightforward procedures), the more accurate the OR schedule will be, on the whole. Conversely, in tertiary hospitals where many complex and uncommon surgeries are performed, overall prediction accuracy will be lower.

Prediction error. Prediction error equals the actual duration from "wheels in" to "wheels out" of a new case (usually easily obtained from the OR information system) minus the surgeon's original estimate (if available). Each surgeon's data on past case duration can help that surgeon modify estimates for new cases, taking case complexity into consideration. In this manner, the accuracy of prediction of surgical case durations can be improved over individual surgeon or OR information system estimates.
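As a rough illustration of the arithmetic, the sketch below computes the prediction error for a handful of cases and the resulting average bias for one surgeon. The durations are invented for illustration, and shifting a new estimate by the average past error is only one simple assumption about how past data might inform a new estimate.

```python
from statistics import mean

def prediction_errors(actual_minutes, estimated_minutes):
    """Actual minus estimated duration per case (positive = case ran long)."""
    return [a - e for a, e in zip(actual_minutes, estimated_minutes)]

def adjusted_estimate(new_estimate, past_errors):
    """Shift a new estimate by the mean past error (an assumed, simple correction)."""
    return new_estimate + mean(past_errors)

# Hypothetical wheels-in to wheels-out times and surgeon estimates, in minutes.
actual = [130, 95, 160, 120]
estimated = [120, 90, 120, 120]

errors = prediction_errors(actual, estimated)  # [10, 5, 40, 0]
bias = mean(errors)                            # 13.75: estimates run short on average
print(errors, bias, adjusted_estimate(100, errors))
```

A positive average error means that the surgeon's estimates tend to be too short; a negative one means that they tend to be too long.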

Methods of Estimating Surgery Case Length

  • Surgeon estimate. Some surgeons consistently shorten their case duration estimates because they have too little allocated OR time and need to "fit" their cases into their allocated OR time. The result is that these surgeons' estimates are, on average, too short. Other surgeons purposely overestimate case durations to keep control of their allocated OR time, so that if a new case by a different surgeon appears, their OR time is not given away to permit booking of the new case. The result here is that their average estimates are too long.

  • Analysis of historical case durations.

  • Using surgeon estimate in combination with historical data to create new estimates.

  • Adjusting for case complexity (simple, average, or complex).

  • Combining all of the above.

Several complicating factors can interfere with the ability to accurately predict how long a particular surgical case will last. These include the following:

  • Few appropriate historical cases are available on which to base the estimate for a new case.

  • Statistical distributions of surgery case times do not follow a normal (bell-shaped) curve.

  • Prediction of case duration based on "booking mnemonics" is intrinsically flawed because even though the required supplies and instruments are similar, the operative procedures are different.

These 3 complicating factors will be explored in greater detail.

Too Few Historical Cases

Major barriers to accurate scheduling are the wide variety of different procedures performed and the large number of surgeons on staff in most hospitals. The combination of these 2 facts means that for approximately half of the cases scheduled in ORs in hospitals in the United States on any given weekday, only 5 or fewer cases of the same procedure type and by the same surgeon have been performed during the preceding year.[1] With so few cases in the data bank, the average duration is difficult to pinpoint for many cases.

How can it be that so few historical cases exist on which to base the estimated duration of a new case? Although the answer may not be intuitive, one way to illustrate the concept is to ask any OR manager how many preference cards (which specify the type of surgical procedure and the specific surgeon) exist at his or her hospital. A typical number for a midsize hospital's surgical suite is about 4000 preference cards. If such a hospital performs about 12,000 cases per year, just 3 cases per preference card are performed on average and available to provide historical data for the estimated duration of a new case of that type.

Another method to determine a surgeon's number of repeat cases at a particular hospital is to analyze data from the hospital's computerized OR information system. For each case performed in a single year, the number of previous cases (of the same type of procedure performed by the same surgeon) was retrospectively identified at 2 facilities: a tertiary hospital inpatient surgical suite and an outpatient surgery center.[2] Because the surgeon and the surgical procedure are the 2 most important determinants of surgical time, cases were grouped together if they were of the same procedure type and were performed by the same surgeon.

"Procedure" was defined by Current Procedural Terminology (CPT) code(s).[3] The CPT code is a 5-digit number maintained by the American Medical Association, designed to communicate information about procedures to payers in a uniform way. If a surgical procedure had more than 1 CPT code, that set of codes was used to characterize it as a unique procedure. The CPT code or the combination of CPT codes for a given surgery reflects what was done to the patient in the OR. For example, if phacoemulsification and aspiration of cataract and insertion of intraocular lens were performed as part of a single case, the combination of these procedures counted as a single procedure for estimation of case duration.

Each procedure was then combined with a surgeon. For example, all unilateral total knee replacement cases performed by surgeon "Jones" were grouped together. Total knee replacement surgeries done by surgeon "Smith" were grouped separately. A third group, for example, consisted of bilateral total knee replacements performed by surgeon "Jones." Yet another group included laparoscopic cholecystectomies performed by surgeon "Adams." A laparoscopic cholecystectomy that also included a liver biopsy would be grouped separately because that combination of those 2 procedures defines a different surgical case.
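The grouping rule can be sketched in a few lines. This is a minimal illustration, not the method used in the cited analysis: the record layout is assumed, and the code strings ("CODE_A" and so on) are made-up placeholders rather than real CPT codes.

```python
from collections import Counter

def case_key(surgeon, cpt_codes):
    """A case type is the surgeon plus the full set of CPT codes, in any order."""
    return (surgeon, frozenset(cpt_codes))

def count_cases(history):
    """Count prior cases for each (surgeon, procedure) combination."""
    return Counter(case_key(surgeon, codes) for surgeon, codes in history)

def previous_cases(counts, surgeon, cpt_codes):
    """Number of prior cases matching a newly scheduled case (0 if none)."""
    return counts[case_key(surgeon, cpt_codes)]

# Hypothetical one-year history: (surgeon, list of placeholder codes).
history = [
    ("Jones", ["CODE_A"]),
    ("Jones", ["CODE_A"]),
    ("Smith", ["CODE_A"]),
    ("Adams", ["CODE_B"]),
    ("Adams", ["CODE_B", "CODE_C"]),  # extra code, so a different case type
]
counts = count_cases(history)
print(previous_cases(counts, "Jones", ["CODE_A"]))            # 2
print(previous_cases(counts, "Adams", ["CODE_C", "CODE_B"]))  # 1
print(previous_cases(counts, "Smith", ["CODE_B"]))            # 0
```

Using a set of codes as part of the key means that the same procedure done by a different surgeon, or the same procedure with one additional code, lands in a separate group, which is why so many groups end up with very few cases.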

The analysis for the inpatient surgery suite revealed that for 37% of newly scheduled cases, no cases at all of the same procedure type and by the same surgeon had been performed in the previous year. In the ambulatory surgery center, prediction was difficult for 28% of cases because no cases of the same procedure type and by the same surgeon had been performed during the preceding year (Table 1).

Table 1. Historical Surgical Case Data (Same Surgeon, Same Procedure)

Previous Cases Available for Estimating Duration of New Cases (Preceding Year) | Tertiary Surgical Suite | Outpatient Surgery Center
None | 37% | 28%
More than 4 | 36% | 48%
More than 8 | 26% | 39%
More than 18 | 12% | 28%

In the tertiary inpatient surgery suite, 11,579 cases of 5156 different procedures were performed by 225 surgeons, with median ± quartile of surgical times of 2.5 ± 1.2 hours. A total of 7217 combinations of procedure and surgeon were performed during the year. In the ambulatory surgery center, 4842 cases of 1608 different procedures were performed by 160 surgeons with median ± quartile of surgical times of 1.1 ± 0.5 hours. A total of 2245 combinations of procedure and surgeon were performed in the ambulatory surgery center during that year.

Surgeons typically schedule more than 1 case into an OR. With a series of consecutive cases, the likelihood that at least 1 of these cases will be a surgical procedure that the surgeon has not performed recently (so that no historical data are available) is even higher.[4] One late-running case out of the several cases on the day's list in that OR can adversely affect the entire day's schedule.
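To get a feel for how quickly this likelihood grows, the sketch below applies the 37% figure reported for the inpatient suite to lists of 1 to 4 cases. It assumes that cases on a list are independent of one another, which is a simplification that real OR lists may not satisfy.

```python
def prob_any_case_without_history(p_no_history, n_cases):
    """Chance that at least 1 of n cases has no prior same-surgeon, same-procedure case.

    Assumes each case independently has probability p_no_history of lacking history.
    """
    return 1 - (1 - p_no_history) ** n_cases

for n in (1, 2, 3, 4):
    print(n, round(prob_any_case_without_history(0.37, n), 2))
```

Under this assumption, a list of 3 cases has roughly a 75% chance of containing at least 1 case for which no historical data are available.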

By analyzing historical data for cases with the same surgeon and procedure, we can assess the uncertainty surrounding the estimate. In other words, case durations have a probability distribution, in that the expected case duration is not a point value, but a probability estimate. Therefore, a more informative answer to the question "How much time is left?" might be, for example, "There is a 67% probability that the case will be finished within 90 minutes." This is similar to the approach used to report weather forecasts.
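One simple way to produce such a statement is to count the share of past cases that finished within the time in question. The sketch below does this for invented durations; it is an empirical illustration only and does not model the uncertainty that comes from having few cases.

```python
def prob_finished_within(past_durations, limit_minutes):
    """Share of past cases whose duration was at most limit_minutes."""
    finished = sum(1 for d in past_durations if d <= limit_minutes)
    return finished / len(past_durations)

# Hypothetical durations (minutes) for one surgeon and one procedure.
past = [70, 75, 80, 85, 88, 90, 100, 120, 150]
print(round(prob_finished_within(past, 90), 2))  # 0.67
```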

Statistical Distributions of Case Times Do Not Follow a Bell Curve

The difficulty, of course, is that surgery case times are not distributed in a bell-shaped curve. The distributions are often skewed to the right and bounded to the left of the distribution by some minimally required time. As a result, unusually long cases (outliers) inflate the average estimated case duration (Figure).

Figure. Duration for a variety of cases scheduled as total knee replacement (including revisions). Should the mean or the median be taken as the estimate for the next scheduled case?

Even when many previous cases are available for estimating duration, cases still end later than their scheduled finish times because of the variability in surgical times among all such cases. This insight can be explained by considering the right-skewed curve in the Figure, which displays surgical times for a given procedure and surgeon combination. An increase in the number of previous cases permits a more accurate estimation of the central tendency or middle of the curve. However, the average length of time that surgeons finish late is affected predominantly by the variability or width of the curve.

For example, if the true median ± quartile deviation of surgical times for total knee replacements performed by surgeon "Jones" is 2.0 ± 1.0 hours, an increase in the number of previous cases used to estimate the surgical time of a future case may improve the accuracy of the estimated median from 1.8 hours to 1.9 hours. This 0.1-hour improvement in the accuracy of the estimated median would have no important effect on on-time performance relative to the effect of having a quartile deviation of 1.0 hour.

Because of the right skewness of data on case duration, alternatives for analyzing historical case duration data include:

  • The trimmed mean (delete outliers in the lower 10% and upper 10%);

  • The median, because it minimizes the effects of unusually long cases (outliers) on the estimate; or

  • The geometric mean, calculated by dividing the sum of the natural logs of case durations by the number of previous cases and then taking the exponential.
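The 3 alternatives listed above can be compared on a small right-skewed sample. The durations are hypothetical, and the function names are chosen only for this sketch.

```python
import math
from statistics import mean, median

def trimmed_mean(durations, proportion=0.10):
    """Mean after dropping the lowest and highest `proportion` of cases."""
    ordered = sorted(durations)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])

def geometric_mean(durations):
    """Exponential of the average natural log of the durations."""
    return math.exp(sum(math.log(d) for d in durations) / len(durations))

# Hypothetical durations (minutes); the 300-minute case is an outlier.
past = [90, 95, 100, 100, 105, 110, 115, 120, 130, 300]

print(mean(past))            # 126.5, pulled upward by the outlier
print(trimmed_mean(past))    # 109.375
print(median(past))          # 107.5
print(geometric_mean(past))  # about 118
```

All 3 alternatives sit below the ordinary mean here because each gives less weight to the single unusually long case.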

The case times of other scheduled operations have varying statistical distributions, preventing the simple use of average historical case duration. One example of this quandary is a Whipple (pancreatoduodenectomy) procedure. In about half of these cases, the abdomen is opened and the pancreatic cancer is found to be unresectable, so the case takes approximately 2 hours. The surgery takes 6 hours in the other half of cases because the tumor is resectable. Taking the average duration of the 2 case scenarios (2 hours and 6 hours), the OR information system will book 4 hours for a newly scheduled Whipple procedure, a duration that will never be correct.
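The Whipple example can be checked with a few lines of arithmetic. The sketch assumes, as in the text, that cases split evenly between 2-hour and 6-hour durations.

```python
from statistics import mean

# Half of cases unresectable (2 hours), half resectable (6 hours).
durations_hours = [2, 6] * 5

booked = mean(durations_hours)                    # 4 hours
misses = [abs(d - booked) for d in durations_hours]

print(booked)       # 4
print(min(misses))  # 2: every case misses the booked time by 2 hours
```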

The take-home message is that averaging historical case duration data does not increase prediction accuracy for a newly scheduled case as much as one would think or hope. This phenomenon has been made abundantly clear by reports from many facilities that have purchased OR information systems to address chronic complaints about inaccurate case scheduling, only to find that the OR schedule is perceived to be no more accurate after implementation of such a system.

How to estimate case duration without previous similar cases. The dilemma of how to estimate case duration when few recent similar cases have been performed can be handled several ways. The number of historical cases on which to base predictions could be increased by using data from more years, but this poses the risk that older surgical times may be confounded by other variables (eg, the learning curve of the surgeon or introduction of new surgical techniques). Lumping similar CPT codes together to increase the amount of historical data is impractical, because procedures with CPT codes that differ only in the final (fifth) digit have different case durations. For example, vitrectomy (67108) takes more than an hour longer than scleral buckle (67107).

Pooling case duration data from several hospitals could increase the database size from which to base predictions. A study of 4 academic medical centers that provided data for a total of 200,401 cases found that when a procedure was being performed for the first time at a facility, that same procedure had been performed previously (at least once) at 1 or more of the other 3 facilities only 13% to 25% of the time.[5]

When no historical time data are available for a new case, using the mean duration of similar cases (same scheduled procedure) performed by other surgeons is as accurate (unbiased and precise) a predictor as other, more sophisticated methods to analyze the data.[6] Practically, however, often the simplest approach is to use the booking surgeon's estimate.[7]
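The order of fallbacks described in this section can be sketched as follows. The record layout and function name are assumptions made for illustration, and the use of the median for the surgeon's own history is a choice of this sketch rather than something the cited studies prescribe.

```python
from statistics import mean, median

def estimate_duration(history, surgeon, procedure, surgeon_estimate=None):
    """Return (estimate in minutes, source of the estimate).

    history: list of (surgeon, procedure, duration_minutes) tuples.
    """
    own = [d for s, p, d in history if s == surgeon and p == procedure]
    if own:
        return median(own), "own history"
    # No cases by this surgeon: pool the same procedure across other surgeons.
    others = [d for s, p, d in history if p == procedure]
    if others:
        return mean(others), "other surgeons"
    # Nothing on record for this procedure: fall back on the booking surgeon.
    return surgeon_estimate, "surgeon estimate"

# Hypothetical records.
history = [
    ("Jones", "TKR", 120), ("Jones", "TKR", 140), ("Jones", "TKR", 130),
    ("Smith", "TKR", 100), ("Smith", "TKR", 110),
]
print(estimate_duration(history, "Jones", "TKR"))         # (130, 'own history')
print(estimate_duration(history, "Lee", "TKR"))           # (120, 'other surgeons')
print(estimate_duration(history, "Lee", "Whipple", 300))  # (300, 'surgeon estimate')
```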

Predicting Case Duration With "Booking Mnemonics" Is Flawed

Within a hospital, multiple different procedure types and cases are often counted as 1 case when the case is called in to the OR scheduling office. This occurs because the required supplies, instruments, and surgical trays may be similar even though the operative procedure is different. Some hospitals use "mnemonics" to group such cases, a method to inform OR staff what to get ready for the next day. Because of the variety of surgical procedures grouped together under one such mnemonic, predicting case duration on the basis of the booking mnemonic is intrinsically flawed. Table 2 illustrates the variety of thoracotomy procedures that are posted under several different surgical procedure names and then grouped together in the computer scheduling system as CHES75.

Table 2. Thoracotomy Procedures Posted Under CHES75 Mnemonic

Procedure Mnemonic (Assigned When Case Is Booked) | Surgical Procedure Performed | Preoperative Diagnosis
CHES75 | Left thoracotomy with wedge resection | Left lung nodule
CHES75 | Right upper lobectomy | Right upper lobe mass
CHES75 | Right thoracotomy with right middle lobe resection | Right arteriovenous malformation
CHES75 | Right thoracotomy | Liver treatment, right pneumonia congenital diaphragmatic hernia
CHES75 | Left thoracotomy; removal of mediastinal cyst | Bronchogenic cyst (possible)
CHES75 | Right thoracotomy with right pneumonectomy | Right lung cancer
CHES75 | Thoracotomy ligation of intercostal vessel | Hematoma chest cavity; end-stage renal disease
CHES75 | Right thoracotomy; resection of pleural tumors | Recurrent thymoma
CHES75 | Flexible fiberoptic bronchoscopy; sleeve right upper lobectomy | Right endobronchial carcinoid tumor

It seems counterintuitive that the wide variety of thoracotomy cases listed in Table 2 would all be booked as if they were identical cases. A given mnemonic covers a wide range of diagnoses and surgical strategies because the requirements, in terms of supplies and instruments, are similar. Comparing surgical times across facilities for purposes of benchmarking can be misleading if the mnemonic groupings at one hospital don't include the same procedure types as the comparator hospital.

In a recent study, the OR times for similar cases differed significantly among 10 hospitals in 8 countries.[8] In fact, the second-longest average OR time was 50% longer than the second-shortest average OR time for both laparoscopic cholecystectomy and lung lobectomy. Some of the variation observed among these countries can be explained by the presence of additional OR personnel but not by the use of induction rooms or locations apart from the OR to place peripheral nerve blocks. Although such locations were used widely at the studied hospitals, they were not used for induction of general anesthesia for the studied procedures.

Predicting Duration of a Case That Is Already Under Way

Every day in the hospital OR suite, the front desk administrator telephones the nurse in the OR to ask, "How much time is left in your case?" The reasons for this question include:

  • A desire to match staff to workload, so that on-call nurses and anesthesiologists are assigned to late-running rooms. Tardiness will be more excessive in facilities with long workdays because the longer the day, the more uncertainty about case start times. Tardiness does not necessarily depend on the individual durations of preceding cases or on the relative numbers of long and short cases. Rather, tardiness per case grows larger as the day progresses because the total duration of preceding cases increases.[9]

  • To help decide whether to move "to follow" cases from one OR to another so that the "to follow" case starts on time in a different room, if the previous case is delayed. A common practice at many hospitals is to move cases from one OR to another in an attempt to reduce tardiness. Although this greatly reduces tardiness for the few cases that are moved, the overall average gain is small when this reduction in tardiness is spread across all cases performed in one day. To have a significant impact on tardiness for a substantial number of patients, interventions must involve large numbers of cases.[10] A dynamic schedule can be created at the beginning of each day and continuously updated with new start times for each case, after compensating for lateness of first cases and case duration bias. These revised start times can then be used to determine when to have the next patient arrive so the patient does not have to wait any longer than necessary. Minimizing the time that patients must wait after they arrive at the hospital is an important goal for the OR manager. With a dynamic schedule, the start times of "to follow" cases are continuously updated.

  • To make sure that supplies and equipment required for the next surgery are available.
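A dynamic schedule of the kind described in the list above can be sketched as a simple recalculation of start times. The parameter names, the fixed turnover time, and the single bias correction added to every case are assumptions made for this illustration; a real system would likely use case-specific values.

```python
from datetime import datetime, timedelta

def updated_start_times(actual_first_start, scheduled_minutes,
                        bias_minutes=0, turnover_minutes=30):
    """Recompute the start time of each case in one OR.

    actual_first_start: when the first case really started (captures lateness).
    scheduled_minutes: scheduled duration of each case, in list order.
    bias_minutes: assumed average amount by which cases run long.
    turnover_minutes: assumed cleanup and setup time between cases.
    """
    starts = []
    current = actual_first_start
    for minutes in scheduled_minutes:
        starts.append(current)
        current += timedelta(minutes=minutes + bias_minutes + turnover_minutes)
    return starts

# First case scheduled for 07:30 but actually started at 07:45 (hypothetical).
first_start = datetime(2010, 7, 14, 7, 45)
for start in updated_start_times(first_start, [120, 90, 60], bias_minutes=10):
    print(start.strftime("%H:%M"))  # 07:45, 10:25, 12:35
```

Rerunning the calculation whenever a case actually starts or finishes yields revised start times that can be used to decide when the next patient should arrive.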

Asking someone in the room to make their best subjective guess may not be the best way to estimate how much time is left in a case. Statistical methods can analyze available historical case duration data with the objective of accurately predicting the expected time remaining in a case.[11] To accomplish this, the OR information system is programmed to automatically extract data on the surgeon's identity, the scheduled procedure, and the case's actual start time from the Anesthesia Information Management System server. (A growing number of academic medical centers in the US and in Europe are installing such systems.[12,13]) Then, ongoing Bayesian readjustments of the time remaining will be derived from how long the case has already been under way.

Bayesian analysis permits the combination of previous observations and new information to help determine the likelihood of a future event. The data-crunching is supplemented, if necessary, by electronically querying OR personnel for estimates of remaining time. These queries are particularly valuable the longer a case goes late and when very few, if any, historical cases are available to use for predictions.

As a case extends well beyond its scheduled finish time, the time remaining might be expected to decrease to zero. However, the median time remaining for repetitions of a particular scheduled operation actually remains relatively constant. This is explained, in part, by the fact that progressively more cases have already finished. Furthermore, a case that goes extraordinarily long could indicate that the procedure being performed is not the same procedure that was originally booked.
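The behavior of the median time remaining can be seen with a purely empirical calculation. The sketch below is not the Bayesian method referred to above; it simply looks at past cases that lasted longer than the time already elapsed and takes the median of what was left. The durations are invented and deliberately right-skewed.

```python
from statistics import median

def median_time_remaining(past_durations, elapsed_minutes):
    """Median remaining time among past cases still under way at elapsed_minutes.

    Returns None if no past case lasted longer than elapsed_minutes.
    """
    remaining = [d - elapsed_minutes for d in past_durations if d > elapsed_minutes]
    return median(remaining) if remaining else None

# Hypothetical right-skewed durations (minutes) for one scheduled operation.
past = [60, 70, 80, 90, 100, 120, 150, 200, 280, 400]

for elapsed in (0, 100, 200):
    print(elapsed, median_time_remaining(past, elapsed))
# 0 -> 110.0, 100 -> 100, 200 -> 140.0
```

In this example the median time remaining does not shrink toward zero as the case runs on, because the shorter cases have dropped out of the comparison and only the long ones are left.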

Alternatively, intraoperative complications or other random events can cause delays. When an OR orders more resources (new equipment, retractors, another surgeon, or blood products), this suggests that the case will be going over the scheduled time. Most cases are scheduled as if plan A will be executed, so if plan B goes into effect, the case will probably run over the predicted time. In other words, when a change in surgical approach or anesthetic procedure is identified (eg, at the preoperative briefing), the updated estimate of case duration should be used. Such updates are often better than original estimates.[14]

Managing Uncertainty

It is important to use the precise procedure(s), surgical team, and type of anesthetic when estimating case durations.[15] It would be nice to eliminate all uncertainty in the prediction of surgical case duration, but uncertainty persists. When we ask, "How long will the case last?" we are expecting a single numerical answer, such as, "There are 68 minutes left in the case." Such a response provides an "illusion of certainty" that feeds a human emotional need for certainty when none exists.[16]

For some decisions, the OR manager must consider the shortest time that a case could possibly last. For other decisions, the OR manager must determine the longest time a case could possibly last.[17] The goal for the OR manager is to accept the uncertainty of operative case times and work to manage it.
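One way to put numbers on "shortest" and "longest" is to read low and high percentiles from past durations instead of quoting a single value. The choice of the 5th and 95th percentiles below is an assumption of this sketch, and the durations are hypothetical.

```python
from statistics import quantiles

def plausible_range(past_durations, low_pct=5, high_pct=95):
    """Return (low, high) percentiles of past durations as a working range."""
    # n=100 yields 99 cut points; cut point i-1 is the i-th percentile.
    cuts = quantiles(past_durations, n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]

# Hypothetical durations (minutes) for one surgeon and one procedure.
past = [60, 70, 80, 90, 100, 120, 150, 200, 280, 400]
shortest, longest = plausible_range(past)
print(shortest, longest)
```

The low end might guide a decision such as when the next patient must be ready, and the high end a decision such as whether a following case can safely be booked in the same room.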

The OR manager can sequence each surgeon's list of cases in the same OR on the same day with the most predictable case first and the least predictable (often the longest) case last.[18]
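As a minimal sketch of this sequencing rule, the cases on a list can be sorted by the spread of their past durations. Using the standard deviation as the measure of predictability is an assumption of this example, and the durations are invented.

```python
from statistics import pstdev

def sequence_by_predictability(past_by_case):
    """Order case names from most predictable (smallest spread) to least."""
    return sorted(past_by_case, key=lambda name: pstdev(past_by_case[name]))

# Hypothetical past durations (minutes) for the cases on one surgeon's list.
past_by_case = {
    "Whipple": [120, 360, 130, 370],
    "hernia repair": [55, 60, 65, 60],
    "knee replacement": [110, 130, 150, 120],
}
print(sequence_by_predictability(past_by_case))
# ['hernia repair', 'knee replacement', 'Whipple']
```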

In the OR suite of the future, patients might not show up at the same, constant amount of time in advance of planned surgeries. Rather, the time a patient is instructed to arrive at the hospital for surgery will vary on the basis of characteristics of the case(s) ahead of them. For example, if patient B is scheduled to follow case A (which has a known duration and little variability), then patient B need not arrive as far in advance of the planned start time. If patient B is scheduled to follow one or more cases with highly uncertain durations (eg, a Whipple procedure), patient B's instructions might be to come in early.