Establishing Reference Intervals for Clinical Laboratory Test Results: Is There a Better Way?

Alex Katayev, MD; Claudiu Balciza; David W. Seccombe, MD, PhD


Am J Clin Pathol. 2010;133(2):179 


Abstract and Introduction


Reference intervals are essential for clinical laboratory test interpretation and patient care. Methods for estimating them are expensive, difficult to perform, often inaccurate, and nonreproducible. A computerized indirect Hoffmann method was studied for accuracy and reproducibility. The study used data collected retrospectively, without exclusions or filtering, for 5 analytes from a nationwide chain of clinical reference laboratories in the United States. Accuracy was assessed by comparing the reference intervals calculated by the new method with those from published peer-reviewed studies, and reproducibility was assessed by comparing 2 sets of reference intervals derived from 2 different data sets. There was no statistically significant difference between the calculated and published reference intervals or between the 2 sets of intervals derived from different data sets. A computerized Hoffmann method for indirect estimation of reference intervals using stored test results proved to be accurate and reproducible.


It is hard to overestimate the importance of clinical laboratory test results. Nearly 80% of physicians' medical decisions are based on information provided by laboratory reports.[1] A test result by itself is of little value unless it is reported with the appropriate information for its interpretation. Typically, this information is provided in the form of a reference interval (RI) or medical decision limit. An RI, as defined by Ceriotti, "is an interval that, when applied to the population serviced by the laboratory correctly includes most of the subjects with characteristics similar to the reference group and excludes the others."[2] (p115) No RI is completely "right" or "wrong." The majority of RIs in use today refer to the central 95% of the reference population of subjects. By definition, 5% of all results from "healthy" people will fall outside of the reported RI and, as such, will be flagged as being "abnormal."
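To make the "central 95%" definition concrete, the following is a minimal Python sketch, not taken from the article, that computes a nonparametric central interval from a set of reference values and counts how many of those same values fall outside it. The function names (`percentile`, `central_interval`, `fraction_flagged`) and the simulated analyte values are hypothetical, and the linear-interpolation percentile rule is only one of several conventions in use.

```python
import random


def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 1]) of pre-sorted data."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def central_interval(values, coverage=0.95):
    """Return (lower, upper) limits enclosing the central `coverage` fraction."""
    tail = (1.0 - coverage) / 2.0
    ordered = sorted(values)
    return percentile(ordered, tail), percentile(ordered, 1.0 - tail)


def fraction_flagged(values, interval):
    """Fraction of values lying outside the interval (flagged 'abnormal')."""
    lower, upper = interval
    return sum(1 for v in values if v < lower or v > upper) / len(values)


if __name__ == "__main__":
    # Hypothetical analyte: simulated results from healthy subjects only.
    rng = random.Random(0)
    healthy = [rng.gauss(100.0, 10.0) for _ in range(10_000)]
    ri = central_interval(healthy)
    # Roughly 5% of the healthy results are flagged, by construction.
    print(ri, fraction_flagged(healthy, ri))
```

The flagged fraction stays near 5% regardless of the analyte, because the limits are defined by the tails of the reference distribution rather than by any clinical criterion.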

There are many problems associated with the calculation of RIs. The latest edition of the Clinical and Laboratory Standards Institute–approved guideline, "Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory," recognizes the difficulties and controversies surrounding the establishment of RIs and the verification process: "…the working group recognizes the reality that, in practice, very few laboratories perform their own reference interval studies," and "…instead of performing a new reference interval study, laboratories and manufacturers refer to studies done many decades ago, when both the methods and the population were very different."[3] (p1)

It has been recommended that an RI be established by selecting a statistically sufficient group (a minimum of 120) of healthy reference subjects. However, it is noted in the guideline that "Health is a relative condition lacking a universal definition. Defining what is considered healthy becomes the initial problem in any study…."[3] (p8) In reality, there will always be some level of uncertainty with a given selection protocol not only because of the definition of health that was selected but also because of the very real possibility that some of the selected subjects may, in fact, have subclinical disease.

Recruiting a valid group of reference subjects and obtaining informed consent in today's environment is costly, time-intensive, and, for most laboratories, a virtually impossible task. The challenge is further magnified in establishing RIs for different age groups (eg, pediatric patients and geriatric patients), uncommon sample types (eg, cerebrospinal fluid and aspirates), timed collections, challenge tests, and serial measurements.

In light of these difficulties, most laboratories elect not to establish their own RIs, but rather choose to verify RIs that have been reported by the manufacturer or established by another laboratory. This is a relatively simple study that requires recruiting only 20 healthy subjects.[3] The underlying assumption here is that the laboratory's analytic system is calibrated and producing results similar to those of the method originally used to derive the published RIs. However, this may not be true because, in many cases, the details of the reference study, such as its design, the inclusion and exclusion criteria used for selecting the healthy recruits, preanalytic sources of variation, etc, are lacking.
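The 20-subject verification study can be sketched in a few lines of Python. The article does not quote the guideline's acceptance criterion; the sketch below assumes the commonly cited rule that a candidate RI is accepted when no more than 2 of the 20 results fall outside it, and exposes that threshold as the parameter `max_outside` so it can be changed. The function name and the example values are hypothetical.

```python
def verify_reference_interval(results, lower, upper,
                              min_subjects=20, max_outside=2):
    """Check a candidate RI against results from healthy local subjects.

    Returns (accepted, outside), where `outside` lists the results that fall
    beyond the candidate limits. `max_outside=2` is an assumed acceptance
    threshold, not a value taken from the article.
    """
    if len(results) < min_subjects:
        raise ValueError(f"need at least {min_subjects} results")
    outside = [r for r in results if r < lower or r > upper]
    return len(outside) <= max_outside, outside


if __name__ == "__main__":
    # Hypothetical results from 20 healthy subjects, candidate RI of 70 to 110.
    local_results = [72, 75, 78, 80, 82, 84, 85, 87, 88, 90,
                     91, 93, 95, 97, 99, 101, 104, 107, 112, 115]
    accepted, outside = verify_reference_interval(local_results, 70, 110)
    print(accepted, outside)
```

A check of this kind only tests whether local results are compatible with the candidate limits; it does not show that the original reference study was well designed.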

A laboratory can elect to "transfer" the RIs that were in use with an older method (or from another laboratory) to a new method. To do this, the laboratory must first demonstrate that the 2 methods produce comparable results. However, it is well known that analytic systems drift over time, and there is no guarantee that the method of today is producing results that are comparable to those that were produced at the time of the original RI study. This practice is the main reason why many laboratories today are using out-of-date RIs that were established decades ago.

Even if a laboratory were able to obtain a rigorously selected, statistically sufficient number of healthy subjects and perform all the necessary testing, the next step would require statistical analysis of the data. What statistical technique should be used: parametric, transformed parametric, nonparametric, or one of the many others that have been described? The judgment in most cases will be made subjectively because there are no clear guidelines, and the resultant intervals will differ depending on the method used.

Laboratories are often faced with test data that exhibit a multimodal or an asymmetric distribution. This may reflect a large prevalence of subclinical disease within the selected population or subgroup-related differences in normal ranges. The latter requires partitioning of test subjects by sex, age, race, and other factors. Partitioning by sex is relatively easy (select a minimum of 120 males and 120 females). However, partitioning by age group is not a simple matter. What age cutoffs should be used? How many groups should be studied? There are some complex statistical techniques available, but none seems ideal for solving partitioning problems.[3]

The last major challenge is cost. In today's environment, in which laboratories are struggling to stay profitable, not every laboratory is willing to budget the appropriate resources for a lengthy and expensive RI study.

An alternative approach for establishing RIs is to perform an indirect, so-called a posteriori, study of the patient data already collected and stored in the laboratory database. This is appealing because the data are readily available, which saves time and cost. A number of publications discuss this approach.[4–8] Most of these studies were able to report clinically relevant and meaningful RIs. All of them used various sophisticated filters to exclude results from "unhealthy" subjects; some used data from hospital laboratories, and others used data from outpatient care settings or noninstitutionalized population study databases. Most of these studies used complex statistical algorithms to derive the final intervals. However, current guidelines do not endorse these methods as a primary approach for establishing RIs, mainly out of concern that most of the data may not come from reference or healthy subjects.[3] This position may be justified for test results collected from hospitalized patients but is questionable when considering a very large number of results that have been collected in outpatient settings. Indeed, there is no disease with prevalence close to 50%. On the other hand, as discussed, the recommended direct sampling techniques are not without their own assumptions.

The reliability of an RI study should be a function of its accuracy and reproducibility and have a direct relationship with the number of observations used and method standardization. Statistically, it is more robust to analyze thousands of measurements that may include some unhealthy subjects than 120 measurements that are assumed to be from healthy subjects. The main problem with most of the reported indirect studies is that they used statistical analyses designed for a direct sampling technique. Hoffmann, in his classic JAMA article from 1963, described a technique designed for indirect estimation of RIs using all available test results from a laboratory's database: "This statistical technique can be used for obtaining any normal values in medicine where a group of measurements are available and the mathematical assumptions are reasonable."[9] (p868) Although his work has been widely cited, few authors have actually applied the Hoffmann method in their calculations.
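The following Python sketch illustrates one common reading of a Hoffmann-type analysis. It is not the authors' program, and the details are assumptions: cumulative frequencies of all stored results are placed on a normal probability scale, a straight line is fitted to a central portion of the plot, and the line is extrapolated to the 2.5th and 97.5th percentiles. The function name `hoffmann_interval`, the plotting position `(rank - 0.5) / n`, and the fixed 20% to 80% fitting window are illustrative choices; in practice the linear portion has to be identified for each analyte.

```python
import random
from statistics import NormalDist


def hoffmann_interval(values, window=(0.20, 0.80), coverage=0.95):
    """Estimate an RI from unfiltered results (Hoffmann-type sketch).

    `window` is the assumed cumulative-frequency range treated as the
    linear (presumed healthy, Gaussian) portion of the probability plot.
    """
    n = len(values)
    std_normal = NormalDist()
    points = []
    for rank, x in enumerate(sorted(values), start=1):
        p = (rank - 0.5) / n  # cumulative frequency (plotting position)
        if window[0] <= p <= window[1]:
            # Normal quantile of p: the "probability paper" axis.
            points.append((std_normal.inv_cdf(p), x))
    if len(points) < 2:
        raise ValueError("too few results inside the fitting window")

    # Ordinary least-squares line: value = intercept + slope * z
    m = len(points)
    mean_z = sum(z for z, _ in points) / m
    mean_x = sum(x for _, x in points) / m
    s_zz = sum((z - mean_z) ** 2 for z, _ in points)
    s_zx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    slope = s_zx / s_zz
    intercept = mean_x - slope * mean_z

    # Extrapolate the fitted line to the limits of the central interval.
    z_limit = std_normal.inv_cdf(0.5 + coverage / 2.0)
    return intercept - slope * z_limit, intercept + slope * z_limit


if __name__ == "__main__":
    # Hypothetical database: 90% healthy results plus 10% from a
    # diseased subgroup with higher values.
    rng = random.Random(1)
    mixed = [rng.gauss(100.0, 10.0) for _ in range(9000)]
    mixed += [rng.gauss(150.0, 15.0) for _ in range(1000)]
    print(hoffmann_interval(mixed))
```

In this simulated mixture, the upper limit from the fitted line is pulled far less toward the diseased subgroup than a plain 97.5th percentile of all results would be, although the contamination still biases the estimate to some degree.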

A notable exception is the manual of pediatric RIs by Soldin et al,[10] which is now in its sixth edition and is published by the American Association for Clinical Chemistry. This fundamental work was limited by the relatively small number of observations used (typically 50–100) and by the semimanual application of the Hoffmann analysis, which added subjectivity to the calculations.[10]

Our goal in this study was to assess the reliability of the Hoffmann approach using a newly developed computer program designed to remove subjectivity from RI calculations.

