# The Use of Thyroid Function Tests in the Diagnosis of Hypopituitarism: Definition and Evaluation of the TSH Index

### Abstract and Summary

#### Abstract

**Background** TSH secretion in hypopituitary patients may be decreased due to TSH deficiency but it also remains under feedback inhibition by free thyroxine (fT4). We propose a TSH index (TSHI), as 'fT4-adjusted TSH', that corrects for any physiological TSH suppression, to provide a true estimate of pituitary thyrotroph function and any pathological pituitary suppression.

**Methods** A total of 9519 thyroid function tests (TFTs) (Bayer Immuno-1®) in 4064 patients of our institution were examined, including 444 patients investigated for hypopituitarism. Based on the physiological log-linear relationship between fT4 and TSH, we estimated the amount of feedback-induced change in log TSH per change in fT4, which allowed the extrapolation of log TSH to a fixed fT4 of 0, defining the TSHI. TSHIs were compared with other measures of pituitary function.

**Results** Feedback inhibition was estimated to cause a 0·1345 decrease in log TSH (mU/l) for each 1 pmol/l increase in fT4 concentration; therefore TSHI = log TSH + 0·1345 × fT4. Patients with lower peak-stimulated GH and cortisol concentrations had a significantly lower TSHI (*P* < 0·0001). TSHIs measured before pituitary stimulation tests were a highly significant predictor of the risk of test failure (*P* = 0·0002). Of all potential fT4–TSH combinations within the current reference ranges, 21·9% were identified as abnormal on the basis of the TSHI.

**Conclusion** The TSHI provides an accurate estimate of the severity of pituitary dysfunction in hypopituitary patients based on simple TFTs. It predicts the probability of pituitary stimulation test failure and extends the diagnosis of TSH deficiency into areas of the normal TFT reference ranges.

#### Summary

The role of thyroid function tests (TFTs) in the diagnosis of central hypopituitarism is currently limited to those with TSH and/or free thyroxine (fT4) below the reference ranges. Such abnormalities may not occur in milder forms of pituitary dysfunction.^{[1]} TSH values alone cannot be a true measure of the severity of hypopituitarism because TSH secretion still remains under negative feedback inhibition by peripheral fT4 concentrations, and is therefore dependent on, for example, the amount of T4 supplementation.^{[2]}

Furthermore, many hypopituitary patients have both normal TSH and normal fT4 values, although often at the lower ends of the reference ranges. Such low-normal TFTs may raise the suspicion of TSH deficiency, but the firm diagnosis remains speculative (Fig. 1).

**Figure 1.**

Clinical scenario.

If it was possible to estimate the amount of physiological feedback inhibition of TSH for any given fT4 concentration, an 'uninhibited TSH' could be extrapolated for a hypothetical fT4 = 0 (i.e. an 'fT4-adjusted TSH'), which should provide a better estimate of true pituitary thyrotroph function.

Such an approach of correcting one physiological parameter for another covariate is used frequently in other derived variables, for example the body mass index (adjusting body weight for height) or 'corrected calcium' (adjusting total calcium for serum albumin concentrations), allowing a more accurate quantification of the severity of obesity or hypocalcaemia.

The basis for estimating the amount of physiological feedback inhibition of TSH by fT4 is their approximately log-linear relationship, which is maintained in patients with or without pituitary disease;^{[3–5]} a decrease in fT4 causes a proportional increase in TSH values on a logarithmic scale (assuming stable pituitary function). If a correction coefficient beta (β) was defined as the amount of change in log TSH as a result of change in fT4, log TSH levels could be extrapolated to any fixed fT4 level; for our purpose to an fT4 of 0 pmol/l to define an 'adjusted log TSH', that is the TSH Index (TSHI).

If such a derived parameter (the TSHI) was an accurate estimate of pituitary thyrotroph function and able to detect and quantify pathological TSH suppression, simple TFTs could provide a very convenient measure of pituitary function in the absence of, or in addition to, traditional stimulation tests.

### Patients and Methods

#### Patients

We studied 9519 TFTs analysed at our institution between 2002 and 2005, with both TSH and fT4 inside the reporting range (i.e. TSH ≥ 0·02 mU/l and fT4 ≥ 2 pmol/l), in 4064 adult patients (mean age 51 years, range 19–93 years, 41% women). Based on the diagnoses registry and radiotherapy records of our institution (a tertiary referral centre for oncology and endocrinology patients) and the departmental GH deficiency database, the following patient subgroups were defined:

- Patients who had undergone functional pituitary assessments (444 patients, *Pit*); those patients included confirmed hypopituitary individuals with a history of pituitary surgery before 1999 for a pituitary tumour (78, *PitSurg*: either nonfunctioning (45) or secreting ACTH (16)/prolactin (12)/GH (5), with no progressive residual tumour on serial pituitary scans, and biochemical cure in those treated for hormone excess states).
- Patients with no history of pituitary or thyroid disease or radiotherapy treatment and all their thyroid function test results within the reference ranges constituted a reference group (2008, *Ref*).
- Patients without a history of pituitary disease or radiotherapy but with treated primary thyroid disease were included in the estimation of the correction coefficient β (*Hypothy*, 328, with treated hypothyroidism and TSH below the upper limit of the reference range, and *Thytox*, 83, with treated thyrotoxicosis and TSH above the lower limit of the reference range).

#### Pituitary Assessments

Functional pituitary status was categorized according to the results of dynamic function tests (861 tests/444 patients, 1999–2005), using peak cortisol levels during an insulin tolerance test (ITT, 452/300), a short synacthen test (SST; 30 min cortisol, 164/123) or a glucagon stimulation test (GST, 39/38), and peak GH levels during an ITT (452/300), an arginine stimulation test (345/266) or a GST (39/38). The presence or absence of gonadotrophin failure was documented for hypopituitary patients entered in the departmental GH database.

Mean peak values were applied if more than one assessment per patient per axis existed. For the purpose of test failure prediction, only the most recent stimulation test result was used for each patient, and only if a corresponding TSHI value was obtained within the preceding 2 weeks.
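The selection rule above can be sketched in Python. This is an illustration only: the function names, the tuple layout and the tie-breaking rule (use the qualifying TFT closest to the test) are assumptions, as is treating a same-day sample as falling within the window.

```python
from datetime import date, timedelta


def mean_peak(stim_tests):
    """Mean peak value over all assessments of one axis in one patient.

    stim_tests: list of (test_date, peak_value) tuples.
    """
    return sum(peak for _, peak in stim_tests) / len(stim_tests)


def pair_latest_test_with_tshi(stim_tests, tfts, window_days=14):
    """Pair the most recent stimulation test with a preceding TSHI.

    tfts: list of (sample_date, tshi) tuples for the same patient.
    Returns (test, tft), or None if no TFT falls within the window.
    """
    if not stim_tests:
        return None
    latest = max(stim_tests, key=lambda t: t[0])
    window_start = latest[0] - timedelta(days=window_days)
    candidates = [t for t in tfts if window_start <= t[0] <= latest[0]]
    if not candidates:
        return None
    # Assumption: if several TFTs qualify, take the one closest to the test.
    return latest, max(candidates, key=lambda t: t[0])


# Hypothetical patient: two cortisol tests, three TFTs.
tests = [(date(2004, 3, 1), 310.0), (date(2005, 6, 15), 450.0)]
tfts = [(date(2005, 5, 20), -1.2), (date(2005, 6, 5), -0.9), (date(2005, 6, 20), -0.4)]
paired = pair_latest_test_with_tshi(tests, tfts)
```

Here only the TFT sampled on 5 June qualifies: the earlier sample is older than 2 weeks and the later one postdates the test.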

#### Assays

**TSH.** A heterogeneous sandwich magnetic separation assay (MSA) (third-generation) on the Immuno-1® System (Bayer, Pittsburgh, PA, USA), limit of detection 0·005 mU/l, normal range 0·35–5·0 mU/l. The intra- and interassay coefficients of variations (CVs) are 1·9–2·4% and 2·3–4·5%, respectively, for TSH > 0·5 mU/l.

**Free T4.** A heterogeneous sandwich MSA on the Immuno-1® System (Bayer), sensitivity 1·287 pmol/l, normal range 9–26 pmol/l (12·87 pmol/l = 1 ng/dl). The intra- and interassay CVs are 2·7–8·9% and 3·1–15·4%, respectively, across the entire analytic range.

**GH.** A two-site chemiluminescence immunoassay (Nichols Institute Diagnostics, San Clemente, CA, USA), sensitivity 0·02 ng/ml (1 ng/ml = 3·0 mU/l), calibrated against the second IS 98/574 International Standard. The intra- and interassay CVs are 6·2–8·7% (for GH 0·1–16·2 ng/ml) and 2·8–5·4% (for GH 0·8–17·1 ng/ml). Results reported as < 1 mU/l (< 0·33 µg/l) were imputed as 0·5 mU/l (0·17 µg/l).

**Cortisol.** A heterogeneous competitive MSA on the Immuno-1® System (Bayer), sensitivity 5·52 nmol/l (1 µg/dl = 27·6 nmol/l). The intra- and interassay CVs are 3·4–5·2% and 4·7–7·9%, respectively, for cortisol 88–920 nmol/l. Results reported as < 50 nmol/l (< 1·8 µg/dl) were imputed as 25 nmol/l (0·9 µg/dl).
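The unit conversions and the imputation of results below the reporting limit can be checked with a few lines of Python. The conversion factors are those quoted in the assay descriptions; the function names are illustrative, and imputation at half the reporting limit is inferred from the stated values (0·5 for < 1 mU/l, 25 for < 50 nmol/l).

```python
GH_MU_PER_UG = 3.0              # 1 ng/ml (= 1 µg/l) GH corresponds to 3·0 mU/l
CORTISOL_NMOL_PER_UG_DL = 27.6  # 1 µg/dl cortisol corresponds to 27·6 nmol/l


def impute_below_limit(reporting_limit):
    """Value substituted for a result reported as '< reporting_limit'."""
    return reporting_limit / 2


def gh_mu_to_ug_per_l(gh_mu_per_l):
    """Convert GH from mU/l to µg/l."""
    return gh_mu_per_l / GH_MU_PER_UG


def cortisol_nmol_to_ug_per_dl(cortisol_nmol_per_l):
    """Convert cortisol from nmol/l to µg/dl."""
    return cortisol_nmol_per_l / CORTISOL_NMOL_PER_UG_DL


gh_imputed = impute_below_limit(1.0)         # mU/l
cortisol_imputed = impute_below_limit(50.0)  # nmol/l
print(round(gh_mu_to_ug_per_l(gh_imputed), 2))                 # µg/l
print(round(cortisol_nmol_to_ug_per_dl(cortisol_imputed), 1))  # µg/dl
```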

#### TSHI Model and Statistical Analysis

Statistical analysis was performed with the statistical software package *R*, version 2.6.2.^{[6]}

Based on the fact that changes in peripheral fT4 concentration cause proportional changes in TSH on a logarithmic scale,^{[3–5]} a correction coefficient β was defined as the estimate of the feedback-induced log TSH change per change in fT4: β = Δlog TSH/ΔfT4. For the estimation of β, only TFTs of the groups *PitSurg*, *Hypothy* or *Thytox* were included, collectively referred to as the *Combined* group, assumed to be the patients whose TFTs best displayed the fT4–TSH feedback relationship.

Linear mixed effects models^{[7]} were used to estimate the regression of log TSH on fT4 and the patient group while accounting for repeated measurements on subjects. All models were based on subject-specific intercepts (i.e. TSHIs). Regression slopes (i.e. 'correction coefficients') were considered as either group specific (*Hypothy*, *Thytox*, *PitSurg*) or group independent (*Combined*).

The estimate for the *Combined* β was used to extrapolate log TSH to a fixed fT4 of 0 pmol/l (i.e. hypothetical uninhibited TSH secretion), defining the TSHI as TSHI = log TSH – β × fT4. The distribution of TSHI in *Ref* patients (using the single most recent TFT per patient) defined the TSHI reference range and was used to calculate standardized TSHI values (expressed as standard deviation score, SDS). A patient's TSHI value refers to the mean standardized TSHI value of all the patient's TFTs, unless otherwise specified. TSHIs of patients with different severities of pituitary failure were compared using repeated-measures analysis of variance (ANOVA). The probability of failing a GH or cortisol stimulation test was modelled as a function of standardized TSHI using both logistic regression (for TSHIs below the *Ref* mean, i.e. TSHI < 0 SD)^{[8]} and generalized additive models.^{[9]}
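The idea of a common slope with subject-specific intercepts can be illustrated with a much simpler estimator than the linear mixed effects models fitted in *R*. The sketch below uses within-subject centering (a fixed-effects estimator): each subject's fT4 and log TSH values are centred on that subject's own means, and a single slope is fitted to the pooled deviations. It is a stand-in for illustration, not the authors' model; the data are synthetic and the function name is hypothetical.

```python
def within_subject_slope(records):
    """Pooled within-subject slope of log TSH on fT4.

    records: iterable of (subject_id, ft4, log_tsh) tuples.
    Centring on each subject's means removes the subject-specific
    intercept, so only within-subject changes inform the slope.
    """
    by_subject = {}
    for subject_id, ft4, log_tsh in records:
        by_subject.setdefault(subject_id, []).append((ft4, log_tsh))

    sxy = 0.0
    sxx = 0.0
    for observations in by_subject.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-subject variation in fT4")
    return sxy / sxx


# Synthetic, noise-free data: three subjects with different intercepts
# (i.e. different TSHIs) but the same feedback slope.
true_beta = -0.1345
intercepts = {"A": 2.9, "B": 1.5, "C": 0.8}
ft4_values = {"A": [10.0, 14.0, 19.0], "B": [12.0, 16.0], "C": [9.0, 15.0, 22.0]}
records = [
    (sid, ft4, intercepts[sid] + true_beta * ft4)
    for sid in intercepts
    for ft4 in ft4_values[sid]
]
beta_hat = within_subject_slope(records)
```

With noise-free data the estimator recovers the slope used to generate the data even though the three subjects have very different intercepts, which is the property the TSHI relies on.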

### Results

Based on the TFTs of the *Combined* group (412 patients, 1167 TFTs), feedback inhibition was estimated to cause an approximate 0·1345 decrease in log TSH per 1 pmol/l increase in fT4 concentration [i.e. β = Δlog TSH/ΔfT4 = –0·1345, 95% confidence interval (CI) –0·1484 to –0·1207, *P* < 0·0001]. The predicted TSH correction coefficients were similar for all three subgroups (*PitSurg*: –0·1326, *Hypothy*: –0·1333, *Thytox*: –0·1377) with little evidence against the use of the *Combined* correction coefficient (β = –0·1345; *P* = 0·96, likelihood-ratio test).

Extrapolation of log TSH values to the fixed fT4 of 0 pmol/l defined the TSHI as TSHI = log TSH + 0·1345 × fT4. TSHI values for *Ref* patients followed a normal distribution (*n* = 2008, *P* = 0·28, Shapiro–Wilk test) and defined the TSHI reference range (mean = 2·70, SD = 0·676). Standardized TSHIs were calculated as (TSHI – 2·70)/0·676.
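As a minimal sketch, the two formulas above translate directly into Python. The text writes only 'log'; the natural logarithm is assumed here. The function names and the example values are illustrative, and the coefficient and reference values apply only to the assays and population described in this paper.

```python
import math

BETA = -0.1345    # estimated change in log TSH per 1 pmol/l increase in fT4
REF_MEAN = 2.70   # mean TSHI in the Ref group
REF_SD = 0.676    # SD of TSHI in the Ref group


def tsh_index(tsh_mu_per_l, ft4_pmol_per_l):
    """TSHI = log TSH - beta * fT4 (natural log assumed)."""
    return math.log(tsh_mu_per_l) - BETA * ft4_pmol_per_l


def standardized_tshi(tsh_mu_per_l, ft4_pmol_per_l):
    """TSHI expressed as a standard deviation score (SDS)."""
    return (tsh_index(tsh_mu_per_l, ft4_pmol_per_l) - REF_MEAN) / REF_SD


# Hypothetical low-normal TFT: TSH 1·0 mU/l, fT4 10 pmol/l.
example_tshi = tsh_index(1.0, 10.0)
example_sds = standardized_tshi(1.0, 10.0)
print(round(example_tshi, 3), round(example_sds, 2))
```

In this hypothetical example both TSH and fT4 lie inside their reference ranges, yet the standardized TSHI is about 2 SD below the reference mean.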

Lower peak-stimulated hormone concentrations were associated with significantly reduced TSHI values (*P* < 0·0001 for GH, *P* < 0·0001 for cortisol, repeated-measures ANOVA, see Fig. 2). A single TSHI obtained within 2 weeks before a pituitary stimulation test predicted the risk of failing the stimulation test (*P* = 0·0002 for GH, *P* = 0·0002 for cortisol, for logistic regression of failure risk on TSHIs below the reference mean, see Fig. 3). For TSHIs below the reference mean, a TSHI decrease by 1 SD increased the odds of failing to achieve a peak GH level ≥ 1 µg/l by a factor of 1·86 (95% CI 1·34–2·58, *P* = 0·0002 for logistic regression), and the odds of failing to achieve a peak cortisol level ≥ 400 nmol/l by a factor of 1·81 (95% CI 1·32–2·48, *P* = 0·0002). Formal evidence for preference of generalized additive fits over simple linear logistic regression was limited (*P* = 0·65 for GH, *P* = 0·44 for cortisol). Of all patients with peak-stimulated cortisol concentrations < 500 nmol/l, severe GH deficiency (peak GH < 3 µg/l) was present in 100% (10/10 patients) of those with a subnormal mean TSHI, compared with 55% (16/29) of those with a normal TSHI (*P* = 0·016, Fisher's exact test).
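The Fisher's exact test on the 10/10 *vs.* 16/29 comparison can be recomputed from the counts given in the text. The sketch below assumes a two-sided test that sums the probabilities of all tables no more likely than the observed one; the 2 × 2 layout and the function name are a reading of the text, not taken from the authors' analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalized probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / total_tables


# Rows: subnormal TSHI, normal TSHI. Columns: severe GHD, no severe GHD.
p_value = fisher_exact_two_sided(10, 0, 16, 13)
print(round(p_value, 3))
```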

**Figure 2.**

TSH Indices in patients with different severities of pituitary hormone deficiencies. Bars and vertical lines indicate mean standardized TSHI values ± 2 SEM; ND, not detected (GH < 0·3 µg/l); number of patients per group is indicated above the *x*-axis; *P* = probability (repeated-measures ANOVA).

**Figure 3.**

Probability of stimulation test failure for GH (upper two lines, peak GH < 1 µg/l) and cortisol (lower two lines, peak cortisol < 400 nmol/l) dependent on the patient's standardized TSHI obtained ≤ 2 weeks before the stimulation test. Solid lines represent generalized additive fits with 4 degrees of freedom for the TSHI effect, and dashed lines indicate linear logistic regression fits (for TSHI < 0). *P* denotes the significance of the TSHI effect in univariate linear logistic regression models.

The TSHIs of hypopituitary patients with gonadotrophin failure were significantly lower than TSHIs of those without [111 *vs.* 99 patients, –2·18 *vs.* –0·63 (SD), *P* < 0·0001, repeated-measures ANOVA].

A total of 21·9% of all currently considered 'normal' TSH–fT4 combinations (i.e. normal TSH *and* normal fT4) have abnormal TSHIs. The use of the TSHI extends the diagnosis of TSH deficiency into areas of the normal TFT reference ranges (see Fig. 4).

**Figure 4.**

Revised TFT reference range. The dashed rectangle outlines the fT4 and TSH reference ranges. The shaded part indicates fT4–TSH combinations with normal TSHI values (78% of the total reference range area). The 'RR' lines indicate the predicted relative risk of failing a cortisol stimulation test (peak cortisol < 400 nmol/l); the 'TSHI' lines indicate standardized TSHI values. The TSH axis is log-scaled.
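The proportion of the reference rectangle with an abnormal TSHI can be approximated numerically. The sketch below makes several assumptions that are not stated in the text: natural logarithms, area measured on the fT4 × log TSH plane (as in Fig. 4), and 'abnormal' defined as a TSHI more than 1·96 SD from the reference mean in either direction. All names are illustrative.

```python
import math

BETA = -0.1345
REF_MEAN, REF_SD = 2.70, 0.676
TSH_RANGE = (0.35, 5.0)   # mU/l, assay reference range
FT4_RANGE = (9.0, 26.0)   # pmol/l, assay reference range


def abnormal_fraction(z_limit=1.96, steps=400):
    """Fraction of the reference rectangle with TSHI outside mean +/- z SD.

    Uses a midpoint grid over fT4 (linear) and TSH (log scale).
    """
    lower = REF_MEAN - z_limit * REF_SD
    upper = REF_MEAN + z_limit * REF_SD
    log_lo, log_hi = math.log(TSH_RANGE[0]), math.log(TSH_RANGE[1])
    ft4_lo, ft4_hi = FT4_RANGE

    abnormal = 0
    for i in range(steps):
        ft4 = ft4_lo + (i + 0.5) * (ft4_hi - ft4_lo) / steps
        for j in range(steps):
            log_tsh = log_lo + (j + 0.5) * (log_hi - log_lo) / steps
            tshi = log_tsh - BETA * ft4
            if tshi < lower or tshi > upper:
                abnormal += 1
    return abnormal / steps ** 2


fraction = abnormal_fraction()
print(round(fraction, 3))
```

Under these assumptions the fraction comes out at roughly one fifth of the rectangle, close to the reported 21·9%; any small difference would depend on the exact cut-offs used to define the TSHI reference range.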

### Discussion

#### TSHI Model

The principally linear relationship between fT4 and log TSH is well documented, in both thyroid^{[3–5]} and pituitary disease.^{[2,10]} Although certain deviations from the log-linear model and subject-dependent slope variations have been recognized,^{[2,10,11]} the implementation of a multivariate model in everyday clinical practice is problematic due to its complexity, for example the need for several TFTs for each patient as opposed to a single TFT for the TSHI estimation. The purpose of the TSHI was to provide a sufficiently accurate estimate of the severity of hypopituitarism from basic thyroid function tests in the absence of, or in addition to, dynamic pituitary function test results. We validated its accuracy against other measures of pituitary insufficiency.

The patients used for the estimation of the correction coefficient β (significant pituitary disease or treated primary thyroid disease) were chosen under the assumption that changes in their TSH values were mainly due to changes in fT4 concentration (e.g. due to variations in T4 dose) rather than dynamic changes in pituitary function. (TFTs of the *Ref* group were not used for the estimation of β because their limited variability would have been mostly due to random excursions from the patient's mean values and less meaningful in characterizing the feedback relationship.)

The extrapolation of log TSH to a fixed fT4 value of 0 is relatively arbitrary and a choice of convenience; extrapolation to other fixed fT4 values (*F*) can be calculated as TSHI = log TSH – β × (fT4 – *F*). The choice of a different *F*-value does not alter the resulting standardized TSHI values, provided that the TSHI is used for its intended purpose as a comparative value rather than for actual prediction of TSH values. Using a more central value of *F*, such as the median fT4 for the population under consideration, would be more complex to implement but could give a lower variance for TSHI.
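The invariance of standardized values to the choice of *F* follows because changing *F* shifts every TSHI by the same constant (β × *F*), which cancels when the reference mean is subtracted. A short sketch with made-up reference data illustrates this; the sample values and function names are hypothetical, and the natural logarithm is assumed.

```python
import math
from statistics import mean, stdev

BETA = -0.1345


def tshi_at(tsh, ft4, f=0.0):
    """log TSH extrapolated to a fixed fT4 of f (natural log assumed)."""
    return math.log(tsh) - BETA * (ft4 - f)


def standardize(value, reference_values):
    """Express value as an SDS relative to a reference sample."""
    return (value - mean(reference_values)) / stdev(reference_values)


# Hypothetical reference sample of (TSH mU/l, fT4 pmol/l) pairs.
reference = [(1.2, 15.0), (2.5, 13.0), (0.9, 18.0), (3.1, 12.0), (1.8, 16.0)]
patient = (0.8, 11.0)

sds_by_f = {}
for f in (0.0, 15.0):
    ref_tshi = [tshi_at(tsh, ft4, f) for tsh, ft4 in reference]
    sds_by_f[f] = standardize(tshi_at(*patient, f), ref_tshi)

# The raw indices differ by the constant BETA * F ...
shift = tshi_at(*patient, 15.0) - tshi_at(*patient, 0.0)
# ... but the standardized values are identical.
print(round(shift, 4), round(sds_by_f[0.0], 3), round(sds_by_f[15.0], 3))
```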

The primary derivation of the TSHI model independently of the stimulation test results was a deliberate decision; our intention was first to elucidate the underlying physiological relationship between fT4 and TSH as the described TSHI from a larger population, and only second to apply this derived parameter in other circumstances such as the stimulation test failure prediction. Although this does not provide the same verification as prospective studies of the TSHI in other populations (which we advocate), our approach should have increased the robustness of the TSHI model and should obviate repeated remodelling for other applications.

#### TSHI as a Marker of Pituitary Function

More severe degrees of hypopituitarism were associated with relatively lower TSHIs. This relationship was observed over the full spectrum of pituitary dysfunction, including very mild degrees of hormone deficiency. Notably, stimulated cortisol levels in the range of 550–700 nmol/l were still associated with TSHI values significantly below the reference mean. The significant TSHI decrease with mild forms of hypopituitarism suggests that thyrotroph involvement occurs early in pituitary dysfunction, which reverses previous beliefs of a completely intact pituitary–thyroid axis in the majority of hypopituitary patients.^{[1]} We interpret these novel findings as an indication of the superior sensitivity of the TSHI in the detection of TSH insufficiency. The early involvement of the pituitary–thyroid axis may explain why symptoms in some hypopituitary adult patients persist,^{[12]} or some children's growth rates fail to normalize,^{[13]} despite adequate cortisol, sex steroid and GH replacement therapy, and could provide the rationale for a trial of earlier T4 replacement in such patients.

TSH deficiency (defined by subnormal TSHI values) in combination with another pituitary deficit may obviate formal testing for GH deficiency in view of its very high probability under those circumstances. The significant association between lower TSHI values and the higher probability of failing a subsequent pituitary stimulation test may allow risk stratification of patients suspected of pituitary dysfunction. Monitoring TSHIs over time may also be useful in earlier recognition of evolving subclinical pituitary failure.
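For risk stratification, the odds ratios reported in the Results can be turned into a relative change in odds for a given standardized TSHI. The sketch below assumes the linear logistic model for TSHIs below the reference mean, under which each further SD multiplies the odds by the same factor. The baseline probability in the example is invented purely for illustration and is not a study result.

```python
OR_PER_SD_GH = 1.86        # odds ratio per 1 SD TSHI decrease, peak GH < 1 µg/l
OR_PER_SD_CORTISOL = 1.81  # odds ratio per 1 SD decrease, peak cortisol < 400 nmol/l


def odds_multiplier(sd_below_mean, odds_ratio_per_sd):
    """Factor by which the odds of test failure rise at a given TSHI.

    sd_below_mean: how many SD the TSHI lies below the reference mean (>= 0).
    """
    if sd_below_mean < 0:
        raise ValueError("the model was fitted for TSHIs below the reference mean")
    return odds_ratio_per_sd ** sd_below_mean


def odds_from_probability(p):
    return p / (1 - p)


def probability_from_odds(odds):
    return odds / (1 + odds)


# TSHI 2 SD below the reference mean, GH axis.
multiplier = odds_multiplier(2.0, OR_PER_SD_GH)

# Hypothetical baseline: 20% failure probability at the reference mean.
baseline_odds = odds_from_probability(0.20)
adjusted_probability = probability_from_odds(baseline_odds * multiplier)
print(round(multiplier, 2), round(adjusted_probability, 2))
```

Odds ratios compound multiplicatively, so a TSHI 2 SD below the mean corresponds to 1·86², not 2 × 1·86, times the baseline odds.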

#### Limitations and Strengths of the TSHI Model

It is important to emphasize that the intended role of the TSHI is not to supersede the analysis and categorization of thyroid dysfunction based on actual fT4 and TSH levels, but to provide supplementary insight into the integrity of the fT4–TSH feedback relationship at pituitary level, especially with TFTs within the traditional reference ranges.

Several sources of error would have decreased the predictive power of our TSHI estimation derived from TFTs sampled under nonstandardized conditions; we did not account for the diurnal variability of TSH secretion, timing or irregularities of T4 administration, the effects of concomitant medication and other hormone replacement, or nonthyroidal illness. Many of those effects are likely to be random rather than systematic, and the use of a mean TSHI per patient would have reduced their impact on our results.

Systematic errors could have been caused by bioinactive TSH variants, which have been reported in hypopituitary patients^{[14]} with, for example, normal TSH values despite subnormal fT4 levels. In our experience, such TFT results are rare (in this study, a combination of subnormal fT4 and normal TSH occurred in no more than 1·3% of all TFTs in patients with or at risk of pituitary dysfunction), but would have to be considered as a potential cause of higher than expected TSHI values. The interpretation of low TSHI values, however, remains unaffected by these possible issues. Other systematic errors may have been caused by T3 supplementation or excess, TSH-secreting tumours (extremely rare among pituitary tumours), thyroid hormone resistance or progressive pituitary pathology, which would have affected only a very small proportion of our study patients.

Our TSHI reference range relates to patients not known to suffer from thyroid or pituitary disease, but who might have been (randomly) affected by nonthyroidal illness and medication. The use of healthy volunteers may facilitate accurate determination of the TSHI reference range, as well as calculation of the correction coefficient β by T4 administration to these individuals. Obtaining repeat TFTs under standardized sampling conditions is probably the simplest and most effective measure to further increase the TSHI accuracy.

Although the TSHI is able to redefine the 'normal' TFT reference ranges, considerable overlap still exists between pituitary disease and normality, a weakness shared with many other hormone reference ranges. A consistently subnormal TSHI, however, should be strongly considered as a possible marker of central hypothyroidism in patients without primary thyroid disease. Determination of the range of normal intraindividual variability may define a 'least significant change', and would allow detection of a significant TSHI decrease even within the TSHI reference range.

The TSHI model is a basic approximation of the fT4–TSH feedback relationship, but it is an improvement over current methods of assessing thyrotroph function, and should ultimately be valued for its practical ability to deliver clinically relevant results, a criterion shared with other derived medical indices such as the body mass index.

### Summary and Outlook

The TSHI is a numerical estimate of the pituitary thyrotroph function that can be derived from simple thyroid function tests, allowing easier detection of pathological TSH suppression in hypopituitarism regardless of concomitant T4 supplementation.

The TSHI should facilitate detailed longitudinal analysis of thyrotroph function in chronic pituitary disease (e.g. monitoring pituitary tumours or the long-term effects of pituitary irradiation), but also and uniquely in the acute setting (e.g. assessing severity and time-course of pituitary suppression during nonthyroidal illness, quantifying the impact of pituitary surgery, etc.).

The wide availability of inexpensive thyroid function assays, the stability of the molecules, nonreliance on staff-intensive testing procedures, the potential for automatic inclusion of the calculated TSHI in the current laboratory reporting format (Fig. 1) and the availability of our reference data make the TSHI an attractive parameter for evaluation of pituitary dysfunction.

The role of the TSHI in primary thyroid disease is beyond the scope of this paper, although preliminary analysis (data not shown) suggests similar potential of the TSHI as an indicator of 'functional TSH deficiency', for example after prolonged uncontrolled thyrotoxicosis or in the presence of TSH receptor stimulating antibodies.

We encourage assay manufacturers to make correction coefficients and TSHI reference ranges available for their thyroid function assays, and advocate the inclusion of the TSHI in the standard TFT reporting format (Fig. 1).

#### References

1. Littley, M.D., Shalet, S.M., Beardwell, C.G. *et al*. (1989) Hypopituitarism following external radiotherapy for pituitary tumours in adults. *Quarterly Journal of Medicine*, 70, 145–160.
2. Shimon, I., Cohen, O., Lubetsky, A. *et al*. (2002) Thyrotropin suppression by thyroid hormone replacement is correlated with thyroxine level normalization in central hypothyroidism. *Thyroid*, 12, 823–827.
3. Spencer, C.A., LoPresti, J.S., Patel, A. *et al*. (1990) Applications of a new chemiluminometric thyrotropin assay to subnormal measurement. *Journal of Clinical Endocrinology and Metabolism*, 70, 453–460.
4. Ercan-Fang, S., Schwartz, H.L., Mariash, C.N. *et al*. (2000) Quantitative assessment of pituitary resistance to thyroid hormone from plots of the logarithm of thyrotropin versus serum free thyroxine index. *Journal of Clinical Endocrinology and Metabolism*, 85, 2299–2303.
5. Andersen, S., Pedersen, K.M., Bruun, N.H. *et al*. (2002) Narrow individual variations in serum T(4) and T(3) in normal subjects: a clue to the understanding of subclinical thyroid disease. *Journal of Clinical Endocrinology and Metabolism*, 87, 1068–1072.
6. R Development Core Team (2007) *R: a Language and Environment for Statistical Computing*. Vienna, Austria. http://www.r-project.org/.
7. Pinheiro, J.C. & Bates, D.M. (2000) *Mixed-Effects Models in S and S-PLUS*. Springer-Verlag, New York.
8. Collett, D. (1991) *Modelling Binary Data*. Chapman & Hall, London.
9. Hastie, T.J. & Tibshirani, R.J. (1990) *Generalised Additive Models*. Chapman & Hall, London.
10. Meier, C.A., Maisey, M.N., Lowry, A. *et al*. (1993) Interindividual differences in the pituitary–thyroid axis influence the interpretation of thyroid function tests. *Clinical Endocrinology*, 39, 101–107.
11. Leow, M.K. (2007) A mathematical model of pituitary–thyroid interaction to provide an insight into the nature of the thyrotropin-thyroid hormone relationship. *Journal of Theoretical Biology*, 248, 275–287.
12. Koltowska-Haggstrom, M., Mattsson, A.F., Monson, J.P. *et al*. (2006) Does long-term GH replacement therapy in hypopituitary adults with GH deficiency normalise quality of life? *European Journal of Endocrinology*, 155, 109–119.
13. Price, D.A., Wilton, P., Jonsson, P. *et al*. (1996) Efficacy and safety of growth hormone treatment in children with prior craniopharyngioma: an analysis of the Pharmacia and Upjohn International Growth Database (KIGS) from 1988 to 1998. *Hormone Research*, 49, 91–97.
14. Persani, L., Ferretti, E., Borgato, S. *et al*. (2000) Circulating thyrotropin bioactivity in sporadic central hypothyroidism. *Journal of Clinical Endocrinology and Metabolism*, 85, 3631–3635.
15. Jostel, A., Blower, M.G. & Shalet, S.M. (2006) Highly preserved correlation between free thyroxine and thyrotropin concentrations in patients with established panhypopituitarism [P2–762]. In: ENDO 2006 Program and Abstracts Book. Endocrine Society Press, Chevy Chase, MD, USA.
16. Jostel, A., Ryder, W.D.J. & Shalet, S.M. (2007) The concept of 'thyroid intercepts' and its application in the diagnosis of central hypothyroidism and pituitary dysfunction [P2–561]. In: ENDO 2007 Program and Abstracts Book. Endocrine Society Press, Chevy Chase, MD, USA.
17. Jostel, A., Ryder, W.D.J., Shalet, S.M. *et al*. (2008) The TSH Index: definition and clinical evaluation [P2–751]. In: ENDO 2008 Program and Abstracts Book. Endocrine Society Press, Chevy Chase, MD, USA.