Delay to Formalin Fixation (Cold Ischemia Time) Effect on Breast Cancer Molecules

Thaer Khoury, MD


Am J Clin Pathol. 2018;149(4):275-292. 


Abstract and Introduction


Objectives The gold standard of examining breast biomarkers, including estrogen receptor (ER)/progesterone receptor (PR)/human epidermal growth factor receptor 2 (HER2)/Ki-67, is to perform these assays on formalin-fixed, paraffin-embedded tissue. However, preanalytical variables may confound these assays. One of these factors is delay to formalin fixation (DFF). The purpose of this review is to evaluate each study that investigated the effect of DFF on breast biomarkers and other molecules.

Methods Thirteen primary research articles were identified by the literature search. The credibility of the studies was judged based on the degree of controlling other confounding factors. Nine studies had a prospective design with a high number of controlled variables.

Results Most of the studies concluded that DFF had an effect on ER/PR/HER2. Some of these studies showed that DFF had a negative effect on other markers used either clinically or for research purposes.

Conclusions The vast majority of the studies agree that DFF has a negative effect on breast biomarkers.


Hormone receptor (estrogen receptor [ER]/progesterone receptor [PR]) status and human epidermal growth factor receptor 2 (HER2) status are the major drivers of clinical decision making for patients with breast cancer (BC). Targeted therapy has been used in patients with BC for decades, including hormonal therapies for ER+ tumors[1] and anti-HER2 therapies for HER2+ tumors.[2,3] A number of clinically available assays categorize patients with ER+/HER2– and node-negative tumors into chemosensitive vs nonchemosensitive. Most of these assays are RNA based.[4–6] However, some still rely on immunohistochemistry (IHC) staining of formalin-fixed, paraffin-embedded tissue. One of these assays is IHC4, which includes Ki-67 in addition to ER, PR, and HER2.[7] Therefore, accurate results for these biomarkers are very important in patient care.

Many factors affect the outcome of biomarker assays, including fixation delay; fixative type; time in fixative; reagents and conditions of dehydration, clearing, and paraffin impregnation; and conditions of slide drying and storage.[8] Studies that investigated preanalytical variables have focused on the impact of prolonged formalin fixation.[9–11] Another factor that plays a role in biomarker results is the difference in expression between the core needle biopsy (CNB) and the excisional biopsy (EB).[12–15] Masood et al[16] proposed that delay to formalin fixation (DFF), also known as cold ischemia time, is a confounding factor affecting ER/PR. This variable was long ignored until the studies published by the author.[17–20] Thereafter, a number of publications examined the hypothesis.[21–28] The most recent guidelines for ER, PR, and HER2 testing recommended minimizing DFF to less than 1 hour.[29,30] The College of American Pathologists (CAP) made it mandatory to include the DFF in the pathology report.[31]

The purpose of this review is to evaluate the studies that investigated the effect of DFF on breast biomarkers, including ER, PR, HER2, Ki-67, and other molecules. The author evaluated the studies' design and the results and then suggested recommendations on how to handle the samples to avoid prolonged DFF and comply with the CAP guidelines.

Studies Included in This Review (Summary of Design and Methods)

Overview of the Studies. A Google Scholar search for studies that cited Khoury et al[17] was conducted. The resulting studies, along with other studies found on PubMed using the key words cold ischemia time and delay to formalin fixation, were included in the review (n = 13). Table 1 summarizes the studies' design and methodology. The number of cases reflects the number of patients from whom the tumor tissues were excised. In this table, the number of samples is defined as the number of tissue pieces taken from the individual patient's tumor, in the form of a core cut or tissue microarray (TMA). In the studies that had a prospective design, the samples were purposefully exposed to DFF time periods. They were compared with each other and with the clinically submitted tissues. In the studies that had a retrospective design, there were various types of comparisons. Under the category of DFF time periods, the sample types (EB, CNB, and/or TMA) are listed along with the DFF time periods. Also listed are the studied variables, markers, and clones.

A list of confounding factors was recorded in Table 1 as well. The author evaluated each variable in each study and decided if the variable was controlled, partially controlled, not controlled, or unknown based on certain criteria. These factors included the following:

  1. DFF time periods: Two types of studies were recognized: prospective, which evaluated samples subjected to sequential DFF time periods, or retrospective, which compared CNBs (with an assumption that DFF time was negligible) with subsequent EBs (with or without formally recorded DFF time periods). Only prospective studies were considered controlled for this variable, since in the retrospective studies, DFF time periods were either estimated or assumed to be negligible.

  2. Time in formalin: Defined as the time lag between immersing the tissue in formalin and processing.[9] This variable was considered controlled if the time in formalin was between 6 and 72 hours.[31]

  3. Number of IHC runs: This variable was considered a confounding factor due to the possibility of staining variability from run to run as previously described in a study that tested HER2.[33] The staining was considered controlled if all samples were stained simultaneously in the same run.

  4. Interobserver variability: The interobserver variability was considered controlled if more than one pathologist scored the test and used a semiquantitative methodology or when an automated scoring method was used. It has been proposed that using a semiquantitative scoring methodology minimizes the interobserver variability.[34] Therefore, when this method was used and only a single pathologist scored the test, the interobserver variability was considered partially controlled.

  5. Avoid delay between harvesting the tissue and receiving the sample in the laboratory: In clinical practice, only palpable tumors are sent directly to the laboratory right after being surgically removed. Patients with nonpalpable tumors normally undergo a needle localization procedure, which would delay transferring the tissue from the surgery suite to the laboratory. This delay is difficult to control or measure, which would create nonuniformity among the cases. If the study used nonpalpable tumors, this factor was considered not controlled.

  6. No neoadjuvant therapy: It has been shown that chemotherapeutic agents may affect biomarker expression.[35] This variable was considered controlled if the patient was treated in the adjuvant setting.

  7. Tumor heterogeneity: Heterogeneity is a known fact in breast biomarkers.[36] In a large epidemiologic study performed by the author, it was found that intratumoral heterogeneity was the source of discordance in BC biomarker classification. The study used TMA to examine this hypothesis.[37] Since most of the studies in this review used small samples (TMA or CNB), heterogeneity was considered a confounding factor. To minimize the effect of this factor, some investigators took certain measures. In the author's opinion, there were two ways of partially controlling tumor heterogeneity: first, if the investigators defined scoring changes between samples, taking into consideration tumor heterogeneity,[22] and, second, when a relatively large number of cores were extracted and constructed in the TMA block.[38–40]
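
Several of the criteria above read as simple decision rules. The following Python sketch encodes three of them (time in formalin, number of IHC runs, and interobserver variability) purely as an illustration. The function and parameter names are hypothetical, and the handling of situations the text does not address (for example, manual scoring without a semiquantitative method) is an assumption rather than a rule taken from the review.

```python
from typing import Optional


def time_in_formalin_status(min_hours: Optional[float],
                            max_hours: Optional[float]) -> str:
    """Item 2: controlled if the reported range lies within 6 to 72 hours."""
    if min_hours is None or max_hours is None:
        return "unknown"
    if min_hours >= 6 and max_hours <= 72:
        return "controlled"
    return "not controlled"


def ihc_runs_status(single_run: Optional[bool]) -> str:
    """Item 3: controlled only if all samples were stained in the same run."""
    if single_run is None:
        return "unknown"
    return "controlled" if single_run else "not controlled"


def interobserver_status(n_pathologists: Optional[int],
                         semiquantitative: bool,
                         automated: bool) -> str:
    """Item 4: automated scoring, or several pathologists with a semiquantitative method."""
    if automated:
        return "controlled"
    if n_pathologists is None:
        return "unknown"
    if semiquantitative:
        return "controlled" if n_pathologists > 1 else "partially controlled"
    # Assumption: the text does not cover manual, non-semiquantitative scoring.
    return "not controlled"


# Hypothetical study: 6 to 48 hours in formalin, one pathologist using H-score.
print(time_in_formalin_status(6, 48))
print(interobserver_status(1, semiquantitative=True, automated=False))
```

Returning a label rather than a Boolean keeps the four states used in Table 1 (controlled, partially controlled, not controlled, unknown) distinct.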

Methodology and Design of Each Study. The first study that investigated the effect of DFF on ER and PR was published in 1998 by Masood et al,[16] who studied not only DFF but also other confounding factors, including antibody clones and processing. They chose three cases and divided each into 11 samples. The reference samples were immediately fixed and kept in formalin for 24 hours. One of the other 10 samples was exposed to a DFF time period of 12 hours. All samples (n = 6), including the reference samples and the DFF samples, had a time in formalin of 24 hours.

Khoury et al[17–19] and Qiu et al[20] prospectively collected BC (n = 10) cases and studied the effect of DFF on ER, PR, HER2 (IHC, in situ hybridization [ISH], fluorescence ISH [FISH], and dual ISH), Ki-67, histomorphology, and other IHC markers. The tumors were excised and underwent immediate gross evaluation. The process is illustrated schematically in Figure 1. All patients were treated in the adjuvant setting. Only palpable tumors were selected for the study. Each tumor was divided into eight samples and consecutively fixed after 0, 10, and 30 minutes and 1, 2, 4, and 8 hours; one section was kept in saline and stored overnight in the refrigerator at 4°C. All cases had time in formalin between 6 and 48 hours. Four 0.6-mm cores were extracted from each case and constructed in a total of two TMA blocks.

Figure 1.

Schematic figure showing the process of performing the prospective study. Note the tissue is exposed to warm ischemia during surgery before receiving it in the laboratory. The tissue is dissected into small samples and exposed to various delay to formalin fixation (DFF) time periods. Then the samples are immersed in formalin for a period between 8 and 48 hours. The tissue is then processed and tissue microarrays (TMAs) are made. ON, overnight.

Pinhel et al[21] prospectively collected BC (n = 28) cases and studied the effect of DFF on ER, PR, HER2 (IHC), Ki-67, p-Akt, and p-Erk1/2. However, only 20 cases had a complete set of samples. A 14-gauge core cut was taken immediately after tumor resection and placed in formalin, designated as core A. Another core cut was taken after the specimen was sent to x-ray, designated as core B. The excision was then placed in formalin and subjected to the histopathology department's routine fixation. It is worth noting that sample A was placed in formalin immediately; however, the investigators mentioned that the median and range time for samples A and B was 30 minutes (20-80 minutes).

Yildiz-Aktas et al[22] prospectively collected cases and exposed the samples to DFF (0.5, 1, 2, 3, 4, 24, and 48 hours) within the refrigerator (n = 23) or at room temperature (n = 25). They studied the effect of DFF on ER, PR, and HER2 (IHC). All patients were treated in the adjuvant setting. However, the statuses of a few variables were unclear, including the type of procedure (needle localization or palpable mass), type of samples (TMA or full sections), and staining procedure (in a single run or multiple runs). All samples had time in formalin between 8 and 48 hours. The results of these samples were compared with the corresponding CNB scores. A single pathologist conducted the scoring using H-score.

Li et al[23] retrospectively collected BC cases (n = 97) for which corresponding CNBs were available. They studied the effect of DFF on ER. They calculated DFF time periods and excluded those that had DFF time period less than 1 hour. Patients were treated in the adjuvant or neoadjuvant setting. The samples were x-rayed in the pathology department after the needle localization procedure. Some samples exceeded the required 72 hours in formalin. The samples were stained in separate runs. DFF time periods for the resection specimens ranged from 64 to 357 minutes (median, 109 minutes). The distribution of DFF time periods was as follows: 58 under 120 minutes, 25 between 121 and 180 minutes, 11 between 181 and 240 minutes, and three greater than 241 minutes.

Neumeister et al[24] investigated the effect of DFF on a number of markers, including ER, PR, HER2 (IHC), Ki-67, and other markers. Retrospectively collected BC (n = 93) cases were constructed in a TMA with 25 matched CNBs and EBs. The TMA (0.6-mm core size) tissues had a DFF time period ranging from 25 to 415 minutes, while matching CNBs were generally fixed within 3 minutes. DFF time periods for EBs were not measured directly; they were estimated from the institution's general practice, under which the delay was at least 60 minutes. The type of surgery (needle localization or palpable mass) and the staining procedure (in a single run or multiple runs) were not mentioned.

Pekmezci et al[25] retrospectively collected CNBs and the subsequent EBs for BC (n = 190) cases. They studied ER, PR, and HER2 IHC. DFF time periods were estimated based on their practice (CNBs less than 1 hour, EBs more than 1 hour). No other variables were controlled in this study. The stains were scored using the Allred scoring system. However, the number of scorers was not mentioned.

Portier et al[26] retrospectively collected BC (n = 82) cases from the archive. They included only the cases that had known DFF time periods. The tumors were constructed in TMA blocks (two 0.6-mm cores each). DFF time periods were less than 1 hour (n = 45), 1 to 2 hours (n = 27), 2 to 3 hours (n = 6), or more than 3 hours (n = 6). They performed HER2 assays using three methods: IHC, FISH, and dual ISH. Then they compared the results of HER2 (IHC vs FISH vs dual ISH) for each case within each group but not between the groups. Moreover, there was no comparison made between the samples in any given patient. In addition, they compared the signal intensity among the cases with regard to each DFF time period. None of the variables was controlled except for the interobserver variability.

Lee et al[27] prospectively collected BC (n = 9) cases and divided them into six parts. The samples were exposed to DFF time periods of 1 hour, 2 hours, 4 hours, and 24 hours. Then they were stained for HER2 IHC with three clones (HercepTest, Leica Oracle HER2 staining kit [TA9145], and 4B5) and HER2 FISH (PathVysion). All samples had at least 10 hours in formalin. The type of surgical procedure (needle localization or palpable mass) and the therapy modality (adjuvant or neoadjuvant) were not mentioned. The stains were scored by two pathologists following the UK guidelines.[32]

Moatamed et al[28] prospectively studied a single BC case and stained it for HER2 (IHC [HercepTest] and FISH [PathVysion]). The patient was treated with mastectomy in the adjuvant setting. The sample was stored fresh without any fixative at 4°C for 4 days. Then the tissue was cut into core biopsy-sized pieces and fixed in various types of fixatives for different time periods. The study was designed to evaluate multiple preanalytical variables, including DFF, time in formalin, and the type of fixatives. We selected the samples that fit the scope of this review (10% formalin-fixed sample with time in formalin between 8 and 72 hours). It is unclear how many pathologists interpreted the IHC and FISH assays.

Results of the Studies

The results are divided into five groups: ER/PR, HER2 IHC, HER2 ISH, Ki-67, and other molecules. The results are combined in Table 2, in which the order of the studies matches that in Table 1.

Estrogen and Progesterone Receptors. Masood et al[16] found that ER and PR in the properly fixed samples (immediate fixation with time in formalin for 24 hours) had a score of 100% each. The samples that were exposed to 12 hours of DFF had an ER score of 65.4% and a PR score of 40.4% with no statistically significant difference.

Khoury et al[17] evaluated ER (clone 1D5; DAKO) and PR (PgR636; DAKO). The stains were scored using a Q-scoring system by two pathologists (more details about this methodology are described in Khoury et al[17]). They found that the mean Q-score started to decline at the 1-hour mark for PR and the 2-hour mark for ER. This decline was not statistically significant. Leaving a specimen in a 4°C refrigerator was equivalent to leaving the sample at room temperature without fixation for 8 hours.

In another study, Qiu et al[20] performed IHC staining for ER and PR on the same TMAs. They performed staining for two additional clones for ER and PR from Novocastra (6F11 and 16) and Ventana (SP1 and 1E2), respectively. They found that the mean Q-score for ER started to decline at 2 to 4 hours for clones 1D5 and 6F11 and at 1 hour for SP1. SP1 was superior to 1D5 at the 8-hour mark with statistical significance. One case was converted from positive to negative when exposed to DFF using clone 1D5 or 6F11. Another case was converted from positive to negative when exposed to DFF using the SP1 clone. The Q-score for PR started to decline at 1 hour for clones PgR636 and 16 and 4 to 8 hours for clone 1E2. Although not statistically significant, the change occurred in both the number/percentage of positive cells and the intensity of the stain.

Pinhel et al[21] studied ER (clone 6F11) and PR (clone 16) in three samples from the same tumor. There was no statistically significant difference between the three samples. However, when the scores of sample A (immediate fixation) and sample B (fixed after specimen x-ray) were averaged and compared with the scores of sample C (routine processing), ER had a trend toward less expression in sample C.

Yildiz-Aktas et al[22] performed IHC staining using the Ventana platform and clone SP1 for ER and 1E2 for PR. The authors used a semiquantitative scoring methodology of H-score (more details about this methodology are described in Yildiz-Aktas et al[22]). The case was deemed to have a mild reduction when the DFF sample had half to three-fourths of the CNB score, while it was considered a significant reduction when the DFF sample had less than half of the CNB score. It was considered a true reduction in the scores when consistent reduction was seen in all samples. It was considered focal reduction (tumor heterogeneity) when at least one sample showed no reduction. Then the authors compared the scores of DFF samples vs CNBs using the t test. For the refrigerated samples, the earliest significant reduction was seen at the 4-hour mark (both ER and PR). For the nonrefrigerated samples, the earliest significant reduction was seen at the 2-hour mark (both ER and PR). However, mild reduction was seen as early as 0.5 hours for PR. The number of cases that had any type of reduction at any time was 32% for ER and 24% for PR (refrigerated samples) and 48% for ER and 43% for PR (nonrefrigerated samples). The average and median ER and PR scores for different DFF time periods for the refrigerated and the nonrefrigerated samples were also recorded. These numbers were translated into graphs with significant P values (using t test) marked on the DFF time periods. Only nonrefrigerated samples with DFFs of 3, 4, 24, and 48 hours for ER and 24 and 48 hours for PR were statistically significant (Figure 2).

Figure 2.

Graphs showing (A, C) estrogen receptor (ER) and (B, D) progesterone receptor (PR) H-score reduction for different delay to formalin fixation time periods in (A, B) refrigerated and (C, D) nonrefrigerated samples. Note steady reduction of H-scores in all graphs, with more steady reduction in the nonrefrigerated samples vs refrigerated samples. P values (using t test) were statistically significant at 3, 4, 24, and 48 hours for nonrefrigerated samples for ER and at 24 and 48 hours for PR.[22]
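
The per-sample and per-case definitions used by Yildiz-Aktas et al[22] can be sketched as a pair of small functions. This is a minimal illustration, not the investigators' code: the function names are hypothetical, the inclusive boundaries at exactly one-half and three-fourths are an assumption, and the "no reduction" label for a case in which no sample is reduced is added here for completeness.

```python
from typing import Sequence


def sample_reduction(cnb_hscore: float, dff_hscore: float) -> str:
    """Label one DFF sample by its H-score relative to the matched CNB score."""
    if cnb_hscore <= 0:
        raise ValueError("CNB H-score must be positive to form a ratio")
    ratio = dff_hscore / cnb_hscore
    if ratio < 0.5:
        return "significant"   # less than half of the CNB score
    if ratio <= 0.75:
        return "mild"          # half to three-fourths (boundaries assumed inclusive)
    return "none"


def case_reduction(cnb_hscore: float, dff_hscores: Sequence[float]) -> str:
    """Label a case from all of its DFF samples."""
    if not dff_hscores:
        raise ValueError("at least one DFF sample is required")
    reduced = [sample_reduction(cnb_hscore, s) != "none" for s in dff_hscores]
    if all(reduced):
        return "true reduction"    # consistent reduction in all samples
    if any(reduced):
        return "focal reduction"   # at least one sample shows no reduction
    return "no reduction"          # label added here; not defined in the text


# Hypothetical H-scores for one case (CNB score 200).
print(sample_reduction(200, 90))
print(case_reduction(200, [190, 90]))
```

Separating the sample-level and case-level labels mirrors the way the definitions distinguish a true reduction from tumor heterogeneity.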

Li et al[23] stained samples (CNBs and EBs) using Leica Microsystems with an anti-ER (clone 6F11) antibody. Both the percentage and the intensity of the stained cells were recorded. ER results were divided into three categories: positive, low positive, and negative. They considered any difference in ER staining greater than 5% that resulted in a change in category an event. According to this definition, they found that the staining category changed for five (5%) cases. All these samples had DFFs around 2 to 3 hours. However, their patients were treated with neoadjuvant therapy, and the samples were tested pre- and posttherapy. They also found four cases in which the staining was 0% on the CNBs but 1% to 5% on the EBs. They considered these cases not significant for their analysis. When the 2-hour cutoff was used, there was a difference between the CNBs and the EBs. They considered this difference a trend. They also examined the effect of DFF on the staining intensity. For that they included only cases with weak staining (n = 12). They had mixed results, where some cases were negative in the CNBs but weak positive in the EBs and vice versa.

Neumeister et al[24] performed ER (clones 1D5 and SP1) and PR (clones 636 and PgR A/B C89F7) staining using immunofluorescence methods and measured by an automated quantitative analysis (AQUA) system. They analyzed the data of the TMAs by using least squares univariate linear regressions and computed the slope and intercept. ER and PR expressions had a nonstatistically significant trend toward decreased antigen level with longer DFF time periods. For the paired CNBs and EBs, there was no statistically significant loss in ER or PR.

Pekmezci et al[25] retrospectively compared ER (clone SP1) and PR (clone 1E2) between CNBs and the subsequent EBs in BC (n = 190) cases. They found that 3.4% of the cases were ER positive in the CNB but negative in the EB, and 7.1% were PR positive in the CNB but negative in the EB.

HER2 Tested by IHC. Khoury et al[17] stained the TMAs with HER2 (HercepTest; DAKO). There was only a single case with diffuse strong staining. There was no difference in the staining with regard to DFF time periods. They did not interpret the change for the other cases that had 1+ or 2+ staining due to tumor heterogeneity.

Pinhel et al[21] stained for HER2 using a rabbit monoclonal antibody. There was a statistically significant difference between samples A and B (properly fixed) vs sample C (exposed to DFF with routine processing), with the latter having more 0+ and 1+ stained cases. However, there was no difference in the number of cases with 3+ or 2+ staining.

Yildiz-Aktas et al[22] found that while reduction of staining was seen in 24% of the refrigerated samples, 48% of the nonrefrigerated samples showed staining reduction. There was only one case with 3+ staining, which had the same expression in both refrigerated and nonrefrigerated samples for all time periods. Among the cases with 2+ staining, one case showed a one-step staining reduction at the 4-hour mark in the refrigerated samples. In contrast, six cases had a one- to two-step staining reduction for the nonrefrigerated samples (two cases starting at 3 hours, single case at 4 hours, single case at 24 hours, and two cases at 48 hours).

Neumeister et al[24] found a nonstatistically significant trend toward lower HER2 levels with prolonged DFF. When AQUA scoring was used instead of recording the percentage of positive cells, there was no decrease or even a trend in HER2 results with regard to the DFF time periods.

Pekmezci et al[25] compared the results of HER2 IHC between the CNBs and the subsequent EBs (n = 164). They found no clinically significant difference between the two types of samples.

Lee et al[27] examined the effect of various DFF time periods on HER2 IHC and FISH. All samples were positive for HER2 (3+ by IHC). There were five samples tested with HercepTest, nine with Oracle, and nine with 4B5. All DFF time periods had samples available for HercepTest scoring. One case had a HER2 score converted from 3+ to 1+ at the 1-hour mark. All cases that were tested with Oracle had tissue samples available for interpretation at the 8-hour mark. There was a single case converted from 3+ to 0+ and another from 3+ to 2+. Similarly, all cases tested with 4B5 had samples available for interpretation at the 8-hour mark. A single case was converted from 3+ to 0+ and another from 3+ to 2+.

In the study by Moatamed et al,[28] we selected the results of the samples that were fixed in 10% formalin for 8 to 72 hours. The samples had varied HER2 IHC results, 2+ or 3+. None of these samples converted to 0+ or 1+.

HER2 Tested by ISH. Khoury et al[17] interpreted the quality of the tissue tested with FISH but not the results of HER2/CEP17 ratios. They defined a sample to be compromised for FISH evaluation if one of the following was identified: (1) vague cellular outline, (2) nonuniform weak signal (>25% unscorable because of weak signal), (3) nonoptimal enzymatic digestion (poor nuclear resolution, autofluorescence), or (4) background obscures signal (>10% of signal over cytoplasm).[41] They found that the tissue started to be compromised at the 1-hour mark. However, the compromise started to be statistically significant at the 2-hour mark. Interestingly, they found that HER2 signal but not CEP17 signal was weaker/faded with prolonged DFF time periods.
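
The four criteria for a compromised FISH sample combine as a logical "or": meeting any one of them is enough. The sketch below illustrates this with a hypothetical record type; the field names are invented for the example, and the two percentage thresholds are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class FishQuality:
    """Hypothetical per-sample quality record; field names are illustrative."""
    vague_cellular_outline: bool = False
    pct_unscorable_weak_signal: float = 0.0   # percent unscorable because of weak signal
    nonoptimal_digestion: bool = False        # poor nuclear resolution, autofluorescence
    pct_signal_over_cytoplasm: float = 0.0    # percent of signal over cytoplasm


def is_compromised(q: FishQuality) -> bool:
    """True if at least one of the four criteria is met."""
    return (
        q.vague_cellular_outline
        or q.pct_unscorable_weak_signal > 25
        or q.nonoptimal_digestion
        or q.pct_signal_over_cytoplasm > 10
    )


# Hypothetical sample with 30% of cells unscorable because of weak signal.
print(is_compromised(FishQuality(pct_unscorable_weak_signal=30.0)))
```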

Portier et al[26] studied the effect of DFF on HER2 tested by two methods of ISH, FISH and dual ISH. They found that there was no significant change in HER2/CEP17 ratios by FISH or dual ISH within each group of cases that had similar DFF time periods. They also scored the signal intensity and preservation as a measure for the quality of the tissue. They compared each ISH assay individually and found no significant degradation of signal intensity with a DFF time period up to 3 hours. However, FISH showed a significant degradation in signal intensity at more than the 3-hour DFF time period in detecting HER2 and CEP17 in the stroma, as well as CEP17 in the tumor.

Subsequently, Khoury et al[19] performed dual ISH staining on the same group of cases that had been previously studied by FISH. They counted HER2 and CEP17 signals and calculated HER2/CEP17 ratios. There was no change in HER2 or CEP17 signals or HER2/CEP17 ratio with regard to DFF time periods. However, when they evaluated the percentage of cells with a lost signal, they found a statistically significant steady decline with DFF time periods. The largest difference was seen between 1-hour and 2-hour samples (11% and 21%, respectively). Moreover, they investigated troubleshooting issues such as red haze and nuclear bubbling. They found that these changes started to be statistically significant as early as 30 minutes of DFF.

Lee et al[27] investigated the effect of DFF time periods on HER2 tested by FISH. They only recorded HER2/CEP17 ratios with relation to the DFF time periods. They found only a single case with a possible DFF effect on the ratio converting it from amplified (ratio = 2.17 at 0-hour mark) to nonamplified (ratio = 1.25 at 4-hour mark and 1.13 at 8-hour mark).

Finally, Moatamed et al[28] studied a single case with HER2 amplification. They compared the clinically submitted sample with samples exposed to DFF of 4 days in a 4°C condition. They found no difference in HER2/CEP17 ratios between the samples.

Ki-67. Khoury[18] found a minimal, statistically nonsignificant decline in Ki-67 expression with DFF time at the 4-hour and 8-hour marks. Neumeister et al[24] performed Ki-67 staining using two different clones (MIB1 and SP6). They found a nonstatistically significant trend toward lower Ki-67 levels with prolonged DFF time periods. When AQUA scoring was used instead of scoring the percentage of positive cells, there was no decrease or even a trend with DFF time periods. Pinhel et al[21] found no difference or trend when comparing samples A vs B vs C or samples A and B vs C.

Other Molecules

Khoury[18] evaluated the effect of DFF on the histomorphology and on a list of IHC markers usually used in breast diseases. They found histomorphologic changes in nine of 10 cases. These changes started to appear at the 2-hour mark. Notably, a single case showed changes in morphology from carcinoma in the properly fixed sample to lymphoma in the DFF sample. The investigators divided the markers into three groups based on the location of expression (nuclear, membranous, and cytoplasmic). There was no change in the nuclear marker p53, but there was a slight change in Ki-67 (see above). There was a trend toward less expression in the membranous markers of E-cadherin and epidermal growth factor receptor, which started to appear at the 2-hour mark. For the cytoplasmic markers including keratins (AE 1/3, CAM-5, and CK7), EMA, GCDFP-15, and mammaglobin, there was only a change in the quality of the staining.

Pinhel et al[21] found that p-Akt and p-Erk1/2 expression were highly significantly lower in sample C than in the mean of samples A and B. Near-complete absence of staining (H-score <5) occurred in six of 21 C samples for p-Akt and eight of 21 samples for p-Erk1/2, despite mean A/B values for some of these being among the highest.

Neumeister et al[24] found evidence of increased antigenicity in acetylated lysine, AKAP13, and HIF1A, which are proteins known to be expressed in conditions of hypoxia. The loss of antigenicity for phosphorylated tyrosine and an increase in expression of AKAP13 and HIF1A were confirmed in the biopsy/resection series. These changes were statistically significant.