Stratification of Follicular Lymphoma

Time for a Paradigm Shift?

Steven H. Kroft, MD


Am J Clin Pathol. 2019;151(6):539-541. 

Follicular lymphoma (FL) is a common malignancy that has been recognized as a morphologic entity for many decades; has been extensively studied clinically, pathologically, and biologically; and is easily diagnosed by most pathologists. It is stratified for prognostication and management purposes by division into morphologic grades using a clearly explicated system (Mann-Berard)[1] that has been employed for many years and that has been in essentially universal, exclusive use since it was codified as the preferred grading method in the 2001 World Health Organization classification.[2]

Yet, we are bad at grading FLs.

Most of the time, FL consists of an admixture of centrocytes (sometimes referred to as small cleaved cells) and centroblasts (sometimes called large noncleaved cells), mimicking the cellular composition of reactive follicles. The Mann-Berard system requires the assessment of the average number of centroblasts in 10 representative ×40 fields of neoplastic follicles. Centroblast counts of 0 to 4, 5 to 14, and 15 or more define grades 1, 2, and 3, respectively. Subsequently, grade 3 FL was subdivided into grades 3a and 3b, with FL3b defined as follicles composed purely of centroblasts, with no admixed centrocytes. Although there have been surprisingly few studies formally evaluating the reproducibility of the Mann-Berard system, there has been little doubt since shortly after it was proposed in 1983 that interobserver reproducibility was poor.[3] The Non-Hodgkin Lymphoma Classification Project found that five expert hematopathologists agreed with a consensus FL grade only 61% to 73% of the time.[4] Again, these were experts. Furthermore, a relatively recent study documented that five hematopathologists independently assigned grades that among them spanned the entire spectrum of grades 1 to 3 in 35% of FL cases.[5] The reproducibility of the 3a/3b distinction has not been systematically studied. However, two recent studies employed a category of FL grade 3-unclassified for cases in which consensus could not be reached on the distinction between grades 3a and 3b.[6,7]
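
The counting rule described above can be written out as a short sketch, which makes plain how much rests on a single averaged number. This is an illustration only, not a validated grading tool. The function name `mann_berard_grade` and the `pure_centroblasts` flag are hypothetical, and the handling of non-integer averages (below 5 is grade 1, below 15 is grade 2) is an assumption, since the thresholds are stated only for whole-number counts.

```python
from statistics import mean


def mann_berard_grade(centroblast_counts, pure_centroblasts=False):
    """Return an FL grade ("1", "2", "3a", or "3b") from per-field counts.

    centroblast_counts: centroblasts counted in each of 10 representative
        x40 fields of neoplastic follicles.
    pure_centroblasts: hypothetical flag, True when the follicles contain
        no admixed centrocytes (the 3b criterion described in the text).
    """
    if len(centroblast_counts) != 10:
        raise ValueError("expected counts from exactly 10 high-power fields")
    if any(count < 0 for count in centroblast_counts):
        raise ValueError("counts cannot be negative")

    average = mean(centroblast_counts)

    # ASSUMPTION: fractional averages are binned with strict upper bounds.
    if average < 5:
        return "1"
    if average < 15:
        return "2"
    return "3b" if pure_centroblasts else "3a"


if __name__ == "__main__":
    fields = [3, 6, 4, 7, 5, 2, 8, 6, 4, 5]  # made-up counts, average 5
    print(mann_berard_grade(fields))
```

The sketch takes the counts as given. Every source of disagreement discussed in this editorial (which cells count as centroblasts, which fields are representative, how large a field is) acts on the input before the arithmetic begins.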

The causes of this lack of reproducibility are multiple. Grading of FL would be fairly straightforward if FL cells always followed the rules. But, alas, by their very nature, malignant cells are very bad at staying in the boxes we create for them. Thus, we run into problems such as small centroblasts, large centrocytes, and blastoid cells. We also have to discriminate follicular dendritic cells (large and nucleolated but nonlymphoid) from centroblasts. Next, throw in intratumoral heterogeneity of centroblast counts (how to select those 10 "representative" high-power fields?). Then, layer on the frequent issues of poor fixation and poor-quality histology, which exacerbate all of the previously mentioned issues. Finally, there is the banal but nettlesome fact that not all high-power fields are the same size. With all of this working against us, can we find ways to make pathologists better at this? Yes, but it takes a lot of work and rigidly controlled conditions.[5]

What makes this lack of reproducibility particularly alarming in 2019 is that grading is driving choices between dramatically different forms of treatment. Specifically, grades 1 to 2 (grouped as "low-grade" FL) are typically being managed conservatively, with a watch-and-wait approach, rituximab alone, or rituximab accompanied by an immunomodulator, depending on clinical findings. In contrast, FL3b is commonly treated similarly to diffuse large B-cell lymphoma (DLBCL), with multiagent chemotherapy plus rituximab. Treatment of FL3a remains more unsettled, although increasingly this group is being managed similarly to grades 1 to 2. However, these approaches to "high-grade" FL are not based on strong clinical data but instead on extrapolations from apparent biologic differences between the grades. Specifically, there is little doubt that FL1 and FL2 constitute a biologically cohesive group, with uniform genetic and immunophenotypic features (eg, high rate of BCL2 rearrangement, rare BCL6 rearrangement, and expression of CD10 and bcl-2), whereas grade 3b seems quite different (eg, low rate of BCL2 rearrangement, frequent BCL6 rearrangement, low rates of CD10 and bcl-2 expression).[7] FL3a lives somewhere between these extremes but is generally perceived as being more similar to low-grade FL than FL3b. This alignment is conceptually appealing to morphologists, insofar as grade 3a shares with grades 1 to 2 an admixture of neoplastic centrocytes and centroblasts, whereas FL3b is a tumor composed exclusively of centroblasts and is also usually accompanied by areas of DLBCL.

However, there are conflicting data as to whether grade 3a FL behaves more similarly to low-grade FL (incurable with a multiply relapsing course) or more aggressive lymphomas (sharp survival curve drop-off but with a plateau indicating potential curability).[6,8] Data are also conflicting with respect to whether FL3b is truly clinically similar to DLBCL.[9] Furthermore, even the abovementioned biologic distinctions are not clear cut, with gene expression profiling data suggesting that FL3b is more closely related to other types of FL than to DLBCL, as well as a close relationship between grades 3a and 3b.[10,11] In addition, grade 3b cases with and without diffuse components may be biologically distinct. When considered in the context of the poor reproducibility of morphologic grading, however, all of these conflicting data are perhaps not surprising.

Nevertheless, despite all of this uncertainty, FL morphologic grading is driving management decisions in the real world. As a means of making weighty decisions between very different treatment regimens, though, it is hard to avoid the conclusion that FL grading is a bad laboratory test. So, is there a better way to go about this? The value of Ki-67 proliferation rate has been explored by several groups. Proliferation index (PI) clearly increases with increasing morphologic grade, and while early data failed to demonstrate added value of PI on top of morphologic grading,[12] more recent data appear to show that a high PI identifies a group of low-grade FL with more aggressive behavior.[13,14] In this issue of the American Journal of Clinical Pathology, Khieu et al[15] explore a different approach, applying immunohistochemistry for phosphohistone H3 (PHH3) to the enumeration of mitotic activity in the neoplastic follicles of FL. In their study, these authors show that staining for PHH3, a mitosis-specific marker, is an easily performed, reproducible means of assessing mitotic activity in FL, with a good intraclass correlation coefficient of 0.86 (although it is worth noting that they also found good intraclass correlation coefficients for both Ki-67 PI and H&E mitotic counts, 0.85 and 0.78, respectively). The authors also found that high PHH3 counts had better specificity for grade 3 FL, at equal or better sensitivity (94.1% sensitivity and 80.8% specificity), than either Ki-67 PI (94.1% and 61.5%) or H&E mitotic count (88.2% and 57.7%), using optimized cutoffs. Notably, the authors documented discrepancies between PHH3 count and Ki-67 PI in 28% of cases. This included six low-grade FLs (out of a total of 26) with Ki-67 counts above the optimal cutoff but with low PHH3 counts. The authors do not provide outcome information for the discrepant cases.
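
To make the reported sensitivities and specificities concrete, the sketch below recomputes them from case counts. Only the total of 26 low-grade cases is stated above. The figure of 17 grade 3 cases and the per-marker counts are assumptions, back-calculated because they are the smallest whole numbers that reproduce the reported percentages; they are not taken from the study itself. The names `ASSUMED_COUNTS` and `summarize` are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of grade 3 cases that the marker calls high."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of low-grade cases that the marker calls low."""
    return true_neg / (true_neg + false_pos)


N_LOW_GRADE = 26  # stated in the text
N_GRADE3 = 17     # ASSUMPTION: inferred from 94.1% = 16/17, not reported above

# ASSUMPTION: (true positives, true negatives) back-calculated per marker.
ASSUMED_COUNTS = {
    "PHH3": (16, 21),
    "Ki-67 PI": (16, 16),
    "H&E mitotic count": (15, 15),
}


def summarize(counts=ASSUMED_COUNTS):
    """Return {marker: (sensitivity %, specificity %)} rounded to 0.1."""
    results = {}
    for marker, (true_pos, true_neg) in counts.items():
        sens = sensitivity(true_pos, N_GRADE3 - true_pos)
        spec = specificity(true_neg, N_LOW_GRADE - true_neg)
        results[marker] = (round(100 * sens, 1), round(100 * spec, 1))
    return results


if __name__ == "__main__":
    for marker, (sens, spec) in summarize().items():
        print(f"{marker}: sensitivity {sens}%, specificity {spec}%")
```

Under these assumed counts, each percentage point of sensitivity or specificity corresponds to well under one case, and all of the figures are measured against morphologic grade as the reference label.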

These data are interesting but are probably best considered a proof of principle, as well as a starting point that suggests additional avenues of investigation. The issue, of course, is that Khieu et al[15] employed a flawed gold standard: morphologic grading. (Notably, the authors confirmed the reproducibility problem, reaching a consensus grade that differed from that in the original diagnostic report in 47% of cases.) The frame of reference employed in this and many other studies, both pathologic and clinical, could be called "morphocentric." This approach is understandable since, after all, morphology remains the foundation of the diagnostic process in anatomic pathology and is the manifestation of tumor "biology" most accessible to histopathologists. However, since the revised European-American lymphoma classification was published in 1994,[16] we have been engaged in a progressive and inexorable paradigm shift, in which gold standards are defined by any and all means at our disposal, and individual tests (eg, morphologic grading) are assessed for their performance characteristics against that gold standard. FL grading seems to be a good example of the limitations of morphology as a window into biology and may no longer be a good starting point and frame of reference for investigating this disease. It seems likely that we will not finally achieve clarity in our understanding of FL, let alone provide reliable guidance for management decisions, until we change our conceptual vantage point and leave our morphocentric approach behind.