Breast Cell Atlas Driven by Genomics

Sharon Worcester

December 16, 2019

SAN ANTONIO — Researchers at MD Anderson Cancer Center in Houston and the University of New South Wales (UNSW) in Sydney are among teams from around the world working toward human breast cell atlas development using single-cell genomics, and their efforts to date have yielded new understanding of both the normal breast cell ecosystem and the breast cancer tumor microenvironment.

The work at MD Anderson, for example, has led to the identification of a number of new gene markers and multiple cell states within breast cell types, according to Tapsi Kumar Seth, who reported early findings from an analysis of more than 32,000 cells from normal breast tissue during a presentation at the San Antonio Breast Cancer Symposium.

At the UNSW's Garvan Institute of Medical Research, Alexander Swarbrick, PhD, and his colleagues are working to better define the tumor microenvironment at the single-cell level. At the symposium, Dr. Swarbrick presented interim findings from cellular analyses in the first 23 breast cancer cases of about 200 that will be studied in the course of the project.

Improved understanding of the cellular landscape of both normal breast tissue and breast cancer tissue should lead to new stromal- and immune-based therapies for the treatment of breast cancer, the investigators said.

The Normal Breast Cell Ecosystem

The MD Anderson researchers studied 32,148 stromal cells from pathologically normal breast tissues collected from 11 women who underwent mastectomy at the center.

Unbiased expression analysis identified three major cell types (epithelial cells, fibroblasts, and endothelial cells) as well as several minor cell types, including macrophages, T cells, apocrine cells, and pericytes, said Ms. Seth, a graduate student in the department of genetics at the center and a member of the Navin Laboratory there.

The work is designed to help identify the presence and function of cells and explain how they behave in a normal breast ecosystem, she said.

"We know that a female breast undergoes a lot of changes due to age, pregnancy, or when there is a disease such as cancer, so it's essential to chart out what a normal cell reference would look like," she said.

Toward that goal, a protocol was developed to dissociate the tissue samples within 2 hours, because the viability of cells and RNA declines over time. Analysis of the cell states revealed different transcriptional programs in luminal epithelial cells (hormone receptor positive and secretory), basal epithelial cells (myoepithelial or basement-like), endothelial cells (lymphatic or vascular), macrophages (M1 or M2), and fibroblasts (three subgroups), and provided insight into the progenitors of each cell type, she said.

A map was created to show gene expression and to identify transcriptomically similar cells.

"We were able to identify most of the major cell types that are present in human breasts," she said. "What was interesting was that the composition of these cells also varied across women."

For example, the proportion of fibroblasts was lower in 3 of the 11 patients, and even though the cells were pathologically normal, immune cell populations, including T cells and macrophages, were also seen.

Adipocytes cannot be evaluated using this technology because they are large and the layer of fat cells must be removed during dissociation to prevent clogging of the machines, she noted, adding that "this is really a limitation of our technology."

A closer look was taken at each of the major cell types identified.

Epithelial Cells

Both canonical and new gene markers were used to identify luminal and basal epithelial cells, Ms. Seth noted.

Among the known markers were KRT18 for luminal epithelial cells and KRT5, KRT6B, KRT14, and KRT15 for basal epithelial cells. Among the new markers were SLC39A6, EFHD1, and HES1 for the luminal epithelial cells, and CITED4, CCL28, MMP7, and NDRG2 for the basal epithelial cells.
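To illustrate how canonical markers can separate luminal from basal epithelial cells, here is a minimal Python sketch. It is not the researchers' pipeline, which the presentation did not describe at this level of detail. The scoring rule (mean expression over each marker set), the function names, and the toy expression values are all assumptions made for illustration. Only the known marker genes come from the presentation.

```python
# Known markers as reported in the presentation.
KNOWN_MARKERS = {
    "luminal epithelial": ["KRT18"],
    "basal epithelial": ["KRT5", "KRT6B", "KRT14", "KRT15"],
}


def score_cell(expression, markers=KNOWN_MARKERS):
    """Return the mean expression of each marker set for one cell.

    `expression` maps gene symbol -> expression value.
    Genes missing from `expression` count as 0 (an assumption).
    """
    return {
        label: sum(expression.get(gene, 0.0) for gene in genes) / len(genes)
        for label, genes in markers.items()
    }


def label_cell(expression, markers=KNOWN_MARKERS):
    """Assign the label whose marker set has the highest mean expression."""
    scores = score_cell(expression, markers)
    return max(scores, key=scores.get)


# Hypothetical toy cell; the values are invented, not taken from the study.
toy_cell = {"KRT5": 4.0, "KRT14": 6.0, "KRT18": 0.5}
toy_scores = score_cell(toy_cell)
toy_label = label_cell(toy_cell)
```

In practice, cell types in single-cell data are usually assigned by clustering cells first and then inspecting marker expression per cluster, so a per-cell rule like this one is only a simplified stand-in.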

"We went on and validated these markers on the tissue section using methods like spatial transcriptomics," she said, explaining that this "really helps capture the RNA expression spatially," and can resolve the localization of cell types markers in anatomical structures.

For these cells, the expression of the newly identified gene markers was mostly confined to ducts and lobules.

In addition, an analysis of cell states within the luminal epithelial cells showed four different cell states, each of which has "different kinds of genes that they express, and also different pathways that they express, suggesting that these might be transcriptomally different," Ms. Seth said.

Of note, these cells and cell states are not biased to a specific condition or patient, suggesting that they are coming from all of the patients, she added.

Two of the four cell states – the secretory and hormone responsive states – have previously been reported, but Ms. Seth and her colleagues identified two additional cell states that may have different biological functions and are present in the different anatomical regions of the breast.


Fibroblasts

Fibroblasts, the cells of the connective tissue, were the most abundant cell type. As with the epithelial cells, both canonical collagen markers (COL6A3, MMP2, FBN1, FBLN2, FBN, and COL1A1) and newly identified gene markers (TNXB, AEBP1, CFH, CTSK, TPPP3, MEG3, HTRA1, LHFP, and OGN) were used to identify them.

Endothelial Cells

Breast tissue is highly vascular, so endothelial cells, which line the walls of veins, arteries, and lymphatic vessels, are plentiful.

"Again, for both these cell types, we identified them using the canonical marker CD31, and we identified some new gene markers," she said, noting that the new markers include CCL21, CLDN5, MMRN1, LYVE1, and PROX1 for lymphatic endothelial cells, and RNASE1 and IFI27 for vascular endothelial cells.

Two different groups – or states – of vascular endothelial cells were identified, with each expressing "very different genes as well as very different pathways, again suggesting that they might have different biological functions, which we are still investigating," she said.

Additional Findings and Future Directions

In addition to stromal cells, some immune cells were also seen. These included T cells that came mostly from two patients, as well as macrophages and monocytes, which comprised the most abundant immune cell population.

Of note, all of these cells are also found in the tumor microenvironment, but they are in a transformed state. For example, cancer-associated fibroblasts, tumor endothelial cells, tumor-associated macrophages, and tumor-associated adipocytes have been seen in that environment, she said.

"So what we are trying to do with this project is...learn how these cells are, and how these cells behave in the normal ecosystem," she explained, noting that the hope is to provide a valuable reference for the research community with new insights about how normal cell types are transformed in the tumor microenvironment.

In an effort to overcome the adipocyte-associated limitation of the technology, adipocytes are "now being isolated by single nucleus RNA sequencing."

"This [sequencing] technology has helped us identify multiple cell states within a cell type; and most of these cell states may have different biological functions, which probably can be investigated by spatial transcriptomic methods," she said.

Spatial transcriptomics also continues to be used for validation of the new gene markers identified in the course of this research, she noted.

The Breast Tumor Microenvironment

At the Garvan Institute, current work is focusing more on defining the landscape of the breast tumor microenvironment at single-cell resolution, according to Dr. Swarbrick, a senior research fellow and head of the Tumour Progression Laboratory there.

"Breast cancers...are complex cellular ecosystems, and it's really the sum of the interactions between the cell types that play major roles in determining the etiology of disease and its response to therapy," he said. "So I think that going forward toward a new age of diagnostics and therapeutics, there's wonderful potential in capitalizing on the tumor microenvironment for new developments, but this has to be built on a really deep understanding of the tumor microenvironment, and — I might say — a new taxonomy of the breast cellular environment."

Therefore, in an effort to address "this limitation in our knowledge base," his lab is also working toward development of a breast cell atlas.

A fresh tissue collection program was established to collect early breast cancer tissues at the time of surgery, metastatic biopsies, and metastatic lesions from autopsies. The tissues are quickly dissociated into their cellular components and then undergo massively parallel capture and sequencing using the 10x Genomics platform, he said.

Thousands of cells per case are analyzed using single-cell RNA sequencing (RNA-seq), as well as "RAGE-seq" and "CITE-seq," which are performed in parallel to the RNA sequencing to address some of the limitations of the RNA sequencing alone and to "try to gain a multi-omic insight into the cell biology," he explained.

RAGE-seq, which Dr. Swarbrick and his team developed, "is essentially a method to do targeted long-read sequencing in parallel to the short-read sequencing that we use for RNA-seq," and CITE-seq is "a really fantastic method developed at the New York Genome Center that essentially allows us to gather proteomic data in parallel to the RNA data," he said.

Based on findings from the analyses of about 125,000 cells from 25 patients, a map was created that showed the cell clusters identified by both canonical markers and gene expression signatures.

"We find the cell types we would expect to be present in a breast cancer," he said.

The map shows clusters of myeloid, epithelial-1 and -2, cancer-associated fibroblast (CAF)-1 and -2, endothelial, T Reg, B, and CD8 and CD4 T cells.

Next, each cell type is quantified in each patient, and a graphic representation of the findings shows large variability in the proportions of each cell type in each patient.
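The per-patient quantification described above amounts to counting labeled cells and dividing by each patient's total. The following Python sketch shows that arithmetic under stated assumptions: the function name, the input format (one patient and cell-type pair per cell), and the toy labels are hypothetical and are not drawn from the Garvan data or code.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Compute the fraction of each cell type within each patient.

    `cells` is an iterable of (patient_id, cell_type) pairs, one per cell.
    Returns {patient_id: {cell_type: fraction of that patient's cells}}.
    """
    counts = defaultdict(Counter)
    for patient_id, cell_type in cells:
        counts[patient_id][cell_type] += 1

    proportions = {}
    for patient_id, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[patient_id] = {
            cell_type: n / total for cell_type, n in type_counts.items()
        }
    return proportions


# Hypothetical toy labels; these are not data from the study.
toy_cells = [
    ("P1", "T cell"), ("P1", "T cell"), ("P1", "CAF-1"), ("P1", "epithelial"),
    ("P2", "CAF-1"), ("P2", "CAF-2"),
]
toy_proportions = cell_type_proportions(toy_cells)
```

Normalizing within each patient, rather than across the whole data set, is what makes cases with very different numbers of captured cells comparable.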

"Ultimately, our goal is to be able to relate the frequencies of cell types and molecular features to each other, but also to clinical-pathological features from these patients," he noted.

A closer look at the findings on an individual case level demonstrates the potential for development of better therapies.

For example, a case involving a high-grade triple-negative invasive ductal carcinoma exhibited each of the cell types found overall.

"One of the things that strikes us early on is we see a number of malignant epithelial populations," he said, noting that proliferation is one of the drivers of the heterogeneity, but that heterogeneity was also seen for "other clinically relevant features such as basal cytokeratins," which were heterogeneously expressed in different cell-type clusters.

"This was kind of paralleled in the immunohistochemistry results that we obtained from this patient," he said. "We could also apply other clinically used tests that we've developed on bulk (such as PAM50 intrinsic subtyping) and ask whether they can be applied at the single-cell resolution.

"We think that these are going to be great tools to try to now get in and understand the significance of this heterogeneity and try to identify the lethal cells within this patient, and potentially therapeutic strategies to eradicate those cells," he added.

Targeting Fibroblasts

A notable finding of this project was the presence of "not one, but two populations of fibroblasts," Dr. Swarbrick said, noting that fibroblasts are typically discussed as a single entity.

"This is arguing that there are at least two major types present within the breast, and almost every case has these populations present at roughly equal amounts," he said.

This is of particular interest, because it has been shown in prior studies that targeting fibroblasts can have therapeutic outcomes.

"So we think this is a very important population within the tumor microenvironment," he added.

With respect to gene expression features, CAF-1 is dominated by signatures of extracellular matrix deposition and remodeling, which "look like the classic myofibroblasts that we typically think of when we study cancer-associated fibroblasts."

"In contrast, the CAF-2 population...have what appears to be quite a predominant secretory function, so we see a lot of cytokines being produced by these cells, but we also see a very high level of expression of a number immune checkpoint ligands," he said, adding that his team is actively pursuing whether these cells may be undergoing signaling events with infiltrating lymphocytes in the tumor microenvironment.

The signatures for both CAF types are prognostic within large breast cancer data sets, suggesting that they do actually have an important role in disease, he noted.

Markers for these cells include ACTA2, which was previously known to be a marker, and which is almost exclusively restricted to CAF-1, and the cell surface protein CD34 — a progenitor marker in many different cellular systems, "which is actually beautifully expressed on the CAF-2 population" as demonstrated using CITE-seq.

"So we're now using this as a way to prospectively identify these cells, pull them out of tumors, and conduct biologic assays to learn more about them," he said.

The Immune Milieu

"We're in the age of immunotherapy, and this is an area of huge interest, but we have a long way to go in making it as effective as possible for breast cancer patients," Dr. Swarbrick said. "I believe part of that is through a very deep understanding of the taxonomy."

RNA data alone are useful but insufficient to fully identify subsets of immune cells due to a "relatively low-resolution ability to resolve T cells."

"But because we're now using the panel of 125 antibodies in parallel, we can now start to use protein levels to split up these populations and we can start to now identify, with higher resolution, more unique populations within the environment," he said, noting that the availability of protein data not only helps identify subtypes, but is also therapeutically important as it allows for certainty regarding whether the protein target of therapeutic antibodies is expressed on the surface of cells.

Ultimately the hope is that this effort to build a multi-omic breast cancer atlas will continue to drive new discoveries in personalized medicine for breast cancer, Dr. Swarbrick concluded, adding that the field is moving fast, and it will be very important for labs like his and the Navin Lab to communicate to avoid needlessly duplicating efforts.

"I think it's going to be really exciting to start to put some of these [findings] together," he said.

The MD Anderson project is funded by the Chan Zuckerberg Initiative as part of its work in supporting the Human Cell Atlas project. Ms. Seth reported having no disclosures. Dr. Swarbrick's research is funded by the Australian Government/National Health and Medical Research Council and the National Breast Cancer Foundation. He reported having no relevant disclosures.


Source: Seth T et al. SABCS 2018, Abstract GS1-02; Swarbrick A et al. SABCS 2018, Abstract GS1-01

