The Effects of Soy Supplementation on Gene Expression in Breast Cancer: A Randomized Placebo-controlled Study

Moshe Shike; Ashley S. Doane; Lianne Russo; Rafael Cabal; Jorge S. Reis-Filho; William Gerald; Hiram Cody; Raya Khanin; Jacqueline Bromberg; Larry Norton


J Natl Cancer Inst. 2014;106(9) 

Study Design

The objective of this randomized, placebo-controlled study was to investigate the effects of soy supplementation on the molecular features of BC, including gene expression profiles and markers of BC risk (ClinicalTrials.gov identifier: NCT00597532). The primary endpoint of the study was comparison of changes in proliferation (Ki67) and apoptosis (Cas3) between the two groups. Secondary outcomes were changes in gene expression by NanoString and expression by microarrays and qPCR. Women with invasive BC scheduled for resection were randomly assigned to receive supplements of soy protein (intervention) or milk protein (placebo). Supplementation lasted from the initial surgical consultation to the day before surgery (minimum 7 days, maximum 30 days). Tissue from the diagnostic core biopsy was analyzed for gene expression by NanoString and for markers of cell proliferation and apoptosis by immunohistochemistry (IHC). Results were compared with those in the posttreatment tissue obtained at the time of surgery. Expression analyses by oligonucleotide microarray and qPCR were performed using total RNA isolated from frozen tissue. Microarray, qPCR, and NanoString gene expression studies were performed whenever tissue was available for research purposes. For inclusion and exclusion criteria, see the Supplementary Methods (available online). The study was approved by MSKCC's Institutional Review Board, and all patients signed informed consent. All patients completed a modified dietary intake questionnaire including soy foods.[20,21]


Soy and placebo were dispensed by the hospital pharmacist in identically appearing packets containing 25.8 g soy protein powder or 25.8 g milk protein. All patients were counseled to consume two packets/day, mixed with water or juice from the day of consent through the day before surgery. Research staff and participants were blinded to group assignments. Full contents of soy and placebo are listed in the Supplementary Methods (available online).

Plasma Isoflavones

Blood samples were obtained at the time of consent and on the day of surgery to measure plasma genistein and daidzein by high-performance liquid chromatography (HPLC).[22,23]

NanoString and qPCR

NanoString and qPCR analyses are described in the Supplementary Methods (available online).

Immunohistochemistry and Pathology

Routine pathologic assessment of the initial diagnostic and subsequent surgical specimen was performed on all specimens. Tissues were fixed in 10% buffered formalin for 6 to 48 hours, routinely processed and embedded in paraffin. Formalin-fixed, paraffin-embedded (FFPE) tissue sections were used for the study when available, following completion of routine clinical histopathologic examination and sign out.

Immunohistochemical detection was performed using streptavidin-biotin-peroxidase and microwave antigen retrieval methodology.[24] Human epidermal growth factor receptor 2 (HER2) positivity was defined as 3+ by IHC, or 2+ by IHC with gene amplification of 2.0 or greater. Amplification was measured by fluorescence in situ hybridization (FISH).[25] ER status was determined by IHC, and samples were considered positive if greater than 1% of cell nuclei were immunoreactive.
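The receptor-status rules above can be written as two small predicates. This is a minimal sketch for illustration only; the function and parameter names are hypothetical, and the FISH value is assumed to be the amplification figure to which the 2.0 cutoff applies.

```python
from typing import Optional


def is_her2_positive(ihc_score: int, fish_amplification: Optional[float] = None) -> bool:
    """HER2 positive if IHC is 3+, or IHC is 2+ with FISH amplification >= 2.0."""
    if ihc_score == 3:
        return True
    if ihc_score == 2 and fish_amplification is not None:
        return fish_amplification >= 2.0
    return False


def is_er_positive(percent_immunoreactive_nuclei: float) -> bool:
    """ER positive if more than 1% of cell nuclei are immunoreactive."""
    return percent_immunoreactive_nuclei > 1.0
```

Under these rules, a 2+ IHC result without a FISH measurement is not called positive in this sketch; how such cases were handled in the study is not stated.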

IHC for Ki67 and Cas3 was performed on representative FFPE tissue sections identified by the study pathologist in the Research Immunohistochemistry Core Laboratory of MSKCC on a Discovery XT instrument (Ventana). The Cas-3 (Asp175) antibody was from Cell Signaling (catalog #9661, rabbit polyclonal), and the dilution was 1:400 for 60 minutes. The Ki-67 antibody was clone MIB-1 (Dako Cytomation, Catalog# M7240, mouse monoclonal) and dilution was 1:400 for 60 minutes. Cell lines and tissue samples known to express the antigen under study were used as positive controls.

IHC scoring was performed using deidentified samples, without any information on clinical characteristics or study group assignment. Cells with positive Cas3 and Ki67 staining were counted in 10 high-power (40x objective) fields selected to represent the spectrum of staining seen on review of the whole section.[26] IHC score was expressed as the percentage of positively staining tumor cells among the total number of tumor cells counted. At least 1000 malignant cells were evaluated for each specimen, and only nuclear staining was considered positive.
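As an illustration of the scoring rule, the sketch below pools counts across high-power fields and returns the percentage of positive tumor cells. The function name and the input layout (one pair of counts per field) are assumptions made for this example, not part of the study protocol.

```python
def ihc_index(fields):
    """Percentage of positively staining tumor cells among all tumor cells counted.

    fields: list of (positive_tumor_cells, total_tumor_cells), one pair per
    high-power field (hypothetical input layout).
    """
    positive = sum(p for p, _ in fields)
    total = sum(t for _, t in fields)
    if total < 1000:
        # The text requires at least 1000 malignant cells per specimen.
        raise ValueError("fewer than 1000 malignant cells counted")
    return 100.0 * positive / total


# Ten fields, each with 12 positive cells out of 120 counted.
example_index = ihc_index([(12, 120)] * 10)
```

Pooling the counts before dividing matches the wording "among the total number of tumor cells counted"; averaging per-field percentages would weight sparse fields more heavily.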

Statistical Analysis

The clinicopathological and demographic characteristics were compared between soy and placebo groups using the Fisher's exact test for categorical variables and the Wilcoxon rank-sum test for continuous variables. Plasma isoflavones, and Ki-67 and Cas3 indices (% of positive cells) were assessed within groups (patient matched post/pre) using the Wilcoxon matched-pairs, signed-rank test, and between groups (soy-treated difference vs placebo-treated difference) using the Wilcoxon rank-sum test. The effect of treatment on NanoString gene expression was evaluated within groups using the paired t test. To compare NanoString gene expression between groups, the fold change (post/pre ratio) for each sample was compared using the Student's t test. For qPCR data, the average normalized qPCR value for a gene was used to compare gene expression between groups, and statistical significance was determined by unpaired t test with Welch's correction (assuming unequal variance). Correlation between genistein and daidzein and association between paired values were computed using the Pearson method.
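To make the between-group comparisons concrete, the sketch below computes the per-sample post/pre fold change and the two unpaired t statistics named above: Student's (pooled variance) and Welch's (unequal variance, with Welch-Satterthwaite degrees of freedom). It uses only the Python standard library, and all function names are hypothetical. P values are omitted because they require the t distribution, which a statistics package such as the ones cited in this section would supply.

```python
import math
from statistics import mean, variance


def fold_changes(pre, post):
    """Per-sample post/pre ratio for matched measurements."""
    return [b / a for a, b in zip(pre, post)]


def student_t(x, y):
    """Unpaired t statistic with pooled (equal) variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Unpaired t statistic without assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df
```

When the two groups have equal size and equal sample variance, both statistics and both degrees of freedom coincide; they diverge as the variances or group sizes become unbalanced.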

All statistical tests were two sided, and P values of less than .05 for NanoString and less than .01 for microarrays were considered statistically significant. Analyses and data visualization were performed using GraphPad Prism for Mac OSX v. 6.0b (GraphPad Software), Partek Genomics Suite 6.6, R version 3.0.1, and Bioconductor version 2.13.[27]

Microarray and Bioinformatics Analysis

Affymetrix Human U133 Plus 2.0 microarray gene expression was analyzed using tools in Partek Genomics Suite 6.6 software (Partek Incorporated). The Robust Multiarray Analysis (RMA) algorithm was used for global normalization and probeset summarization. Differentially expressed (DE) genes were determined using Student's t test (unpaired, equal variance) at a statistical significance level of 0.01 and an absolute fold change of 2 or greater. Hierarchical clustering was performed by Euclidean distance and average linkage method in Partek Genomics Suite 6.64. Ingenuity Pathways Analysis was performed with default settings. DAVID Functional Gene Classification Tool with default settings was used for pathway analysis with FDR = 0.01.
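The DE criteria combine a P value cutoff with a fold-change cutoff. The sketch below applies both to precomputed per-gene results. It assumes that group means are on the log2 scale (as RMA output is), so that an absolute fold change of 2 corresponds to a difference of at least 1 in log2 means. The function name, input layout, and gene labels are hypothetical, and this is not a description of how Partek computes fold change.

```python
import math


def select_de_genes(results, alpha=0.01, min_fold=2.0):
    """Return genes with P < alpha and absolute fold change >= min_fold.

    results: dict mapping gene -> (p_value, mean_log2_group_a, mean_log2_group_b).
    """
    log2_threshold = math.log2(min_fold)
    selected = []
    for gene, (p_value, mean_a, mean_b) in results.items():
        if p_value < alpha and abs(mean_a - mean_b) >= log2_threshold:
            selected.append(gene)
    return selected


# Hypothetical per-gene results for illustration only.
example_results = {
    "GENE_A": (0.001, 9.5, 8.0),   # significant, 1.5 log2 units apart
    "GENE_B": (0.001, 9.5, 9.0),   # significant, but below the fold-change cutoff
    "GENE_C": (0.200, 11.0, 8.0),  # large change, but not significant
    "GENE_D": (0.005, 6.0, 7.0),   # significant, exactly 2-fold lower in group A
}
example_de = select_de_genes(example_results)
```

Requiring both conditions guards against calling genes with small but consistent shifts, or large but noisy ones, differentially expressed.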

Gene Set Analysis (GSA) was performed using the Bioconductor package piano in the R statistical language.[28] The main function runGSA was applied with default parameters using fold changes as gene-level statistics and gene-set collections from the Broad Molecular Signatures Database (MSigDB). Only gene sets with adjusted P values less than .01 were reported. An additional filtering step was applied that limited gene sets to those in which at least half of the genes in the gene set showed fold changes of at least 50%.
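The two reporting filters for gene sets can be sketched as a single predicate. The text does not define "fold changes of at least 50%" precisely; this example assumes it means a post/pre ratio of at least 1.5 or at most 1/1.5, and that assumption should be checked against the original analysis. The function name and parameters are hypothetical, and this is not part of the piano package.

```python
def keep_gene_set(adjusted_p, gene_fold_changes, alpha=0.01,
                  min_change=0.5, min_fraction=0.5):
    """Apply the adjusted-P filter, then the 'at least half changed' filter.

    gene_fold_changes: ratio-scale fold changes for the genes in one gene set.
    """
    if adjusted_p >= alpha or not gene_fold_changes:
        return False
    up = 1.0 + min_change    # assumed cutoff for increases (1.5-fold)
    down = 1.0 / up          # assumed symmetric cutoff for decreases
    changed = sum(1 for fc in gene_fold_changes if fc >= up or fc <= down)
    return changed >= min_fraction * len(gene_fold_changes)
```

For example, a set with adjusted P of .005 and fold changes of 1.6, 0.5, 1.0, and 1.1 passes, because two of its four genes meet the assumed cutoff.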

To predict molecular subtype of samples measured by microarrays, we obtained an independent set of 204 BCs with known molecular class assignments (data set GSE12276).[29] PAM50 genes were mapped to 139 probe sets according to gene symbol using the online NetAffx portal. Molecular class was predicted for each BC sample of the training and test sets using a nearest centroid model based on the expression of the PAM50 genes and using the Partek Genomics Suite 6.6 software.[30,31] A leave-one-out cross validation was used to estimate prediction accuracy in the training set.
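A nearest centroid model with leave-one-out cross validation can be sketched in a few lines. This is an illustrative implementation only, not the Partek procedure: it assumes Euclidean distance to the class centroid (the distance measure used in the study is not stated), and the toy expression vectors and function names are hypothetical.

```python
import math


def class_centroids(samples, labels):
    """Mean expression vector for each class."""
    grouped = {}
    for vector, label in zip(samples, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict_class(centroids, vector):
    """Assign the class whose centroid is closest (Euclidean distance, assumed)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


def loocv_accuracy(samples, labels):
    """Hold out each sample in turn, refit the centroids, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict_class(class_centroids(train_x, train_y), samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy two-gene training set with two well-separated classes.
toy_samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
               [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
toy_labels = ["LumA", "LumA", "LumA", "Basal", "Basal", "Basal"]
toy_accuracy = loocv_accuracy(toy_samples, toy_labels)
toy_prediction = predict_class(class_centroids(toy_samples, toy_labels), [0.2, 0.3])
```

Refitting the centroids inside each fold keeps the held-out sample from influencing its own prediction, which is what makes the accuracy estimate honest.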