Understanding the Biology of Sex and Gender Differences: Using Subgroup Analysis and Statistical Design to Detect Sex Differences in Clinical Trials

Sarah K. Keitt, MPH, Catherine R. Wagner, BS, Sherry A. Marts, PhD

Abstract and Introduction

In May 2000, a General Accounting Office (GAO) report revealed that although women are now participating in clinical trials in numbers proportionate to their numbers in the general population, data collected in these trials are not routinely analyzed by sex.[1] Without such sex analysis, clinically relevant information about potentially lifesaving treatments could be lost. In July 2001, the Society for Women's Health Research convened a workshop to address strategies for conducting subgroup analyses to detect sex differences. Workshop participants concluded that understanding sex differences will enable medical researchers to design healthcare interventions for both men and women more effectively and that one can plan for and conduct sex analysis without compromising the quality of the study or making the study prohibitively expensive.

In May 2000, a General Accounting Office (GAO) report revealed that although women are now participating in clinical trials in numbers proportionate to their numbers in the general population, data collected in this research are not routinely being analyzed by sex.[1] Without such analysis, clinically relevant information about potentially lifesaving treatments could be lost. On July 17, 2001, the Society for Women's Health Research convened a workshop in Washington, DC, for clinical researchers. "Subgroup Analysis and Statistical Design for Detecting Sex Differences" was the second in a series of Society-sponsored workshops designed to address the issues surrounding the inclusion of women in clinical research, and specifically addressed the need and the methodologies available for appropriate sex analysis of clinical trial data. This article summarizes the lectures presented and the discussion held during the workshop.

As argued in the Institute of Medicine's (IOM) landmark publication, Exploring the Biological Contributions to Human Health: Does Sex Matter?, understanding sex differences at all levels of the human body -- from the single cell to the whole organism -- will enable medical researchers to more effectively design healthcare interventions for both men and women.[2] Such understanding will not come without careful analysis of research data by sex. Critics of sex analysis have claimed that designing and conducting scientifically rigorous trials with enough statistical power to detect sex differences is both cost and time prohibitive.[3] However, the scientific community is beginning to recognize the importance of examining sex differences in the pharmacokinetics and pharmacodynamics of drugs.[4-6]


Susan Wood, PhD, Director of the Office of Women's Health at the U.S. Food and Drug Administration (FDA), provided the charge to the workshop participants by urging those involved in the fields of analysis and statistics to take up the challenge of conducting scientifically rigorous subgroup analyses with the ultimate goal of improving health outcomes for both men and women.

Dr. Wood recalled how, prior to 1993, women had limited opportunities to participate in medical research as a result of 2 medical disasters. In the 1950s and early 1960s, thousands of babies were born with severe limb malformations as a result of in utero exposure to thalidomide.[7] In the early 1970s, research revealed that the daughters of women who took diethylstilbestrol during pregnancy had an increased risk of vaginal cancer.[8] Together, these medical disasters led the FDA, industry, researchers, and the public at large to conclude that women who could become pregnant were not appropriate subjects in clinical drug trials. In 1977, the FDA issued guidelines that required women of childbearing potential to be excluded from drug trials (except for drugs used in the treatment of life-threatening or serious diseases) until teratogenicity data from animal studies of the drug were available.[9] Since most of these teratogenicity studies were not completed until after phase 2 and 3 trials were under way, the guideline effectively barred women from most early-phase clinical trials.

Throughout the 1980s and 1990s, several landmark reports led to a reexamination of the issue of inclusion of women in clinical trials. In 1985, a report of the Public Health Task Force on Women's Health Issues concluded that the lack of a research focus on women's health issues compromised the quality of health information available to women as well as the healthcare they received.[10] In answer to this report, in 1986 the National Institutes of Health (NIH) issued guidelines encouraging, but not requiring, the inclusion of women in federally funded clinical trials.[11] However, a 1990 report by the GAO found that the inclusion guidelines were not enforced and that women were not routinely included in clinical trials.[12]

In 1992, a GAO investigation of information submitted to the FDA found that 25% of manufacturers made no effort to recruit representative numbers of women into drug trials. The report noted that although women were included in the clinical trials for all the classes of drugs examined in the survey, they were generally underrepresented.[13] To rectify this situation, in 1993 the FDA reversed its guideline banning women of childbearing age from early-phase clinical trials. Changes to policies and guidelines continued throughout the 1990s. In 1997, the FDA Modernization Act charged the FDA to examine its regulations and policies regarding the inclusion of women and minorities in clinical trials. By the year 2000, the GAO concluded that women were participating in clinical trials at rates proportional to disease incidence. Unfortunately, sex analysis of resulting data was not routinely conducted.[14] The importance of both inclusion and data analysis was made clear in a report issued by the GAO in 2001. The GAO found that of the 10 prescription drugs that were withdrawn from the market between 1997 and 2001, 8 had greater health risks for women.[15]

Many drugs exhibit sex differences in pharmacokinetics (the measurement of the absorption, distribution, metabolism, and excretion of drugs by the body); however, dosing changes due to established pharmacokinetic differences are rare. Dr. Wood described a study conducted by the FDA that examined 300 new drug applications (NDAs) and determined that 72 drugs out of the 300 (24%) examined were metabolized via the cytochrome P450 3A4 (CYP3A4) pathway and exhibited a sex difference in pharmacokinetics. Some of these drugs were metabolized more rapidly in women than in men; others were metabolized more slowly in women. Despite these differences, there were no differences in the recommended doses for men and women.

Dr. Wood summarized the findings from a 2001 report by the IOM, "Small Clinical Trials: Issues and Challenges,"[16] which outlined strategies for conducting clinical trials and data analysis on studies with a small sample size. This report recommended that researchers design their studies to detect sex differences a priori, clarify their methods of reporting sex differences, exercise caution in interpreting data related to these studies, and research alternative study designs to discover potentially critical sex differences.[16]

From Basic Research to Post-Market Testing: What Are the Questions We Should Be Asking?

In discussing the importance of pharmacokinetics in drug development, Dr. Helen Pentikis, Director of Pharmacokinetics-Biopharmaceutics and Director of Product Development and Management at Globomax, LLC (Hanover, Maryland), stated that sex differences in pharmacokinetics are well established. Dr. Pentikis described pharmacokinetic analysis in early-stage phase 1 safety trials during which safe dosing is determined by statistical evaluation of such pharmacokinetic measures as maximum drug concentration (Cmax), area under the curve of the plot of concentration vs. time, time to maximum concentration, and half-life of the drug in the circulation.
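As a rough illustration of the measures listed above, the following Python sketch computes Cmax, time to maximum concentration, area under the concentration-time curve (by the linear trapezoidal rule), and terminal half-life (from a log-linear fit to the last few samples). The function name `pk_summary`, the choice of three terminal points, and the sample values are assumptions made for illustration; none of them come from the workshop.

```python
import math

def pk_summary(times, concs, n_terminal=3):
    """Noncompartmental summary of one concentration-time profile.

    times: sampling times in hours, ascending.
    concs: measured concentrations at those times (any consistent unit).
    n_terminal: number of final samples used for the terminal slope (assumed).
    """
    cmax = max(concs)
    tmax = times[concs.index(cmax)]

    # Linear trapezoidal rule over consecutive sample pairs.
    auc = sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:])
    )

    # Least-squares slope of ln(concentration) vs time on the terminal samples.
    ts = times[-n_terminal:]
    ys = [math.log(c) for c in concs[-n_terminal:]]
    t_bar = sum(ts) / len(ts)
    y_bar = sum(ys) / len(ys)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    half_life = math.log(2) / -slope

    return {"cmax": cmax, "tmax": tmax, "auc": auc, "half_life": half_life}

# Hypothetical profile (hours, ng/mL), invented for this example.
times = [0, 1, 2, 4, 8, 12]
concs = [0.0, 80.0, 100.0, 50.0, 12.5, 3.125]
summary = pk_summary(times, concs)
```

Running the same summary separately on profiles from male and female participants, and then comparing the distribution of each measure between the two groups, is one simple way to look for the kind of sex difference discussed here.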

Dr. Pentikis presented examples of how both traditional and population pharmacokinetic studies can be used to detect sex differences in drug action. She described a study by Luzier and colleagues,[17] who used traditional pharmacokinetic methods to examine the pharmacokinetics and pharmacodynamics (see footnote*) of the beta-adrenergic-blocking drug metoprolol in 10 male and 10 female volunteers. Although the population size was small, researchers found a clear difference in the pharmacokinetic profile of the men compared with that of the women. Women had greater drug exposure than men, with higher maximum concentration and larger area under the plasma concentration-time curve. Women also had a greater reduction in exercise heart rate and systolic blood pressure. However, the area under the effect curve was significantly greater only for heart rate. Luzier and colleagues concluded that sex differences exist in the pharmacokinetics of metoprolol, resulting in greater drug exposure in female subjects. However, pharmacodynamic data showed that the relationship between drug concentration and effect on systolic blood pressure did not differ between men and women. Therefore the observed differences in drug effects were the result of sex-specific differences in metoprolol pharmacokinetics, not pharmacodynamics.[17]

She also described population pharmacokinetic methods that can be applied during phase 2 and 3 trials. Population pharmacokinetic methods allow a researcher to describe the concentration-time profile and examine the drug's metabolism in volunteers comparable to those for whom the drug would be prescribed.

Dr. Pentikis provided an example of a study that used population pharmacokinetic methods to detect subgroup differences. The study involved 244 participants receiving an investigational drug for steroid-dependent Crohn's disease. Researchers collected pharmacokinetic data for 1561 timed samples and analyzed the results for the effects of age, sex, race, total body weight, lean body weight, kidney function, liver function, and concomitant medications. After controlling for variables other than sex, drug exposure and circulating drug concentrations were much lower in men than in women. Dr. Pentikis stressed that normalizing the dose by weight did not provide for equal exposure and suggested that dosing of this drug should have been stratified by sex. At the time of the workshop, the study sponsor commented on the difficulty of marketing a drug with differential dosing.

Carl Peck, MD, Professor of Pharmacology and Medicine at the Center for Drug Development Science at Georgetown University (Washington, DC), noted that during the drug development process, pharmacokinetic/pharmacodynamic analysis is often done after phase 2 or 3 trials are well under way or completed. Many companies seem willing to move forward with information from male subjects, and only later look for sex differences, often after dosing regimens have been established.

Dr. Peck referred to the lipid peroxidation inhibitor tirilazad as an example of how untimely subgroup analysis can lead to a drug's failure in the approval process. In animal models of stroke, tirilazad was highly effective in reducing damage in the brain from infarcts. After 4 phase 2 trials of tirilazad in Europe and the United States failed to show positive results, Dr. Peck performed a retrospective review of the data. He found that the most likely explanation for the lack of effectiveness was that the study subjects were underdosed. A careful study of the pharmacokinetic and pharmacologic data revealed that the drug clearance rate in women was 149% greater than the clearance in men. In other words, the same dose of drug would result in two thirds less drug in the blood and brain of women compared with men. Because men and women in the trials were given the same dose of the drug, women received inadequate doses and the drug appeared to be ineffective. Had women been given a higher dose, the drug might have been shown to be effective; because the data from the phase 1 and 2 trials were not analyzed by sex, a potentially efficacious drug was lost.
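The link between clearance and exposure can be made concrete with a small calculation. The sketch below assumes linear pharmacokinetics, in which exposure (AUC) equals dose divided by clearance, and reads "149% greater" as a clearance 2.49 times that in men. Both are assumptions for illustration, since the raw values are not given here, and the function name `relative_exposure` is hypothetical.

```python
def relative_exposure(percent_greater_clearance):
    """Exposure (AUC) relative to the reference group at the same dose.

    Assumes linear pharmacokinetics, where AUC = dose / clearance.
    """
    clearance_ratio = 1 + percent_greater_clearance / 100
    return 1 / clearance_ratio

exposure_ratio = relative_exposure(149)     # women vs men at the same dose
percent_less = (1 - exposure_ratio) * 100   # reduction in exposure, in percent
dose_multiplier = 1 / exposure_ratio        # dose scaling needed to match exposure
```

Under these assumptions, exposure in women is about 40% of that in men (roughly 60% lower), which is of the same order as the "two thirds less" figure quoted above, and the dose would need to be about 2.5 times higher to give equal exposure.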

Dr. Peck proposed a novel model for conducting phase 1 clinical trials. The first step in early clinical development would be a phase 1 trial in which data on pharmacokinetics, metabolism, safety, and pharmacodynamics would be collected from approximately 18 male subjects. Next, using Bayesian analysis methods and data from the male subjects, one would evaluate similar data from approximately 9 women. If the distribution of the data from the small number of females was found to be similar to that of the males, one could assume that there are no significant sex differences and proceed to phase 2 and 3 trials using the same dose for both males and females. However, if the female distributions were found to be different, the next step would be an intensive study of approximately 19 women to determine appropriate dosing for females. During later phase trials, researchers could normalize the study population to reflect the sex distribution of the target population.
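One simple way to make "similar to that of the males" operational is a Bayesian predictive check: use the male data to predict where the mean of a small female sample should fall, then see whether the observed female mean lies inside that range. The sketch below is not Dr. Peck's procedure, which is not specified here. It assumes normally distributed data and a noninformative prior, under which the predictive distribution for the mean of k new subjects is a Student-t with n - 1 degrees of freedom. The function names, the hardcoded critical value (valid only for 18 males), and all data values are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 17 degrees of freedom (18 males).
T_CRIT_17_DF = 2.110

def predictive_interval(male_values, n_female, t_crit=T_CRIT_17_DF):
    """95% predictive interval for the mean of n_female new subjects,
    based on the male sample and a noninformative prior (assumed)."""
    n = len(male_values)
    mean = statistics.mean(male_values)
    sd = statistics.stdev(male_values)
    half_width = t_crit * sd * math.sqrt(1 / n + 1 / n_female)
    return mean - half_width, mean + half_width

def females_look_similar(male_values, female_values, t_crit=T_CRIT_17_DF):
    """True if the female sample mean falls inside the predictive interval."""
    low, high = predictive_interval(male_values, len(female_values), t_crit)
    return low <= statistics.mean(female_values) <= high

# Hypothetical clearance values (L/h), invented for this example.
males = [28, 31, 30, 27, 33, 29, 32, 30, 26, 34, 31, 29, 30, 28, 32, 27, 33, 30]
similar_females = [30, 31, 29, 32, 30, 31, 30, 31, 30]
different_females = [44, 46, 45, 43, 47, 45, 44, 46, 45]

proceed_with_same_dose = females_look_similar(males, similar_females)
needs_intensive_study = not females_look_similar(males, different_females)
```

In the proposal's terms, a female mean inside the interval would support moving to phase 2 and 3 with a common dose, while a mean outside it would trigger the more intensive study in women. A real analysis would compare whole distributions and several pharmacokinetic measures, not a single mean.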

Dr. Peck's proposal addresses the need for a more efficient method of study design and data analysis in early-phase trials in both sexes. This would prevent drug development from entering phase 2 and 3 trials without the necessary preliminary data on sex differences.

Analysis of data for sex differences does not end with the phase 2 and 3 clinical trials. It should extend well into the postmarketing period to detect adverse events that may not have been discovered during the clinical trial process. Ana Szarfman, MD, PhD, Medical Officer, Center for Drug Evaluation and Research (CDER), U.S. Food and Drug Administration, provided workshop participants with an overview of methodologies used by the FDA as part of their postmarketing surveillance (see footnote†). She explained that since 1998, the FDA has been studying the application of empirical Bayesian statistical techniques coupled with data visualization tools to analyze data from the FDA MedWatch database of voluntary reports of adverse drug events (see footnote‡).[18-28] Evaluation of this database has helped detect new serious drug events and population subgroups at a higher risk for adverse events.[29,30]

There are significant problems associated with systematically analyzing and interpreting voluntarily submitted data, such as variations in reporting and coding practices due to changes in requirements over the years and the presentation of information on multiple drugs, medical conditions, and events in a single report. Other difficulties include chronic underreporting of adverse drug reactions; occasional publicity-driven and litigation-driven episodes of overreporting and misreporting; considerable variability in the quality and completeness of the information contained in each data field, including dosage, formulation type, timing of exposure, and length of exposure and follow-up; and lack of meaningful and reliable drug exposure data and adverse event background information. Analysis must be done without the benefit of an a priori research protocol or randomization. The extraction of useful information from this database presents multiple challenges, including managing, storing, and analyzing a large amount of data and resolving data miscoding.[18] There is a need for analytical tools to systematically detect and analyze potential serious adverse events, especially the multiple, synergistic interactions that may lead to these events. Unpredictable and complex associations cannot be efficiently studied by conventional statistical methods.

To address these issues, the FDA developed the Gamma Poisson Shrinker (GPS) algorithm and computer program, which allows systematic detection of drug-event combinations occurring with greater than expected frequency, without the use of external exposure data or adverse event background information. The GPS program computes signal scores for drug-event pairs by event, drug, sex, and age group. A signal is defined as a lower 95% Bayesian confidence limit that is ≥ 2 for the adjusted ratios of the observed counts over expected (O/E) counts. This criterion ensures with a high probability that, regardless of count size, the particular drug-event combination occurs at least twice as often as expected under the assumption of randomly paired drug/event reports.
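The signal criterion can be illustrated with a much simplified gamma-Poisson sketch. The FDA method is described above as empirical Bayesian, which implies a prior estimated from the database itself; the code below instead assumes a fixed Gamma(1, 1) prior on the true O/E ratio, so it shows the shrinkage idea only and is not the GPS algorithm. The expected count is computed under random pairing of drugs and events, and the lower limit is the 5th percentile of the posterior. All function names and counts are hypothetical.

```python
import math

def expected_count(n_drug, n_event, n_total):
    """Expected reports for a drug-event pair if drugs and events were
    paired at random across n_total reports."""
    return n_drug * n_event / n_total

def erlang_cdf(x, shape, rate):
    """CDF of a gamma distribution with integer shape (Erlang)."""
    if x <= 0:
        return 0.0
    lam = rate * x
    term = math.exp(-lam)
    total = term
    for k in range(1, shape):
        term *= lam / k
        total += term
    return 1.0 - total

def lower_bound(observed, expected, prior_shape=1, prior_rate=1.0, level=0.05):
    """5th percentile of the posterior for the true O/E ratio.

    With observed ~ Poisson(ratio * expected) and ratio ~ Gamma(shape, rate),
    the posterior is Gamma(shape + observed, rate + expected).
    """
    shape = prior_shape + observed
    rate = prior_rate + expected
    lo, hi = 0.0, 10 * shape / rate + 10
    for _ in range(200):  # bisection on the posterior CDF
        mid = (lo + hi) / 2
        if erlang_cdf(mid, shape, rate) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def is_signal(observed, expected, threshold=2.0):
    """Signal if the lower posterior limit of O/E is at least the threshold."""
    return lower_bound(observed, expected) >= threshold

# Hypothetical counts: 200 reports mention the drug, 400 mention the event,
# out of 20,000 reports; 20 reports mention both.
expected = expected_count(200, 400, 20000)   # 4.0
large_count_signal = is_signal(20, expected)
small_count_signal = is_signal(3, 1.0)       # raw O/E is 3, but the count is small
```

The second case shows why a lower limit is used in place of the raw ratio: three reports against one expected give a raw O/E of 3, but the posterior lower limit stays well below 2, so no signal is raised on such a small count.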

Coupled with the GPS algorithm, the FDA uses the more generalized Multi-Item Gamma Poisson Shrinker (MGPS). The multi-item signals reveal synergistic drug interactions and adverse reaction syndromes. The MGPS algorithm systematically identifies relatively subtle associations even when operating on a database that is known to contain considerable noise. The signals provide an objective and systematic view of the data and alert FDA reviewers to critically important new safety signals.

The graphical results for more than 11,000 product profiles of adverse event associations are posted to an internal CDER Web site for ready access by risk evaluators and other reviewers. In recent years, studies have been performed to systematically validate the signal scores, through retrospective and prospective examination, against signals detected by current methods, signals in literature reports, and signals in drug labeling.[18]

Dr. Szarfman uses the MGPS data analysis method to compute signal scores for drug-event pairs separately for each sex. Scores are then visually compared by sex, drug, drug class, event, and event group. MGPS identifies signals affecting nonreproductive organs that are "higher than expected" in one sex over the other. Women systematically have higher signal scores than men for a number of severe adverse drug events, including torsade de pointes, QT prolongation, agranulocytosis, bleeding, renal events, pancreatitis, liver events, rhabdomyolysis, and drug interactions. These signals are seen after stratifying by age and adjusting for the multiplicity of drugs and events per record and appear regardless of the sex distribution of the population using these drugs. However, Dr. Szarfman noted that these sex differences could be related to patient weight or to dose effects that are difficult to discern from spontaneous reports. These results are consistent with the finding of Martin and colleagues that adverse drug reactions to marketed drugs are more common in women than in men.[31]

The standardized data structure of the MedWatch database greatly facilitated the systematic data analyses performed by Dr. Szarfman and her colleagues. The implementation of common and standardized data structures across other clinical databases[27] -- for example, health maintenance organization records, pharmacy benefit management databases, and military databases -- would facilitate the development of complementary techniques capable of systematically analyzing sex-related safety signals and synergistic interactions between drugs as the data are collected. This implementation will not only help sort out the effects of sex seen in the database of spontaneous reports but will also reduce the time currently needed to systematically identify sex-related adverse drug events and synergistic interactions between drugs that require sex-specific dosage adjustments.

*Pharmacodynamics (PD) is the response of the body to a given dose of a drug over time. The lack of documented PD differences is due in part to the more reliable methodology and data analysis techniques available to determine and analyze pharmacokinetic differences. Dr. Pentikis explained that trials are routinely designed to look for pharmacokinetic differences, whereas analysis for PD differences is rare.
†This section was coauthored by Ana Szarfman, MD, PhD (szarfman@cder.fda.gov), and Stella Green Machado, PhD (machados@cder.fda.gov), Quantitative Methods and Research Staff, Office of Biostatistics, Office of Pharmacoepidemiology and Statistical Science, Center for Drug Evaluation and Research, Food and Drug Administration.
‡The Bayesian data analysis and visualization tools were developed with grants from the Office of Women's Health, CDER, FDA, and from an "Unmet Needs" grant from the Centers for Disease Control and Prevention (CDC), Department of Health and Human Services. Drs. Szarfman and Machado thank Henry Rolka from CDC for his valuable feedback.

Meta-Analysis: The Good, the Bad, and the Ugly

Meta-analysis refers to the statistical analysis of a large collection of results from individual studies for the purpose of integrating the findings.[32] In clinical trials, meta-analysis is often used to combine results from several studies to produce a single estimate of the effect of a treatment, and can present both potentials and pitfalls for investigators. Meta-analyses have the potential to prevent delays in the introduction of effective treatments. They are, however, subject to numerous biases both at the level of the individual trial ("garbage in, garbage out") and the dissemination of trial results (publication bias).[33]

Dalene Stangl, PhD, Associate Professor of the Practice of Statistics and Public Policy, Duke University (Durham, North Carolina), used a controversial meta-analysis of the Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries (GUSTO) clinical trial to illustrate both the challenges and rewards of meta-analysis. The GUSTO trial compared the efficacy of the older and less expensive therapy, streptokinase, against that of a novel and more costly therapy, tissue plasminogen activator (t-PA), in improving 30-day survival following an acute myocardial infarction.

The original analysis of the GUSTO data was presented in The New England Journal of Medicine (NEJM) in 1993. In the NEJM paper, there were 4 comparison groups: (1) streptokinase with subcutaneous heparin, (2) streptokinase with intravenous heparin, (3) t-PA with intravenous heparin, and (4) both t-PA and streptokinase with intravenous heparin. The difference between streptokinase with subcutaneous and intravenous heparin was small, .002% for 30-day mortality. Averaging the streptokinase groups, there was a 1% difference in 30-day mortality (P = .001) between the streptokinase groups and the t-PA group. The GUSTO investigators also showed 95% confidence intervals for odds ratios comparing each treatment group to the t-PA treatment group. Each interval, even the one for t-PA and streptokinase combined, fell clearly to the left of 1, meaning treatment with t-PA improved the outcome.[34] A 1% difference, P value of .001, and odds ratios clearly less than 1 are convincing evidence of efficacy to most clinicians.

The evidence was not so convincing to James Brophy and Lawrence Joseph of McGill University (Montreal, Quebec, Canada). Their concern was the disregard for 2 very large-scale clinical trials, Gruppo Italiano per lo Studio della Sopravvivenza nell'Infarto Miocardico (GISSI)-2 and International Study of Infarct Survival (ISIS)-3, and the arbitrary selection of 1% difference as demonstrating clinical superiority. They used meta-analytic methods to reexamine the data from 30,000 patients in GUSTO plus data from the other 2 trials.[35]

Outcomes presented in the meta-analysis included deaths at 30 days, nonfatal strokes, and combined deaths and strokes. The GUSTO study showed the death rate for streptokinase was greater than the death rate for t-PA by 1%. For GISSI-2, the death rate for t-PA was greater by 0.7%, and for ISIS-3, the death rate for streptokinase was higher by 0.3%. For nonfatal strokes, the rate was higher for t-PA in all 3 studies by 0.1%, 0.2%, and 0.2%, respectively.

Next, Brophy and Joseph used Bayesian analysis methods to show distributions for rate differences using GUSTO data only. The probability that the t-PA death rate was lower than the streptokinase death rate by at least 1% was only 36%. Assuming a neutral opinion regarding the GISSI-2 and ISIS-3 data (ie, a 50% probability that there is clinical superiority of either drug), the probability that the t-PA death rate was lower than the streptokinase death rate by at least 1% was near zero. Indeed, the probability that the t-PA death rate was lower at all was less than 50%. Incorporating the GISSI-2 and ISIS-3 data at 100%, they showed that the probability that the death rate for patients given t-PA was lower than the death rate for patients given no thrombolytic agent was about 10%. They stated, "This analysis suggests that the clinical superiority of t-PA over streptokinase remains uncertain."[35]
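The distinction between "some difference" and "a difference of at least 1%" can be sketched with a simple calculation. Brophy and Joseph's analysis used full Bayesian methods and weighted the earlier trials as prior information; the code below uses only a normal approximation to the posterior of a difference in death rates under flat priors, applied to a single trial. The counts are hypothetical and are not the GUSTO data, and the function name is an assumption for illustration.

```python
import math
from statistics import NormalDist

def prob_difference_exceeds(deaths_a, n_a, deaths_b, n_b, threshold):
    """Approximate posterior probability that rate_a - rate_b > threshold,
    using a normal approximation with flat priors (assumed)."""
    p_a = deaths_a / n_a
    p_b = deaths_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return 1 - NormalDist(mu=diff, sigma=se).cdf(threshold)

# Hypothetical counts: 7.3% mortality on drug A, 6.5% on drug B.
p_any = prob_difference_exceeds(1460, 20000, 650, 10000, 0.0)
p_clinical = prob_difference_exceeds(1460, 20000, 650, 10000, 0.01)
```

With these invented counts, the probability that drug B has the lower death rate is above 99%, yet the probability that the advantage is at least 1 percentage point is only about 26%. A small P value for "any difference" therefore says little about whether a prespecified clinically important difference has been reached.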

The synthesis of the GISSI and ISIS trials with GUSTO via meta-analysis forced researchers to take a second look at the conclusions from the GUSTO study and, more generally, at the differences between very large-scale trials and meta-analyses. This example demonstrates the challenges of conducting meta-analysis, including the need for technical and substantive expertise, critical analytic thinking, and time. These challenges should not dissuade researchers from conducting meta-analysis; unless researchers critically analyze and synthesize research findings, therapeutic decision making will be compromised.

Jesse Berlin, ScD, Professor of Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, provided information on the benefits of conducting individual subgroup analysis over meta-analysis. As shown in Dr. Stangl's example, in meta-analyses, study design and patient populations often vary across studies. Different studies may use a different timing, dose, or duration of therapy, and may include different populations of subjects. For example, the distributions of severity of illness, stage of disease, age, and sex may all vary from study to study. Estimated treatment benefits also often vary across studies, with some studies showing larger benefits and some showing small or no benefits. The challenge in these situations then becomes to relate the variability in the estimated treatment effect to the variability in study design.

It is often of particular interest to evaluate the effectiveness of a therapy in specific subgroups of patients. A variety of meta-analytic tools are available to explore these subgroup questions. Most meta-analyses are based on published group-level data, and data on individual patients are not available. A common technique for subgroup analysis of group-level data, meta-regression, assesses the relationship between the magnitude of the treatment effect and the percentage of patients who are included in the subgroup of interest. This type of "ecological analysis" risks producing incorrect inferences about the effect of individual patient characteristics. The alternative is to retrieve the actual patient-specific data, conduct sex-specific analyses, and perform formal statistical tests to compare the actual treatment effects in men with those in women. Although the patient-specific approach has clear hypothetical advantages in the exploration of subgroup effects, it would be important to know whether in real clinical situations the group-level and patient-level analyses lead to different results.

Dr. Berlin was able to explore this hypothesis by obtaining both published group-level data and unpublished patient-level data from 5 randomized controlled trials of prophylactic antirejection (induction) therapy to prevent loss of transplanted kidneys. He attempted to identify subgroups of kidney transplant patients in whom induction therapy might be particularly effective. Induction therapy entails beginning antirejection drugs on the day of the transplant surgery instead of waiting for patients to show signs of rejecting their transplant to administer the drugs. These antirejection drugs are expensive and have the potential for inducing long-term adverse effects. Therefore, understanding the characteristics of those who will benefit from induction therapy would be useful clinical information.

The patient characteristics of interest were diabetes, closeness of the "match" of the donor kidney to the recipient, history of a prior transplant, ethnicity, and levels of panel reactive antibody (PRA) -- a measure of immune system sensitization that was characterized as either "high" or "low." At 5 years of follow-up, the analyses that were based only on group-level data failed to identify any subgroups of patients for whom treatment was either particularly effective or ineffective. However, on the basis of the individual patient data from the same 5 trials, Dr. Berlin found that treatment was extremely effective in patients with elevated PRA, reducing the rate of transplant failure by 80%, whereas treatment was ineffective in patients with low PRA. Because only about 15% of patients in these trials had elevated PRA, the suggestion is that large numbers of patients could avoid treatment with an expensive and, in their case, ineffective therapy that might only stand to increase their risk of subsequent long-term adverse events. Conversely, a relatively small subgroup of patients seems to benefit from induction therapy.

As illustrated by this example, Dr. Berlin recommended that when subgroup questions are of interest, individual patient data should be used to investigate those effects whenever possible. However, further empirical studies are needed to determine whether this example represents an isolated finding or whether the discrepancy between patient-level analyses and group-level analyses is typical.

Implications of Subgroup Analysis for Industry and Clinicians

The pharmaceutical industry has a natural interest in the use and implications of subgroup analysis. Subgroup analyses are less likely than the overall analysis to show statistical significance because of their smaller sample sizes. Conversely, if the results of the overall trial do not reach statistical significance, significant results found in subgroup analyses can be controversial. Because the goal of a clinical trial is to reach reliable conclusions about treatment effects, the sample size should be large enough to ensure high probability (power) of detecting clinically important treatment differences. If the clinical trial is not designed to have sufficient power within subgroups, then the overall results should be used to indicate the treatment effect in a particular subgroup.
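The loss of power in a subgroup can be shown with a standard approximation for comparing two proportions. The sketch below uses the usual normal-approximation power formula for a two-sided test; the event rates and sample sizes are hypothetical, and the function name is an assumption for illustration.

```python
import math
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions
    with equal arm sizes (ignores the negligible opposite tail)."""
    se = math.sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p_control - p_treated) / se
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical trial: 10% events on control, 7% on treatment.
power_full_trial = power_two_proportions(0.10, 0.07, n_per_arm=2000)
power_subgroup = power_two_proportions(0.10, 0.07, n_per_arm=500)
```

With these invented numbers, the full trial has power above 90%, while a subgroup containing one quarter of the participants has power below 50% to detect the same true effect. A nonsignificant subgroup result in such a trial is therefore weak evidence that the treatment does not work in that subgroup.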

Theresa Stern, PhD, Senior Statistician for Pfizer Global Research and Development, explained some of the strategies available to industry for conducting subgroup analysis. Multiple comparisons, or multiplicity, in subgroup analysis can also lead to a spurious finding of statistical significance. To combat this problem, multiple comparison procedures should be used. One such procedure, the Bonferroni correction, is a conservative approach that corrects for multiplicity by declaring a result significant only when:

P < alpha / I

with alpha being the overall significance level, and I being the number of analyses done. Dr. Stern warned, however, that this adjustment does not help with the low power for subgroup analyses and does not ensure clinically significant results.
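As a minimal sketch of the formula above, the following code applies the Bonferroni threshold alpha / I to a list of subgroup P values. The P values and function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_analyses):
    """Per-analysis significance threshold: alpha / I."""
    return alpha / n_analyses

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each P value that falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Hypothetical P values from five subgroup analyses.
p_values = [0.004, 0.03, 0.20, 0.012, 0.60]
flags = significant_after_bonferroni(p_values)
```

With five analyses and an overall alpha of .05, the threshold is .01, so only the first result remains significant. Two results that would pass an unadjusted .05 threshold (.03 and .012) no longer do, which illustrates how conservative the procedure is.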

Dr. Stern also discussed conducting separate subgroup analyses vs interaction tests. Separate subgroup analyses look at the treatment effect within a subgroup, without comparison to other subgroups or to the data from the total study population. Separate hypothesis tests to determine treatment differences within subgroups can lead to erroneous conclusions. Tests for interaction are more valid than separate tests in determining whether a differential treatment effect exists. Dr. Stern recommended using more rigorous statistical procedures, with both interaction tests and correction for multiplicity when analyzing trial data. In the event that subgroups must be looked at separately, Dr. Stern recommended that confidence intervals be used rather than P values alone because confidence intervals demonstrate the range of reasonable treatment differences, given the observed data, and reduce the confusion between "not significant" and "not different." Dr. Stern encouraged researchers to establish subgroups a priori during study design, rather than relying on post hoc analysis. The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) Guidelines E-9 and E-5 discuss subgroup and interaction analyses, stating that these types of analyses should be clearly identified as exploratory and should be interpreted with caution.[36,37] In addition, the guidelines state that subgroup analyses are not meant to "salvage" a nonsignificant study, but they can help in refining drug labeling or patient or dose selection.
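The difference between separate subgroup tests and an interaction test can be sketched as follows. The code assumes two independent subgroups, each with an estimated treatment effect and standard error, and uses a normal approximation throughout. The effect sizes are hypothetical and the function names are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test(estimate, se):
    """Two-sided z-test of an estimate against zero; returns (z, p)."""
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def confidence_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval for an estimate."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z_crit * se, estimate + z_crit * se

def interaction_test(effect_a, se_a, effect_b, se_b):
    """Test whether two independent subgroup effects differ from each other."""
    diff = effect_a - effect_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return z_test(diff, se_diff)

# Hypothetical absolute risk reductions (estimate, standard error).
women = (0.05, 0.02)
men = (0.02, 0.02)

_, p_women = z_test(*women)
_, p_men = z_test(*men)
_, p_interaction = interaction_test(*women, *men)
ci_men = confidence_interval(*men)
```

Here the separate tests suggest that the treatment "works in women" (P about .01) but "not in men" (P about .32), yet the interaction test gives P about .29, so the data do not show that the effect differs by sex. The confidence interval for men, roughly -0.02 to 0.06, includes both no effect and an effect as large as the one seen in women, which is the distinction between "not significant" and "not different" noted above.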

Even when subgroup analyses are conducted, information about sex differences does not always make its way into drug labeling. Brenda Evelyn, a Public Health Specialist with the Office of Special Health Issues, Office of the Commissioner, FDA, outlined findings from her report, "Women's Participation in Clinical Trials and Gender-Related Labeling: A Review of New Molecular Entities Approved 1995-1999."[38] The study revealed that of 185 product labels examined, 125 (68%) had some mention of sex-specific results. Forty-one of these labels (22% of the 185 examined) contained a statement that differences had been found between the sexes. Sex differences were reported with varying frequency depending on the product type. Although many labels indicated sex differences, most of which were pharmacokinetic, no sex-specific dosing changes were recommended.

Marion Gruber, PhD, Scientific Reviewer for the Office of Vaccine Research and Review, Center for Biologics Evaluation and Research (CBER), FDA, described a project sponsored by the FDA Office of Women's Health that examined the extent to which females have been included in clinical trials for biological products, such as vaccines, therapeutics, and blood products, and the extent to which the data from these studies have been analyzed with respect to sex. Dr. Gruber and colleagues found that documentation of sex composition and sex analysis was not consistent. In addition, Dr. Gruber found that clinical trials were not prospectively designed to evaluate potential sex differences. Sex analysis was done by subgroup analysis using sex as a demographic variable and was performed and reported in only a small percentage of the clinical trial summaries reviewed. Dr. Gruber noted that data on safety and effectiveness of the product were not presented by sex, and sex-specific information included in the product label was generally limited to the pregnancy subsection of the label. The challenges facing FDA reviewers include implementing procedures to enhance the collection of gender- and sex-specific data; developing an FDA-wide review, guidance, and training module to discuss procedure changes; and establishing protocols to determine when and how to conduct subgroup analysis to detect sex differences.

Even when information about sex differences is incorporated in drug labeling, challenges remain in communicating that information to clinicians and their patients. Patricia Byrns, MD, of the University of North Carolina at Chapel Hill noted that the longest lag in the bench-to-bedside model of therapeutics is getting the drug off the pharmacy shelf and to the appropriate patient. Disseminating new information effectively is an enormous task for several reasons. About 2.8 billion prescriptions are written each year by more than 290,000 physicians and other prescribers. The drug label itself may not be their primary source of information, and it may be difficult to get new information incorporated into their prescribing habits. Further, the type of new information may influence how much attention it receives and how rapidly it is incorporated.

Dr. Byrns suggested that understanding the process of prescribing may shed light on improving labeling and the dissemination of new information related to sex differences. Dr. Byrns used Vroom's Expectancy Theory[39,40] to describe the process a physician may use in deciding to prescribe a medication. After diagnosis, the first step is risk assessment. This assessment leads to a decision about what, if any, type of therapy to use; 71% of the time, doctors choose to prescribe a drug. Second, a physician considers efficacy. Safety is the next consideration, factoring in individual characteristics such as the patient's allergies, previous experience with the drug, comorbidities, and other drugs the patient may be taking. Next, a doctor will consider the relative cost of the drug. Dosing and frequency are usually the final considerations. Often all these decisions, along with writing the prescription, talking with the patient about the decisions, and giving instructions for taking the drug and monitoring outcomes, happen in fewer than 5 minutes. Integrating new information about a drug into this process has proven difficult for the physician, particularly when the drug is one that he or she has already prescribed. One challenge for new information from subgroup analyses of sex differences is that physicians do not generally have sex differences on their "mental checklist." There are currently few data on whether the point in the medication use process at which new information must be incorporated affects how the information should be presented.[41] Dr. Byrns noted the need to develop standard dissemination processes across a number of medical disciplines in a time-sensitive manner and in language that a physician can easily translate into patient-specific prescriptions.

From Sex Differences to Individual Differences: Where the Science is Taking Us

Detecting sex differences in drug trials is a step toward an era of personalized medicine. Penelope Manasco, MD, Vice President of Clinical Genetics at GlaxoSmithKline, Research Triangle Park, North Carolina, identified the future of personalized therapy as resulting in the right drug given at the right dose to the individual patient. She predicted an increase in the use of genetic subtypes in diagnosis rather than relying solely on phenotypes (such as male/female).

Currently, single nucleotide polymorphism (SNP) technology is leading the movement toward individualized therapy. The human genome is made of 3 billion base pairs, and for every thousand base pairs there is a variable base pair that gives rise to an SNP, resulting in 3 million SNPs in the human genome. SNPs serve as markers for mapping the genome. Ten pharmaceutical companies and the Wellcome Trust Research Laboratory have worked with 5 academic centers through the SNP Consortium to develop and disseminate an SNP map of the human genome.

Currently, SNPs are being used at GlaxoSmithKline to look for a difference in allele frequency between cases of a particular disease (for example, migraine headache, which affects 18% of women and 6% of men) and healthy controls. Using SNPs, scientists were able to identify the insulin receptor gene as a susceptibility gene for migraine. This is borne out by the higher incidence of migraine in patients with noninsulin-dependent diabetes. This information gives scientists a platform to work from in developing a therapy and information about what genes and systems to target.
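
A comparison of allele frequencies between cases and controls can be sketched as a chi-square test on a 2x2 table of allele counts. The source does not state which test was used at GlaxoSmithKline, so the Pearson chi-square below (1 degree of freedom, no continuity correction) is an assumption, and the counts and function name are hypothetical.

```python
import math


def allele_chi_square(case_variant, case_reference,
                      control_variant, control_reference):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_variant, case_reference, control_variant, control_reference
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: variant allele frequency 0.30 in cases, 0.20 in controls.
chi2, p_value = allele_chi_square(30, 70, 20, 80)
print(round(chi2, 4), round(p_value, 4))
```

In practice a genome-wide comparison repeats such a test across many SNPs, so a multiplicity adjustment like the Bonferroni correction discussed earlier would also be needed.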

Pharmacogenetics, according to Dr. Manasco, is the future for research. Pharmacogenomics is the use of genetic information to predict the safety, toxicity, and/or efficacy of drugs in individual patients or groups of patients.[42] Pharmacogenetic analysis can be used to develop a medicine response profile (MRP) for individual patients. For example, patients with Alzheimer's disease who have an ApoE4 allele were found to benefit from an Alzheimer's therapy that was ineffective in those who lacked ApoE4. Similarly, in the treatment of asthma with 5-lipoxygenase inhibitors, Jeff Drazen of Harvard University (Boston, Massachusetts) found that subjects with the normal size of the promoter region for the 5-lipoxygenase gene responded to a 5-lipoxygenase inhibitor, whereas those with an abnormal size of the promoter region did not respond.[43]

One example of a pharmacogenomic therapy currently in use is trastuzumab (Herceptin), a drug used to treat breast cancer. Trastuzumab targets the production of protein from the HER2 gene, which is overexpressed in 20% to 25% of breast cancers. These "HER2-positive" tumors are associated with more rapid metastasis, decreased survival, and increased tumor recurrence. Trastuzumab blocks overexpression of HER2. Patients who had overexpression of the HER2 gene responded to trastuzumab, whereas those without HER2 overexpression did not.

Currently, a goal for pharmacogenomics research is to make the technology affordable and more widely available. Additionally, changes may be needed in the regulation and oversight of drug development. Genetic analysis raises ethical issues of privacy and potential discrimination that must be addressed when performing clinical studies of gene-specific therapies.

This report presents some facets of the current debate regarding the design of clinical trials and the analysis of resulting data to accurately and efficiently detect sex differences. As is evident from the presentations at this conference, greater attention must be given to the design of early-phase clinical trials. Further, novel statistical approaches should be explored to better analyze data from these trials. Detecting sex differences in clinical trials and translating those findings into appropriate dosing regimens will serve to promote safer and more effective drugs for both men and women. As the field of pharmacogenomics advances, clinical trial design and statistical analysis will become even more important as we move into an era of personalized medicine.


  1. NIH Has Increased Its Efforts to Include Women in Research. Washington, DC: United States General Accounting Office; 2000. GAO/HEHS-00-96.

  2. Wizemann TM, Pardue M-L, eds. Exploring the Biological Contributions to Human Health: Does Sex Matter? Washington, DC: Board on Health Sciences Policy, Institute of Medicine; April 25 2001.

  3. Meinert CL, Gilpin AK, Unalp A, Dawson C. Gender representation in trials. Control Clin Trials. 2000;21:462-475.

  4. Schwartz JB. The influence of sex on pharmacokinetics. Clin Pharmacokinet. 2003;42:107-121.

  5. Meibohm B, Beierle I, Derendorf H. How important are gender differences in pharmacokinetics? Clin Pharmacokinet. 2002;41:329-342.

  6. Anthony M, Berg MJ. Biologic and molecular mechanisms for sex differences in pharmacokinetics, pharmacodynamics, and pharmacogenetics: Part I. J Womens Health Gend Based Med. 2002;11:601-615.

  7. New Drug Amendments of 1962. U.S. Code Title 21, Chapter 9, Subchapter V, Part A, Section 355; 1962.

  8. Herbst AL, Ulfelder H, Poskanzer DC. Adenocarcinoma of the vagina. Association of maternal stilbestrol therapy with tumor appearance in young women. N Engl J Med. 1971;284:878-881.

  9. General Considerations for the Clinical Evaluation of Drugs. Washington, DC: Department of Health, Education, and Welfare, U.S. Food & Drug Administration; 1977. Publ No. HEWFDA 88-3040.

  10. Report of the Public Service Task Force on Women's Health Issues: U.S. Public Health Service; 1985.

  11. NIH Guide for Grants and Contracts. Bethesda, MD: National Institutes of Health; 1986.

  12. National Institutes of Health: Problems in Implementing Policy on Women in Study Populations: United States General Accounting Office; 1990. GAO/T-HRD-90-50.

  13. Women's Health: FDA Needs to Ensure More Study of Gender Differences in Prescription Drugs Testing. Washington, DC: United States General Accounting Office; October 29, 1992. GAO-HRD-93-17.

  14. NIH Has Increased Its Efforts to Include Women in Research. Washington, DC: United States General Accounting Office; 2000. GAO/HEHS-00-96.

  15. Drug Safety: Most Drugs Withdrawn in Recent Years Had Greater Health Risks for Women 2001. Washington, DC: United States General Accounting Office; 2001. GAO-01-286R.

  16. Evans C, Ildstad S. Small Clinical Trials: Issues and Challenges. Washington, DC: Institute of Medicine; 2001.

  17. Luzier AB, Killian A, Wilton JH, Wilson MF, Forrest A, Kazierad DJ. Gender-related effects on metoprolol pharmacokinetics and pharmacodynamics in healthy volunteers. Clin Pharmacol Ther. 1999;66:594-601.

  18. Szarfman A, Machado SG, O'Neill RT. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA's Spontaneous Reports Database. Drug Saf. 2002;25:381-392.

  19. DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System [Reply]. The American Statistician. 1999;53:201-202.

  20. O'Neill RT, Szarfman A. Discussion: Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System. The American Statistician. 1999;53:190-196.

  21. Louis TA, Shen W. Discussion: Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System by William DuMouchel. The American Statistician. 1999;53:196-198.

  22. Madigan D. Discussion: Bayesian data mining in large frequency tables, with an application to the FDA Spontaneous Reporting System by William DuMouchel. The American Statistician. 1999;53:198-200.

  23. Szarfman A. 1999 Proceedings of the Biopharmaceutical Section. American Statistical Association; 1999.

  24. Szarfman A. The Application of Bayesian Data Mining and Graphic Visualization Tools to Screen FDA's Spontaneous Reporting System Database. 2000 Proceedings of the Section on Bayesian Statistical Science. American Statistical Association; 2000.

  25. DuMouchel W, Pregibon D. Empirical Bayes Screening for Multi-Item Associations. Paper presented at: Proceedings of the Conference on Knowledge Discovery and Data, 2001; Aug 26-29, 2001; San Diego, California.

  26. O'Neill RT, Szarfman A. Some FDA Perspectives on data mining for pediatric safety assessment. Workshop on Adverse Drug events in Pediatrics. Curr Ther Res Clin Exp. 2001;62:650-663.

  27. Szarfman A, Talarico L, Levine JG, eds. Analysis and Risk Assessment of Hematological Data from Clinical Trials. In: Toxicology of the Hematopoietic System. Vol. 4. Elsevier Science Inc.; 1997.

  28. Sipes IG, McQueen CA, Gandolfi AJ, eds. Comprehensive Toxicology; Vol. 4. Pergamon Press; 1997.

  29. Graham D, Waller P, Kurz X. A view from regulatory agencies. In: Strom BL, ed. Pharmacoepidemiology. 3rd ed. New York, NY: John Wiley & Sons; 2000: 109-124.

  30. Trontell AE. How the U.S. Food and Drug Administration defines and detects adverse drug events. Curr Ther Res Clin Exp. 2001;62:641-649.

  31. Martin RM, Biswas PN, Freemantle SN, Pearce GL, Mann RD. Age and sex distribution of suspected adverse drug reactions to newly marketed drugs in general practice in England: analysis of 48 cohort studies. Br J Clin Pharmacol. 1998;46:505-511.

  32. Glass G. Primary, secondary and meta-analysis. Educational Researcher. 1976;5:3-8.

  33. Egger M, Smith GD, Sterne JA. Uses and abuses of meta-analysis. Clin Med. 2001;1:478-484.

  34. An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction. The GUSTO investigators. N Engl J Med. 1993;329:673-682.

  35. Brophy JM, Joseph L. Placing trials in context using Bayesian analysis. GUSTO revisited by Reverend Bayes. JAMA. 1995;273:871-875.

  36. Statistical Principles for Clinical Trials. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Available at:
    http://www.ich.org/pdfICH/e9.pdf. Accessed May 29, 2003.

  37. Ethnic Factors in the Acceptability of Foreign Clinical Data. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Available at:
    http://www.ich.org/pdfICH/e5.pdf. Accessed May 29, 2003.

  38. Evelyn B, Toigo T, Banks D, et al. Women's Participation in Clinical Trials and Gender Related Labeling: a Review of New Molecular Entities Approved 1995-1999. Washington, DC: U.S. Food and Drug Administration, Office of Special Health Issues, Office of International and Constituent Relations, Office of the Commissioner; 2001.

  39. Segal R, Hepler CD. Drug choice as a problem-solving process. Med Care. 1985;23:967-976.

  40. Lilja J. How physicians choose their drugs. Soc Sci Med. 1976;10:363-365.

  41. Yasuda SU, Woosley RL. Reducing inappropriate prescribing of sublingual nifedipine. J Pharm Technol. 1995;11:21-22.

  42. The Biospace Glossary. Biospace, Inc. Available at:
    http://www.biospace.com/gls_detail.cfm?t_id=1728. Accessed May 22, 2003.

  43. Drazen JM, Yandava CN, Dube L, et al. Pharmacogenetic association between ALOX5 promoter genotype and the response to anti-asthma treatment. Nat Genet. 1999;22:168-170.