How to Create a Nice Graph for a Research Paper, and Why Your High School Math Teacher Was Right

Andrew J. Vickers, PhD


August 10, 2009

Another difference between normal people and statisticians: A normal person might say "high school; those were the happiest days of my life," whereas statisticians tend to be more particular in stating that it was "high school math" that made them happiest. As it happens, much of the math that I did in high school was far more advanced than what I do in my day-to-day work. (I don't have to solve any differential equations to run a linear regression.) However, one thing that I learned really stuck with me: how to draw a graph.

What I was taught about graphs was that you have an x-axis and a y-axis, and to draw a line, you state the value of y in terms of x, for example, y = 3x + 4. I was also taught that you can put in other powers of x (eg, y = 2x² - 3x + 4). Doing so allows you to have a curve, rather than a straight line. (This is called a "nonlinear" relationship.) The number of times this curve can change direction depends on the highest power of x that you include: A straight line never turns, a curve with an x² term can turn once, and a curve with an x³ term can turn up to twice (Figures 1a-c).

Figure 1a. Graph for y = 3x + 4.

Figure 1b. Graph for y = 2x² - 3x + 4.

Figure 1c. Graph for y = 0.6x³ + 2x² - 3x + 4.
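
For readers who would rather check the 3 curves than take them on trust, here is a minimal sketch in Python. It evaluates each equation from Figures 1a-c on a grid of x values and counts how often the curve switches between rising and falling. The function names, the range of x (-5 to 5), and the grid size are my own assumptions for illustration; they are not taken from the figures.

```python
def poly(coeffs, x):
    """Evaluate a polynomial at x.

    coeffs run from the highest power of x down to the constant,
    so y = 2x^2 - 3x + 4 is written [2, -3, 4].
    """
    y = 0.0
    for c in coeffs:
        y = y * x + c  # Horner's rule
    return y


def count_turns(coeffs, lo=-5.0, hi=5.0, steps=1000):
    """Count how many times the curve changes direction between lo and hi."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [poly(coeffs, x) for x in xs]
    turns = 0
    prev_sign = 0  # +1 while rising, -1 while falling, 0 before the first move
    for a, b in zip(ys, ys[1:]):
        sign = (b > a) - (b < a)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                turns += 1
            prev_sign = sign
    return turns


curves = {
    "y = 3x + 4": [3, 4],
    "y = 2x^2 - 3x + 4": [2, -3, 4],
    "y = 0.6x^3 + 2x^2 - 3x + 4": [0.6, 2, -3, 4],
}

for label, coeffs in curves.items():
    print(label, "changes direction", count_turns(coeffs), "time(s)")
```

Over this range the straight line reports 0 changes of direction, the x² curve 1, and the x³ curve 2.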

In school I also had the x- and y-axes represent something, such as age and height, and put a mark for each observation at the appropriate coordinate: For a 5-ft 6-in 15-year-old, I'd draw an imaginary line up from the x-axis at 15 and an imaginary line across from the y-axis at 5 ft 6 in, and then put a dot where the 2 lines meet. I would then draw a line through the graph so that it comes closest to the dots representing each person's height.
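
One common way to make "comes closest to the dots" precise is least squares: choose the line that minimizes the sum of squared vertical distances between the dots and the line. The sketch below assumes that definition, which is not necessarily how anyone draws a line by eye in high school. The ages and heights are invented for illustration and are not real data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example data: age in years, height in inches
ages = [11, 12, 13, 14, 15, 16]
heights = [57, 59, 62, 64, 66, 67]

slope, intercept = fit_line(ages, heights)
print(f"height = {slope:.2f} x age + {intercept:.2f}")
```

The slope is the average change in height per year of age, which is the same reading you would take off a hand-drawn line.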

It turns out that this is just about all you need to know for most medical research papers, which raises the question of why good graphs are so few and far between in the medical literature. Here are 2 graphs that are fairly typical of the form (Figures 2a, b).

Figure 2a. Results of a randomized trial of acupuncture for headache. Blue bar: headache frequency at baseline; red bar: headache frequency at posttreatment follow-up.
From Lilja H, et al.[1]

Figure 2b. Results of an epidemiologic study examining the relationship between prostate-specific antigen (PSA) level at age 44-50 and prostate cancer diagnosed clinically up to 25 years subsequently. Numbers of cases (blue) and controls (red) are given against quintile of PSA level. A quintile is a fifth of the data: patients in the top quintile, for example, have PSA levels higher than 80% of the population.

Now, for those of you still awake, a short review of why those graphs are so bad: First, they are boring to look at; second, they don't give an immediate visual impression of the results; and third, they don't provide information that anyone could actually use.

So, back to high school: Let's present the results on an x and y graph (Figures 3a, b).

Figure 3a. Results of a randomized trial of acupuncture for headache. Blue dots: individual patient results in the control group. Blue line: "regression line" for control group. The "regression line" can be interpreted as the average reduction in headache for a given level of baseline headache. Red dots: individual patient results in the acupuncture group. Red line: regression line for the acupuncture group.
Reprinted with permission from Lilja H, et al.[1]

Figure 3b. Probability of prostate cancer before age 75 by prostate-specific antigen level at age 44-50, with 95% confidence interval (blue lines).
Reprinted with permission from Vickers AJ, et al.[2]

A few things to note: First, both graphs present information that can be immediately used by a clinician, either a patient's long-term risk for prostate cancer on the basis of his prostate-specific antigen level or the anticipated reduction in headaches with and without acupuncture treatment. Second, to make both graphs, I used what are called nonlinear terms (such as the square of the prostate-specific antigen level or headache score as appropriate) so that the relationship between prostate-specific antigen and cancer, or baseline headaches and change in headaches, could be a curve rather than a straight line. Third, the graph for the clinical trial shows the actual results of each patient in the trial. (It's what graphs are meant to do -- show the data.)
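
To give a feel for what adding a squared term does, here is a rough sketch that fits y = b0 + b1x + b2x² by least squares. It is not the analysis behind either figure: the models in the published papers are not described here, and the function names and example numbers below are assumptions made purely for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns [b0, b1, b2]."""
    rows = [[1.0, x, x * x] for x in xs]  # the squared column is the nonlinear term
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


# Invented example: points lying exactly on y = 2x^2 - 3x + 4
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x * x - 3 * x + 4 for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x + {b2:.2f}x^2")
```

If the fitted b2 is close to 0, the curve is close to a straight line; the squared term only bends the line when the data call for it.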

A final point: Have a look in the acupuncture graph at the left end of the regression lines (the lines drawn to come closest to all the dots). It looks as though acupuncture has no effect for patients who don’t have many headaches at baseline. However, look again and you'll see that this is based on only a small number of patients. This is a reminder that we have to be careful about applying average statistical results to patients at the extremes. Then again, at least you have something to apply, other than a boring blue bar.

Note: The papers associated with each graph are from Journal of Clinical Oncology 2007;25:431-436 and BMJ 2004;328:744, respectively.