Optimal Recall Period and Response Task for Self-Reported HIV Medication Adherence

Minyi Lu; Steven A. Safren; Paul R. Skolnik; William H. Rogers; William Coady; Helene Hardy; Ira B. Wilson


AIDS and Behavior. 2008;12(1):86-94. 

There are two main findings from this research. First, we found less overestimation of self-reported adherence with a 1-month recall period than with 3- or 7-day recall periods. Second, we compared three different response options for 1-month self-report, and found that the item asking patients to rate their adherence on a six-point scale from very poor to excellent was more accurate than either the frequency or percent response formats.

Several studies have compared self-reported and MEMS adherence, and have shown larger differences than those we found (Arnsten et al. 2001; Deschamps et al. 2004; Liu et al. 2001; Pearson et al. 2007). Variations in study design (e.g., observational study versus intervention study), study populations, self-report methods, length of the recall period, and levels of adherence probably all contributed to these differences, and make it difficult to compare their findings directly with ours. We are aware of only two previous studies that directly compare the accuracy of different recall periods using MEMS as a standard. Walsh et al. studied 78 highly adherent (mean MEMS adherence 93%) patients in London, and reported that a 2-week self-report recall period had a higher correlation with MEMS (r = 0.62) than a 3-day recall period (r = 0.32) (Walsh et al. 2002). Oyugi et al. studied 34 highly adherent Ugandan patients (mean adherence by several different methods 91-94%), compared adherence measures using a 3-day structured self-report, a 30-day visual analog scale (VAS), and MEMS, and found no significant differences between any of the measures (r = 0.87 for 3-day self-report and MEMS; r = 0.77 for 30-day VAS and MEMS) (Oyugi et al. 2004). Taken together with these studies, our data support the assertion that longer recall periods (at least up to 30 days) are probably more accurate than the shorter intervals that have typically been employed in self-report studies, such as the original ACTG adherence questionnaire.

Because HIV RNA levels depend on adherence over a period of time, we would argue that 1 month or longer is probably a more clinically relevant time interval than 3 or 7 days. Investigators have used shorter recall intervals because of the assumption that recall will be better for shorter periods, even if the clinically relevant time periods are considerably longer. Our analysis, and that of Walsh et al. (2002), suggests that this belief is incorrect: patients do not recall more recent pill-taking events more accurately. That is, empirically we find that there is no trade-off between a clinically desirable self-report time period (1 month) and recall accuracy.

There are several possible reasons the 3-day recall period did not perform better than longer recall periods. The 3-day adherence measure is derived by averaging the exact doses taken or missed on each of the last 3 days. Thus, the 3-day recall requires that respondents answer either three specific questions (for a once-a-day medication) or six specific questions (for a twice-a-day medication). A recall error for even one dose therefore has a relatively large impact on the accuracy of the 3-day adherence measure. Another possibility is that for frequent, repetitive events like medication taking, patients are actually estimating their adherence rather than remembering discrete medication-taking events (Dykema and Schaeffer 2000; Schaeffer and Presser 2003). If this type of estimation process occurs even for very recent events, then the benefits of a shorter recall period would be expected to be small. Cognitive testing would be necessary to better understand these phenomena.
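To make the dose-averaging arithmetic concrete, the following minimal sketch (hypothetical data; the function name is ours, not part of the study instrument) shows how a 3-day, dose-by-dose adherence score is computed, and how a single recall error shifts the score for a twice-a-day regimen.

```python
# Sketch of a 3-day recall adherence score: the fraction of prescribed
# doses reported taken over the recall window. Data are hypothetical.

def three_day_adherence(doses_taken_per_day, doses_prescribed_per_day=2, days=3):
    """Fraction of prescribed doses reported taken over `days` days."""
    prescribed = doses_prescribed_per_day * days
    return sum(doses_taken_per_day) / prescribed

# Perfect recall of a twice-daily drug: all 6 of 6 doses reported taken.
accurate = three_day_adherence([2, 2, 2])

# A single misremembered dose (5 of 6) moves the score by 1/6,
# i.e., roughly 17 percentage points.
one_error = three_day_adherence([2, 2, 1])

print(accurate, one_error)
```

With only six doses in the window, each dose carries a weight of about 17 percentage points, which illustrates why one recall mistake has a relatively large effect on the 3-day measure.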

We chose to ask the frequency and rating items to assess 1-month adherence because we suspected that problems with health numeracy might limit the usefulness of a scale that used percents. Health numeracy has been defined as "the degree to which individuals have the capacity to access, process, interpret, communicate, and act on numerical, quantitative, graphical, biostatistical, and probabilistic health information needed to make effective health decisions" (Golbeck et al. 2005). Golbeck et al. defined four categories of health numeracy: basic (e.g., understanding appointment dates, using a phone book), computational (e.g., determining net carbohydrates from a nutritional label), analytical (e.g., understanding whether a cholesterol measurement is in the normal range, or understanding basic graphs), and statistical (e.g., determining treatment preferences based on probabilities of efficacy and side effects). Percentages and frequencies fall into the "analytical" category in this classification scheme, and as such are higher-level numeric concepts. Recognition of health numeracy as an issue for patient self-management and participatory decision-making is growing (Apter et al. 2006; Gazmararian et al. 1999; Montori and Rothman 2005; Peters et al. 2006; Schwartz et al. 1997; Woloshin et al. 2005). Research shows that health literacy and health numeracy are different skills (Montori et al. 2004), and that even those who are highly educated (Estrada et al. 1999) or who have confidence in their statistical skills (Woloshin et al. 2005) can perform poorly when tested. Finally, we note that the percent item we used asked respondents to choose one of eleven boxes. We did not use a classical VAS, which allows respondents to place a mark at any point on a 10-cm line. It is possible that a more continuous measure, such as a classical VAS, would have resulted in closer congruence with the MEMS. However, proper use of even a very simple VAS with 0, 50, and 100% marks assumes a certain facility with the concept of percents.

In addition to our concerns about respondents' numeracy skills, we chose to use the frequency and rating items because we theorized that they might be less upwardly skewed than the percent item. That is, most patients know that they should take their ARVs all the time and not miss any doses, so they might be predisposed to check the 100%, or another very high, box. For example, for someone whose adherence is suboptimal, it may be cognitively easier to endorse "most of the time" or "very good" than to check the 80% box. In addition, there are other theoretical reasons to use frequency and rating items. Relative frequencies are not simple translations of absolute frequencies; they incorporate potentially useful evaluative information (Schaeffer and Presser 2003). This is nicely illustrated by an example from Woody Allen's Annie Hall: both Annie and Alvy Singer report that they have sex three times a week, but she characterizes this as "constantly," whereas his description is "hardly ever" (Schaeffer and Presser 2003, p. 74). Despite these considerations, the frequency and percent items performed similarly, and it was the rating item that was not significantly different from the MEMS adherence measurement. In addition to conveying information about preferences or expectations, relative frequencies may express how respondents compare themselves with similar others, which may have contributed to the relatively poor performance of the frequency format compared with the rating format.

There are several possible study limitations. First, we monitored only one ARV medication with MEMS for each participant. While previous work supports the appropriateness of this approach (Wilson et al. 2001), it is possible that our results might have differed had we monitored all ARV medications. Second, MEMS is widely appreciated to be an imperfect technology (Bova et al. 2005). We did not assume that MEMS is a "gold standard," only that it was more accurate, on average, than self-report. For example, if MEMS users remove several doses at a single opening to put them in a pillbox or other container, MEMS could actually underestimate actual adherence. We asked about pillbox use, and found no significant differences in either MEMS or self-reported adherence between pillbox users and non-users. This suggests, though does not absolutely prove, that this phenomenon was not common in the population we studied.

Third, we assigned the numerical values 0, 20, ..., 100 to the corresponding answers for the 1-month frequency and rating self-report items, which assumes equal intervals between neighboring categories. While there has been debate about the assumption of interval properties, in most circumstances it is a reasonable one (Gaito 1982; Townsend et al. 1984). Fourth, one could question the clinical importance of the differences that we observed. For example, the frequency and percent response formats overestimated adherence by 8.5% and 9.2%, respectively, compared with MEMS. One way to gauge the importance of differences of this magnitude is to compare them with their respective SDs. For the frequency format this is 0.32 (8.5/26.4) SD units, and for the percent format it is 0.34 (9.2/27.0) SD units, effect sizes generally considered to be clinically important (Cohen 1977).
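The effect-size arithmetic above (mean overestimation divided by the SD of the corresponding measure, in the style of a standardized effect size) can be reproduced directly; the inputs below are the overestimates and SDs reported in the text.

```python
# Standardized effect size: mean difference expressed in SD units,
# using the overestimates and SDs reported in the text.

def effect_size(mean_difference, sd):
    """Mean difference divided by SD (standardized effect size)."""
    return mean_difference / sd

frequency_es = effect_size(8.5, 26.4)  # frequency format vs. MEMS
percent_es = effect_size(9.2, 27.0)    # percent format vs. MEMS

print(round(frequency_es, 2), round(percent_es, 2))
```

Both values fall near 0.3 SD units, a magnitude conventionally treated as a small-to-medium, clinically meaningful effect.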

Fifth, in our survey, the rating item followed the frequency and percent items. It is possible that there were some order effects, and future research should test this hypothesis. Sixth, the ideal comparison of self-report intervals would use identical items and vary only the time frame, which we did not do. It is possible that item wordings, rather than recall period, were responsible for the greater accuracy associated with the longer recall periods. Seventh, we did not test all possible 1-month self-report items, nor did we test additional intervals (e.g., 2 weeks). VASs have been tested by several investigators (Giordano et al. 2004; Oyugi et al. 2004), and this response format has the possible advantage of being more easily used with non-English-speaking populations. Lastly, we cannot be sure that our findings will generalize to other populations using ARVs.

In conclusion, this study is the first to demonstrate significantly less over-reporting of adherence with a 1-month recall period than with either a 3- or 7-day period, suggesting that 1 month may be a more useful recall period for self-report. While confirmation of these findings in other populations is desirable, there is relatively little empirical support for the labor-intensive, 3- or 4-day, dose-by-dose, day-by-day approach originally used by the ACTG and subsequently adopted by many investigators. Further research could compare 1-month recall with other recall periods, such as 2 weeks. We also found that a rating response format was superior to the frequency and percent response formats. Verification of these findings in other populations is needed, as is further testing of the accuracy of other response tasks, such as VASs.

