Development of a New Patient-reported Outcome Measure to Evaluate Treatments for Acne and Acne Scarring: The ACNE-Q

A.F. Klassen; S. Lipner; M. O'Malley; N.M. Longmire; S.J. Cano; T. Breitkopf; C. Rae; Y.L. Zhang; A.L. Pusic


The British Journal of Dermatology. 2019;181(6):1207-1215. 


Patients and Methods

Our team followed international best practice guidelines for PROM development,[14–17] including specific recommendations for involving children and adolescents.[18] We used a modern psychometric approach, Rasch Measurement Theory (RMT).[19] Figure 1 shows our multiphase, iterative, mixed-methods approach.[20] This paper describes phases I and II.

Figure 1.

Flow diagram showing the multiphase mixed-methods protocol for developing the ACNE-Q (reproduced with permission from Wong Riff et al.[20]). QUAN, quantitative; QUAL, qualitative.

Phase I: Qualitative Research

Concept Elicitation. We obtained research ethics approval for phases I and II from the Hamilton Integrated Research Ethics Board (Canada) and The Hospital for Sick Children Research Ethics Board (Canada), and for phase II from Weill Cornell Medicine (New York, NY, U.S.A.). For both study phases, informed assent and/or consent was obtained from participants and guardians.

The qualitative phase took place between July and December 2015. We used an interpretative description qualitative approach.[21] Eligible participants were aged 12 years and older, fluent in English and pre- or post-treatment for acne and/or acne scars. Recruitment took place at McMaster University (Hamilton, ON, Canada), Hospital for Sick Children (Toronto, ON, Canada), Dermetics (Burlington, ON, Canada) and Ancaster Dermatology Centre (Ancaster, ON, Canada). Patients were invited to consider participation in the study by a member of the healthcare team. Those who agreed were contacted by a qualitative interviewer (N.M.L.), experienced in PROM development, to set up a date and venue (home or hospital) for the interview.

Interviews were used to elicit concepts and to create a comprehensive item pool for scale development. The interview guide covered appearance concerns, symptoms and the psychosocial impact of acne. Interviews were recorded digitally and transcribed verbatim. Data were coded line by line: quotes (patients' words and phrases) were moved into Excel (Microsoft, Redmond, WA, U.S.A.) and categorized, using constant comparison, into top-level conceptual domains and major and minor themes. Coding was performed by one researcher on the team (N.M.L.) and confirmed by a second (A.F.K.). Interviews continued until saturation was reached, that is, until additional interviews elicited no new concepts. An example of how the data were coded is shown in Table 1.
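Saturation is ultimately a judgment made by the analysts. The sketch below shows one hypothetical way to make the stopping rule explicit: each transcript is treated as a set of codes, and saturation is declared once a fixed run of consecutive interviews (`run_length`, an assumed parameter, not something the study reports) contributes no new code. The function name and the code labels are invented for illustration.

```python
from typing import Optional


def saturation_point(codes_per_interview: list[set[str]], run_length: int = 3) -> Optional[int]:
    """Return the 1-based number of the last interview that added a new code,
    once `run_length` consecutive interviews have added nothing new.
    `run_length` is a hypothetical parameter. Returns None if never reached."""
    seen: set[str] = set()
    last_new = 0   # interview number that most recently contributed a new code
    quiet_run = 0  # consecutive interviews contributing no new code
    for i, codes in enumerate(codes_per_interview, start=1):
        new_codes = codes - seen
        if new_codes:
            seen |= new_codes
            last_new = i
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= run_length:
                return last_new
    return None


interviews = [
    {"redness", "scars"},
    {"scars", "pain"},
    {"self-conscious"},
    {"pain"},
    {"redness"},
    {"scars"},
]
print(saturation_point(interviews))  # 3: interviews 4-6 added no new codes
```

A longer `run_length` makes the rule more conservative; with `run_length=4` the same six interviews would not yet count as saturated.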

Scale Formation. The item pool was used to form scales in line with RMT.[19,22] In this approach, items are designed to map out a concept of interest as a clinical hierarchy (from less to more of the concept). The data collected are then analysed to determine whether they support the theorized construct (i.e. whether the data 'fit' the Rasch model). When data fit the Rasch model, it is legitimate to sum item scores to obtain a total score.
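The reason summing is justified can be seen from the dichotomous Rasch model, written here in our own notation: $\beta_n$ is the location of person $n$, $\delta_i$ the location of item $i$ and $x_{ni} \in \{0, 1\}$ the response, over $L$ items. The ACNE-Q uses polytomous response options, for which the polytomous Rasch model generalizes this expression; the dichotomous case is shown only because it is the simplest to read.

```latex
P(X_{ni} = 1 \mid \beta_n, \delta_i) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}

P(\mathbf{x}_n \mid \beta_n)
  = \prod_{i=1}^{L} \frac{\exp\{x_{ni}(\beta_n - \delta_i)\}}{1 + \exp(\beta_n - \delta_i)}
  = \frac{\exp\left(r_n \beta_n - \sum_{i=1}^{L} x_{ni}\,\delta_i\right)}
         {\prod_{i=1}^{L} \left[1 + \exp(\beta_n - \delta_i)\right]},
\qquad r_n = \sum_{i=1}^{L} x_{ni}
```

Because $\beta_n$ enters the likelihood only through the raw total $r_n$, the total score is a sufficient statistic for the person's location when the data fit the model.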

Cognitive Interviews. Cognitive interviews were used to ensure the content of each scale was relevant, comprehensive and comprehensible.[23,24] Participants were drawn from the sample of patients who took part in the initial qualitative interviews. Interviews, which took place by telephone between November 2016 and January 2017, were audio-recorded, transcribed and coded. We used the think-aloud method to determine how patients understood the instructions, response options and items of each scale.[25] Verbal probing was used to identify content that was irrelevant, insensitive, difficult to understand and/or ambiguous. Participants were encouraged to suggest revisions to the wording of any part of the scale to improve comprehension, and were asked to suggest new items they thought were missing from each scale. Interviews were completed in two rounds of five participants each (total n = 10), which allowed the scales to be revised between rounds.

Expert Review. A secure web-based Research Electronic Data Capture (REDCap)[26] survey was designed and used to obtain feedback from clinical experts in dermatology within our team's professional networks. Experts were asked to comment on the instructions, response options and items, and to suggest missing concepts. A link to the survey was e-mailed in February 2017, with one reminder sent.

Phase II: Field-test Study

Recruitment took place between June 2017 and August 2018 from Dermetics, Ancaster Dermatology Centre and Weill Cornell Medicine. Eligible participants were aged 12 years and older, at any point in their treatment trajectory (pre- or post-treatment for acne and/or acne scars) and able to read English in order to complete the ACNE-Q independently.

Patients checking in for an appointment were invited to complete the ACNE-Q using a paper booklet or an iPad® with data entered directly into REDCap. Demographic (age, sex) and acne-specific questions were included in the survey. Participants were asked to indicate, for the face, chest and back, the amount of acne they had (none, a little, a moderate amount, a lot). Participants were also asked to indicate which areas of the face (forehead, cheeks, nose, chin, jawline and neck) had acne and which had acne scars. This information was used to compute two variables counting the number of facial areas (0–6) with acne and with acne scars.
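As a minimal sketch of how the two count variables could be derived, assume each participant's ticked areas are stored as a set of area labels. The storage format, the function names and the output field names (`n_areas_acne`, `n_areas_scars`) are hypothetical; only the six areas come from the survey description.

```python
# The six facial areas listed in the survey.
FACE_AREAS = ("forehead", "cheeks", "nose", "chin", "jawline", "neck")


def count_areas(ticked: set[str]) -> int:
    """Number of the six facial areas ticked (0-6); labels outside the list are ignored."""
    return sum(1 for area in FACE_AREAS if area in ticked)


def area_variables(acne_areas: set[str], scar_areas: set[str]) -> dict[str, int]:
    """Hypothetical field names for the two derived count variables."""
    return {
        "n_areas_acne": count_areas(acne_areas),
        "n_areas_scars": count_areas(scar_areas),
    }


print(area_variables({"forehead", "chin"}, {"cheeks", "jawline", "neck"}))
# {'n_areas_acne': 2, 'n_areas_scars': 3}
```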

We performed RMT analysis in Rumm2030 software,[27] which involved a set of statistical and graphical tests described in detail elsewhere and briefly here.[22,28–31] (i) Item response ordering: we examined the ordering of thresholds between adjacent response options (e.g. 'not at all' and 'a little') to determine whether each scale's response categories were ordered, meaning that a '1' on a 4-point scale must sit lower on the continuum than a '2', and so on. (ii) Item fit statistics: three fit indicators were examined: log residuals (item–person interaction), χ2-values (item–trait interaction) and item characteristic curves. Ideal fit is indicated by residuals between −2·5 and +2·5 and by χ2-values that are nonsignificant after Bonferroni adjustment. (iii) Dependency: we inspected residual correlations between pairs of items, which should be low, because high correlations can artificially inflate reliability. For any residual correlation > 0·30, we conducted a subtest to determine the impact of the correlation on reliability. (iv) Targeting: we examined item locations to determine whether they were evenly spread over a reasonable range that matched the range of the construct experienced by the sample. For each scale, we also computed the percentage of participants scoring within the scale range. (v) Person separation index: this statistic reflects how well the scale separates people in the sample, given the error associated with measuring them. Higher values indicate greater reliability.
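The decision rules in (i), (ii), (iii) and (v) can be written down compactly. The sketch below assumes that threshold locations, fit residuals, χ2 p-values, residual correlations, person locations and their standard errors have already been estimated (for example exported from Rumm2030); it only applies the cut-offs. The ±2·5 and 0·30 cut-offs come from the text. Dividing α by the number of items for the Bonferroni adjustment, and using the sample variance in the person separation index, are our assumptions, and all function names are hypothetical.

```python
import statistics


def thresholds_ordered(thresholds: list[float]) -> bool:
    """(i) Thresholds between adjacent categories must strictly increase."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def bonferroni_alpha(n_items: int, alpha: float = 0.05) -> float:
    """(ii) Assumed adjustment: alpha divided by the number of items tested."""
    return alpha / n_items


def item_fits(fit_residual: float, chi2_p: float, n_items: int,
              bound: float = 2.5, alpha: float = 0.05) -> bool:
    """(ii) Residual within +/-bound and chi-square not significant after adjustment."""
    return -bound <= fit_residual <= bound and chi2_p >= bonferroni_alpha(n_items, alpha)


def dependent_pairs(resid_corr: dict[tuple[str, str], float],
                    cutoff: float = 0.30) -> list[tuple[str, str]]:
    """(iii) Item pairs whose residual correlation exceeds the cut-off."""
    return [pair for pair, r in resid_corr.items() if r > cutoff]


def person_separation_index(locations: list[float], std_errors: list[float]) -> float:
    """(v) (observed variance of person locations - mean error variance)
    divided by the observed variance; sample variance is an assumed choice."""
    observed_var = statistics.variance(locations)
    error_var = statistics.fmean(se ** 2 for se in std_errors)
    return (observed_var - error_var) / observed_var


print(thresholds_ordered([-1.2, 0.1, 1.5]))            # True
print(item_fits(1.8, 0.02, n_items=10))                # True: 0.02 >= 0.05 / 10
print(dependent_pairs({("A", "B"): 0.35, ("A", "C"): 0.12}))  # [('A', 'B')]
print(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))  # 0.75
```

Pairs flagged by `dependent_pairs` are the candidates for the subtest analysis described in (iii); the sketch does not implement the subtest itself.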

We also computed Cronbach α-values and intraclass correlation coefficients (ICCs). For the latter, participants who completed the ACNE-Q and provided their e-mail address for a test–retest (TRT) reliability study were invited to complete the ACNE-Q again 1 week after the initial assessment. Those who agreed were e-mailed a link to the survey, with a reminder sent 1 week later. In the second assessment participants were asked: 'Has there been any change (yes/no) in your acne or acne scars since completing the ACNE-Q the first time?'
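The two reliability coefficients can be sketched as follows. Cronbach's α uses the standard formula. The study does not state which ICC model was used, so the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss notation) is an assumption here, as is any decision to restrict the TRT sample to participants who answered 'no' to the change question. Function names are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """item_scores[i][n] is person n's score on item i."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(statistics.variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))


def icc_2_1(scores: list[tuple[float, ...]]) -> float:
    """ASSUMED model: ICC(2,1), two-way random effects, absolute agreement,
    single measure. scores[n] = (test, retest, ...) for participant n."""
    n, k = len(scores), len(scores[0])
    grand = statistics.fmean(x for row in scores for x in row)
    row_means = [statistics.fmean(row) for row in scores]
    col_means = [statistics.fmean(col) for col in zip(*scores)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


print(round(cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 3]]), 3))  # 0.8
print(round(icc_2_1([(1, 2), (3, 3), (5, 6)]), 4))            # 0.9231
```

Absolute agreement penalizes a systematic shift between test and retest (the `ms_cols` term), which is why it is a common, though not universal, choice for test–retest designs.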

Rasch logit scores were transformed to a 0–100 scale. Higher scores indicate a better outcome for the appearance and symptom scales and a worse outcome for appearance-related distress. Construct validity was evaluated via hypothesis testing. Firstly, we examined intercorrelations between ACNE-Q scales to determine the extent to which the scales were related. We hypothesized that lower appearance scores would correlate with lower symptom scores and with higher scores for appearance-related distress. We also hypothesized that patient characteristics [age, sex, race (white vs. other)] would correlate weakly with ACNE-Q scales. Secondly, we examined known-groups relationships between ACNE-Q scores and clinical variables. For the facial acne, acne scars, symptoms and distress scales, we hypothesized that worse scores would be associated with having more areas of the face with acne and with acne scars.
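The text does not give the conversion formula. A minimal sketch, assuming a linear rescaling between two anchor logits (`logit_min` and `logit_max`, hypothetically the person locations assigned to the lowest and highest possible raw scores of a scale), could look like this. In practice such conversions are often distributed as a raw-score-to-0–100 lookup table, which this function could be used to build.

```python
def to_0_100(logit: float, logit_min: float, logit_max: float) -> float:
    """Linearly map a Rasch person location (logits) onto 0-100.
    logit_min / logit_max are ASSUMED anchors, not values from the study."""
    if logit_max <= logit_min:
        raise ValueError("logit_max must exceed logit_min")
    score = 100 * (logit - logit_min) / (logit_max - logit_min)
    return min(100.0, max(0.0, score))  # clamp locations outside the anchors


print(to_0_100(0.0, -4.0, 4.0))   # 50.0
print(to_0_100(-4.0, -4.0, 4.0))  # 0.0
```

A linear map preserves the interval properties of the logit scale, so differences of equal size on the 0–100 scale still correspond to equal differences in logits.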

We used parametric tests (Pearson correlation, t-test or one-way ANOVA) for normally distributed data and the equivalent nonparametric tests for data that were not normally distributed. P-values < 0·05 were considered statistically significant.
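To illustrate the parametric versus nonparametric choice for correlations, the sketch below computes Pearson's r and its rank-based counterpart, Spearman's ρ (Pearson's r on average ranks, so ties are handled). It returns coefficients only; p-values would normally come from a statistics package such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`). The function names and example data are illustrative, not from the study.

```python
import math
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Nonparametric counterpart: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone but not linear
print(round(pearson(x, y), 3), spearman(x, y))  # Pearson < 1, Spearman = 1.0
```

For a relationship that is monotone but not linear, Spearman's ρ reaches 1 while Pearson's r does not, which is one reason rank-based tests are preferred for skewed or ordinal scores.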