Diagnostic Accuracy of Virtual Cognitive Assessment and Testing

Systematic Review and Meta-analysis

Jennifer A. Watt MD PhD; Natasha E. Lane MD PhD; Areti Angeliki Veroniki PhD; Manav V. Vyas MBBS MSc; Chantal Williams MSc; Naveeta Ramkissoon MPH; Yuan Thompson PhD; Andrea C. Tricco PhD; Sharon E. Straus MD MSc; Zahra Goodarzi MD MSc


J Am Geriatr Soc. 2021;69(6):1429-1440. 


We published our protocol on Open Science Framework and registered it with PROSPERO (CRD42020186290). Our systematic review was written in accordance with the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies.[6]

Data Sources and Searches

We searched MEDLINE, EMBASE, CDSR, CENTRAL, and PsycINFO for citations published in any language from inception until April 1, 2020 (see File S1 for MEDLINE search strategy). We also searched the gray literature (File S2) and reviewed reference lists of included studies and related systematic reviews.

Study Selection

We included studies in adults that assessed the accuracy or reliability of virtual compared with in-person cognitive assessments for diagnosing dementia or MCI, identified virtual cognitive test (e.g., TICS, MoCA) cutoffs suggestive of dementia or MCI compared with an in-person cognitive assessment, or described the correlation between virtual and in-person cognitive test scores.[4,7] Our population of interest included adults (1) without cognitive impairment, (2) diagnosed with MCI, or (3) diagnosed with dementia as per established criteria (e.g., Petersen, Diagnostic and Statistical Manual of Mental Disorders [DSM], National Institute on Aging-Alzheimer's Association).[3,8–10] Reviewer pairs (ZG, NL, and JW) first reached at least 80% agreement in a pilot screening exercise. They then independently screened (1) all citations and (2) the full-text articles of citations retained from step 1 to establish inclusion eligibility. Discrepancies regarding study inclusion were resolved by deliberation within reviewer pairs.
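As an illustration of the pilot screening threshold, the sketch below computes simple percent agreement between two reviewers and checks it against 80%. It assumes that agreement means the proportion of citations receiving the same include/exclude decision, which the text does not specify; the function name and the ten pilot decisions are hypothetical.

```python
def percent_agreement(decisions_a, decisions_b):
    """Proportion of citations on which two reviewers made the same call."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("both reviewers must screen the same citations")
    if not decisions_a:
        raise ValueError("no citations were screened")
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


# Hypothetical pilot of 10 citations (True = include, False = exclude).
reviewer_1 = [True, False, False, True, False, False, True, False, False, False]
reviewer_2 = [True, False, True, True, False, False, True, False, False, True]

agreement = percent_agreement(reviewer_1, reviewer_2)

# Assumed rule: the pair proceeds to full screening at 80% agreement or more.
pilot_passed = agreement >= 0.80
print(f"agreement = {agreement:.0%}, proceed = {pilot_passed}")
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic would be a different choice from the one sketched here.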

Data Abstraction and Quality Assessment

Pairs of reviewers (ZG, NL, NR, YT, MV, JW, and CW) independently abstracted data from each included full-text article and appraised each article's quality with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.[11] We rated studies at high risk of bias from study flow and timing if virtual and in-person cognitive tests were conducted >6 months apart. Where reported by study authors, we abstracted study and study-level participant characteristics (i.e., authorship, year of publication, degree of cognitive impairment [i.e., no cognitive impairment, MCI, dementia], average age of the study population, and proportion of women in the study population); details describing how cognitive assessments and tests were implemented (i.e., method of virtual cognitive testing delivery [e.g., telephone, videoconference], language of cognitive testing, and barriers to implementing virtual cognitive testing); and outcome data comparing virtually administered cognitive assessments and tests with those conducted in person (i.e., correlation coefficients, sensitivity, specificity, true positives [TP], true negatives [TN], false positives [FP], false negatives [FN], and area under the receiver operating characteristic curve [AUC]). Discrepancies regarding data abstraction and quality assessment were resolved by deliberation within reviewer pairs or by a third reviewer.
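The abstracted counts relate to sensitivity and specificity through the standard 2×2 table definitions, and the flow-and-timing rule is a simple threshold on the interval between tests. The sketch below shows both; the class name, function name, and counts are hypothetical and are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one virtual test cutoff against the in-person reference."""
    tp: int  # impaired by reference, positive on virtual test
    fp: int  # not impaired by reference, positive on virtual test
    fn: int  # impaired by reference, negative on virtual test
    tn: int  # not impaired by reference, negative on virtual test

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)


def high_risk_flow_and_timing(months_between_tests):
    """Flow-and-timing rule from the text: tests more than 6 months apart."""
    return months_between_tests > 6


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
print(high_risk_flow_and_timing(7), high_risk_flow_and_timing(6))
```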

Quantitative Data Synthesis

We derived summary receiver operating characteristic (SROC) curves, optimal cutpoint thresholds (the cutpoint that maximizes the weighted sum of sensitivity and specificity), and AUC values at optimal cutpoint thresholds. We pooled TP, TN, FP, and FN values at cognitive test cutoffs using restricted maximum likelihood estimation from inverse variance weighted linear mixed effects models.[12] These meta-analytic models can incorporate TP, TN, FP, and FN values from multiple cutpoints in each included study and estimate the optimal cutpoint threshold by incorporating all data points from included studies into a single meta-analytic model.[12] SROC curves were presented with confidence regions.[12] We derived meta-analytic effect estimates from studies comparing a virtual cognitive test to an in-person reference standard, which we defined as assessments conducted by clinicians using established diagnostic criteria (e.g., DSM or Petersen criteria).[3,8] We presented positive (LR+) and negative (LR−) likelihood ratios to facilitate interpretation of meta-analytic estimates.[13] The TICS is administered by telephone; thus, we did not conduct a subgroup analysis based on modality of administration. We conducted a post hoc subgroup analysis of studies in which the mean number of years of formal education completed was >8, which roughly corresponds to at least an elementary school level of education. We performed a sensitivity analysis by excluding studies at high risk of bias from conduct or interpretation of the reference standard.[11] We were unable to complete a sensitivity analysis excluding studies at high risk of bias from patient flow through the study and timing of reference and index tests because too few studies would have remained. Analyses were performed in R, version 4.0.0.[12]
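Two of the quantities named above have simple closed forms that can be sketched directly: the likelihood ratios, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity, and the choice of cutpoint that maximizes a weighted sum of sensitivity and specificity. The Python sketch below is illustrative only. It is not the multiple-cutpoint mixed effects model used in the analysis (which was run in R); it assumes equal weights because the weight is not stated here, and the function names and candidate cutoffs are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity/specificity pair."""
    if not 0 < specificity < 1:
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def optimal_cutoff(candidates, weight=0.5):
    """Pick the (cutoff, sensitivity, specificity) tuple that maximizes
    weight * sensitivity + (1 - weight) * specificity.

    weight=0.5 (equal weighting) is an assumption made for this sketch.
    """
    return max(candidates, key=lambda c: weight * c[1] + (1 - weight) * c[2])


# Hypothetical pooled estimates at three test cutoffs (not study results).
candidates = [
    (30, 0.70, 0.95),
    (31, 0.82, 0.88),
    (32, 0.90, 0.70),
]

cutoff, sens, spec = optimal_cutoff(candidates)
lr_pos, lr_neg = likelihood_ratios(sens, spec)
print(f"cutoff = {cutoff}, LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With equal weights, maximizing the weighted sum is equivalent to maximizing the Youden index (sensitivity + specificity − 1); a different weight shifts the chosen cutoff toward higher sensitivity or higher specificity.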

Qualitative Data Synthesis

We conducted a qualitative meta-synthesis to understand potential barriers and facilitators associated with implementing virtual cognitive assessments and testing.[14,15] Two reviewers (JAW and ZG) coded barriers and facilitators inductively (independently and in duplicate) and categorized them into themes. JAW and ZG categorized codes by test or assessment timing (i.e., before, immediately preceding, or during) and resolved discrepancies by deliberation within the reviewer pair.