Diagnostic Accuracy of New and Old Cognitive Screening Tools for HIV-Associated Neurocognitive Disorders

M Trunfio; D Vai; C Montrucchio; C Alcantarini; A Livelli; MC Tettoni; G Orofino; S Audagnotto; D Imperiale; S Bonora; G Di Perri; A Calcagno


HIV Medicine. 2018;19(7):455-464. 

To date, available screening tools have shown poor performance compared with the consensus criteria, failing to achieve acceptable levels of accuracy in predicting HAND.[7,10] To our knowledge, this is the first prospective evaluation of FAB and CDT in a large HIV-positive population. With an overall screening accuracy threshold of 70% required for consideration for clinical use,[10] we report an almost acceptable accuracy for FAB and IHDS and low clinical usefulness for 3QT and, contrary to our expectations, for CDT.

Our population was mainly composed of middle-aged men who have sex with men and previous injecting drug users, with a medium-low education level; HIV infection was often diagnosed in advanced stages. Although the majority were on HAART, fewer than three-quarters of our patients were virologically suppressed, as a consequence of variable adherence, recent HAART initiation or low-level plasma viraemia. These characteristics, the legacy effect (associated with late presentation), the study time-frame and our selection criteria may explain why our HAND prevalence (49.8%) was higher than that recently reported in Europe (35%).[26] Furthermore, the estimate may have been biased by our reference standard, which comprised an extensive number of tests evaluating different cognitive domains and may therefore have increased the likelihood of a low score in two or more domains. Moreover, we lost a small but not negligible number of patients when referring them for the full neurocognitive evaluation. As patients diagnosed with HAND are known to have a lower rate of retention in care,[27] we may have slightly underestimated our nevertheless substantial prevalence of HAND. Considering that predictive values and CCR depend on disease prevalence, and that a higher prevalence tends to raise PPV and lower NPV, our results apply most reliably to similar populations and may not translate directly to other clinical settings. Until larger international studies, including subjects with less advanced disease, have assessed these measures, our findings should be considered preliminary. We chose a longer comprehensive neuropsychological battery as the reference standard in order to classify our subjects as accurately as possible; such a strategy is usually associated with lower SE values, as a consequence of the influence of reference battery width.
The benefit and reliability of higher screening SE when shorter reference batteries are used should be carefully assessed, taking into account the delicate balance between the risks of HAND overestimation and underdiagnosis. Thus, any increase in SE and overall diagnostic accuracy attributable to our high HAND prevalence may have been counterbalanced by the comprehensiveness of our reference standard. The decision to refer for the comprehensive neurocognitive evaluation only those patients with an altered IHDS score or those complaining of neurocognitive symptoms may have introduced a selection bias, affecting both the assessment of IHDS accuracy and the randomness of patient sampling. Nevertheless, 107 patients with a normal IHDS score underwent the full evaluation because of cognitive complaints, reducing the likelihood of such a bias, and we ultimately found an overall IHDS accuracy very similar to those reported in the literature.[7–9] Finally, the variable use of the screening tests, with not every patient being concurrently screened with all four tools, did not allow us to perform a precise comparative analysis.
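The dependence of predictive values and CCR on prevalence noted above follows directly from Bayes' rule. As a minimal illustrative sketch (not part of the original analysis), the following uses the IHDS sensitivity and specificity observed in this cohort to show how PPV, NPV and CCR shift between our HAND prevalence (49.8%) and the lower recent European estimate (35%):

```python
def predictive_values(se, sp, prev):
    """PPV, NPV and correct classification rate (CCR) from
    sensitivity (se), specificity (sp) and prevalence (prev)."""
    ppv = se * prev / (se * prev + (1 - sp) * (1 - prev))
    npv = sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)
    ccr = se * prev + sp * (1 - prev)
    return ppv, npv, ccr

# Observed IHDS performance in this cohort: SE 74.4%, SP 56.8%
for prev in (0.498, 0.35):
    ppv, npv, ccr = predictive_values(0.744, 0.568, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}, CCR {ccr:.1%}")
```

At the higher prevalence PPV rises (roughly 63% vs 48%) while NPV falls (roughly 69% vs 80%), which is why the figures reported here may not transfer directly to lower-prevalence settings.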

A strength of the study is that almost 30% of those enrolled were female patients, who are usually poorly represented in studies. Moreover, while only 54% of studies in this field were found to have used a representative sample,[8] our patients were randomly selected and quite representative of the heterogeneous population that would ideally receive the screening, suggesting good ecological validity for similar high-risk populations. We applied the Frascati criteria, under which patients diagnosed with ANI are included, although some authors question the clinical relevance of this category.[5,8,28] In this regard, we tested our screening tools both for all HAND categories and for symptomatic HAND only; the Frascati criteria remain the most suitable for research purposes, allowing comparisons among studies and adherence to clinical guidelines while avoiding arbitrariness.[1] Furthermore, we used stringent exclusion criteria to rule out relevant confounding conditions that could have been misclassified as HAND. As we carried out a cross-sectional study, we could not evaluate the stability of criterion validity, test-retest reliability, intra-subject variation or learning effects; future longitudinal studies should also assess these issues.

Our results generally confirmed the modest utility of IHDS in screening for the whole HAND spectrum. IHDS performed similarly in our population to the pooled accuracy measures from meta-analysis: pooled SE 64% (range 0–93%), SP 59% (25–100%), CCR 42–87% and overall AUROC 0.73, versus observed SE 74.4%, SP 56.8%, CCR 65.4% and AUROC 0.73.[5,7–9] In contrast, we did not observe the expected improvement when IHDS was assessed against MND/HAD patients only; instead we found an overall drop in its diagnostic accuracy, calling into question its previously reported high efficacy in detecting these types of HAND.[5,7] Some authors have suggested raising the IHDS cut-off to ≤ 11[29,30] and reported higher SE when it was applied only to on-treatment patients.[30,31] In our population, too, a raised cut-off of ≤ 11 showed a very high SE, but to the detriment of SP and CCR, confirming the greater utility of the widely accepted cut-off of ≤ 10; furthermore, we observed no useful improvement when IHDS was applied to aviraemic patients only, which does not support tailoring screening according to viroimmunological characteristics.

FAB has shown good diagnostic accuracy in detecting dysexecutive syndromes[19] and its performance has been correlated with perfusion in the medial and dorsolateral frontal cortex in patients with neurodegenerative dementias,[32] but no data have been available for HIV-positive patients until now. Considering that executive dysfunction may also be attributable to lesions affecting the basal ganglia, thalamus, cerebellum and fronto-subcortical loops,[18,19] FAB may have a role in detecting impairment caused by lesions across extensive brain areas. Assessment of executive functioning has been shown to be useful in predicting functional capacity, vocational status, risky sexual behaviours and medication adherence in HIV-positive subjects, among whom poor performance in all subdomains of executive function has been recorded.[17] No data are available on the involvement of executive functioning over the progression of the disease. It is possible that the prefrontal regions of the frontal lobes are mostly affected in advanced stages of HIV infection, when patients may be more likely to be diagnosed with MND/HAD, explaining why FAB performance improved in this group. At our prespecified cut-off, FAB showed low to moderate efficacy in detecting HAND, with almost 30% false negatives. In our population, the optimal cut-off may be higher than the one we adopted and should be regarded as either ≤ 15 or ≤ 16. In both cases there would be a significant improvement in SE and CCR, although an SE > 70% would be achieved only with the higher cut-off, together with an acceptable reduction in SP. Moreover, because ROC analysis yields an index independent of the cut-off point, HAND prevalence and other significant sources of variability, it indicated that FAB is the most useful screening tool among the four tested; further studies are needed to confirm this promising result and to assess implementation of FAB administration in different clinical settings.
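Because the AUROC is leaned on above as a cut-off- and prevalence-independent summary, it may help to recall how it can be computed: it equals the probability that a randomly chosen impaired patient scores worse than a randomly chosen unimpaired one (the Mann-Whitney formulation). A minimal sketch with hypothetical FAB totals, chosen purely for illustration (lower score = greater impairment):

```python
def auroc(case_scores, control_scores):
    """AUROC via the Mann-Whitney statistic: the probability that a
    case scores lower than a control (ties count half), since on FAB
    lower scores indicate greater impairment."""
    wins = ties = 0.0
    for c in case_scores:
        for k in control_scores:
            if c < k:
                wins += 1
            elif c == k:
                ties += 1
    return (wins + 0.5 * ties) / (len(case_scores) * len(control_scores))

# Hypothetical FAB totals (maximum score 18), for illustration only
hand = [12, 14, 15]      # patients meeting HAND criteria
no_hand = [16, 17, 15]   # cognitively normal patients
print(auroc(hand, no_hand))  # well above the 0.5 of a coin toss
```

Because this index depends only on the ranking of scores, it is unchanged by the choice of cut-off and by the case:control mix, which is why it is useful for comparing tools across settings.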

CDT broadly evaluates visuoconstructive and visuospatial skills, symbolic and conceptual representation, selective and sustained attention, semantic/verbal working memory, motor execution and executive functions; in HIV-negative populations it detects people with Alzheimer's dementia or mild cognitive impairment.[20,33] Performance on CDT is mostly attributable to the frontal, temporal and parietal cortex and has recently been associated with the cortical thickness of the parietal and supramarginal gyrus.[24,34] CDT has previously been tested only once in a small group of HIV-positive patients, with promising results.[22] Contrary to our expectations, ROC analysis showed that CDT predicted HAND with an accuracy no different from that obtained by tossing a coin. Moreover, 32.3%, 34.2% and 71.2% of our screened population obtained an almost perfect IHDS, FAB and CDT score, respectively, suggesting a ceiling effect for CDT that could further limit its utility. The discordance between our results and those reported by Levy et al.[22] may be explained by the different CDT interpretation model and gold standard battery we used. The observed poor performance may also be attributable to cortical imprinting,[34] which makes CDT less effective in revealing the subcortical deficits that are still more common in HAND patients.[35,36]

3QT, which is based on symptom reporting, was proposed as an easy-to-use screening tool for HAND by Simioni et al.[11] However, self-reported symptoms have substantial limitations in the setting of HIV infection, as a consequence of the prevailing asymptomatic nature of HAND and the evidence of poor metacognition and insight into objective measures of functional skills,[36,37] making reliance on symptom reporting risky for such a diagnosis. Moreover, self-reported attributions may be more closely related to affective distress than to objective decline.[37,38] Not surprisingly, our results did not suggest any utility of 3QT, consistent with a previous study reporting very poor performance and an association between altered 3QT results and a worse perception of quality of life.[39]

All four tools assessed are free and easy to administer, test multiple domains and adhere to standard cognitive constructs reflecting the Frascati criteria, but their SE, SP and overall accuracy were not sufficient for effective screening at the chosen cut-offs, even in our selected population at high risk of HAND. While IHDS showed the highest SE, FAB had a very high SP and the highest CCR, with promising performance at higher cut-offs. Considering the extensive involvement of executive functioning in HIV-related neurocognitive impairment, FAB may be considered a promising screening tool for HAND if our results are confirmed by further studies in different clinical scenarios.