Functional Imaging of Neurocognition

Mark D'Esposito, M.D., Helen Wills Neuroscience Institute and Department of Psychology, University of California, Berkeley, California.

Semin Neurol. 2000;20(4) 

Functional MRI: Statistical Issues

Many statistical techniques have been developed for the analysis of fMRI data, but no single method has yet emerged as the "gold standard." Inferential statistical methods are necessary for any fMRI experiment designed to test the null hypothesis, that is, the hypothesis that there is no difference between experimental conditions. If the difference between two experimental conditions is too large to be reasonably attributed to chance, the null hypothesis is rejected in favor of the alternative hypothesis, which is usually the experimenter's research hypothesis. Unfortunately, any statistical test may be in error; we can never know when we have committed an error, and all we can do is try to minimize the chance of one.[54] Two types of statistical error can occur. A type I error is committed when we reject the null hypothesis when it is in fact true (i.e., a difference between experimental conditions is found when no difference truly exists), and a type II error is committed when we accept the null hypothesis when it is false (i.e., no difference between experimental conditions is found when a difference does exist).
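As a rough illustration (not a method from the article itself), a short simulation shows what the type I error rate means in practice: when two samples are drawn from the same distribution and tested repeatedly at alpha = 0.05, about 5% of the tests falsely reject the null hypothesis. The z-test with known unit variance is an assumption chosen here for simplicity.

```python
import random
import statistics

random.seed(0)

def z_test_two_sample(a, b):
    # Two-sample z-test assuming known unit variance (illustration only).
    n = len(a)
    se = (2 / n) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return abs(z) > 1.96  # reject the null at alpha = 0.05

# Under the null hypothesis (both samples drawn from the same
# distribution), we should falsely reject about 5% of the time:
# that is the type I error rate.
n_experiments = 2000
false_positives = sum(
    z_test_two_sample([random.gauss(0, 1) for _ in range(50)],
                      [random.gauss(0, 1) for _ in range(50)])
    for _ in range(n_experiments)
)
print(false_positives / n_experiments)  # close to 0.05
```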

In fMRI experiments, as in all experiments, the experimenter chooses a tolerable probability of type I error, typically less than 5%, to allow adequate control of specificity (i.e., the ability to control false positives). Two features of imaging data can lead to unacceptable false-positive rates even when traditional parametric statistical tests are applied. The first is known as the problem of multiple comparisons. At the typical resolution of images acquired during fMRI scans, the full extent of the human brain comprises approximately 15,000 voxels (three-dimensional image units), and thus for any given statistical comparison of two experimental conditions, 15,000 statistical comparisons are actually being performed. With such a large number of statistical tests, the probability of committing a type I error (falsely rejecting the null hypothesis) somewhere in the brain increases. Several methods exist to deal with this problem. One method, Bonferroni correction, assumes that each statistical test is independent and controls the overall probability of type I error by dividing the chosen probability (alpha = 0.05) by the number of statistical tests performed. Another method, advocated by Worsley and Friston,[55] is based on Gaussian field theory and calculates the probability of type I error when imaging data are spatially smoothed. Assuming that an experimenter wants to control the false-positive rate in his or her experiment, failing to correct for multiple comparisons would be unacceptable.
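The arithmetic of the Bonferroni correction is simple enough to sketch directly; the voxel count of roughly 15,000 is the approximate figure cited above, and the independence assumption is the one the correction itself makes.

```python
alpha = 0.05        # tolerable family-wise type I error rate
n_voxels = 15_000   # approximate whole-brain voxel count cited above

# Bonferroni correction: the per-voxel threshold that keeps the
# probability of any false positive across the brain at or below
# alpha, assuming the tests are independent.
per_voxel_alpha = alpha / n_voxels
print(per_voxel_alpha)

# Without correction, at least one false positive somewhere in the
# brain is nearly certain under independence:
p_any_false_positive = 1 - (1 - alpha) ** n_voxels
print(p_any_false_positive)  # effectively 1.0
```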

Another feature of fMRI data that can lead to an elevated false-positive rate has to do with the structure of the noise in fMRI data. BOLD fMRI data are temporally autocorrelated, or "colored," under the null hypothesis. That is to say, in fMRI data collected from human subjects in the absence of any experimental task or time-varying stimuli, greater power (i.e., variance) is seen at some frequencies than at others. The shape of this distribution of power has been well characterized by a 1/frequency function,[56] with increasing power at ever lower frequencies. The presence of this autocorrelated noise renders traditional parametric[56,57] and nonparametric[58] statistical tests invalid, as these tests assume that each observation is independent of the others. In particular, the false-positive rate observed for experimental paradigms that have most of their variance in the low-frequency end of the spectrum (e.g., a boxcar design with 60-second epochs, typical of many fMRI experiments) will be dramatically inflated. Statistical models can account for this temporal autocorrelation;[59,60] again, it seems unacceptable for a statistical test of fMRI data not to do so.
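This inflation can be demonstrated with a toy simulation. The sketch below uses first-order autoregressive (AR(1)) noise as a simple stand-in for the low-frequency-weighted 1/f noise described above (an assumption for illustration, not the noise model from references [59,60]), and applies a naive correlation test against a slow boxcar regressor: with white noise the false-positive rate sits near the nominal 5%, while with temporally autocorrelated noise it is far higher.

```python
import random

random.seed(1)

def ar1_noise(n, phi=0.9):
    # AR(1) noise: a crude stand-in for low-frequency-weighted
    # ("1/f") fMRI noise; phi = 0.9 is an illustrative choice.
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + random.gauss(0, 1)
        series.append(x)
    return series

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def naive_reject(signal, design):
    # t-test on the correlation, wrongly assuming independent scans.
    n = len(signal)
    r = pearson_r(signal, design)
    t = r * ((n - 2) / (1 - r * r)) ** 0.5
    return abs(t) > 1.97  # ~alpha = 0.05 critical value

n_scans = 200
# Slow boxcar design: 30-scan epochs, most variance at low frequency.
boxcar = [float((i // 30) % 2) for i in range(n_scans)]

runs = 500
white_fp = sum(naive_reject([random.gauss(0, 1) for _ in range(n_scans)],
                            boxcar) for _ in range(runs)) / runs
ar_fp = sum(naive_reject(ar1_noise(n_scans), boxcar)
            for _ in range(runs)) / runs
print(white_fp, ar_fp)  # white noise ~0.05; colored noise far higher
```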

Type II error has received relatively little attention in studies using functional neuroimaging. When a brain map from an fMRI experiment is presented, typically several areas of activation are attributed to some experimental manipulation. Although the focus of most imaging studies is on brain activations, it is nevertheless often implicitly assumed that all of the other areas that are not activated (typically most of the brain) were not active during the experimental manipulation. Power is a statistical concept that refers to the probability of correctly rejecting the null hypothesis when it is false.[54] Unfortunately, power calculations for particular fMRI experiments have not been performed, although methods for doing so have begun to appear in the literature.[59,60] Relatively simple strategies can increase power in an fMRI experiment in certain circumstances, such as increasing the number of observations (i.e., the number of scans or subjects) or designing behavioral tasks that take into account the sluggish nature of the fMRI signal and the 1/f distribution of the noise.[61]
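The relationship between the number of observations and power can be sketched with a back-of-the-envelope calculation. This is a textbook normal approximation for a two-sample test, not the fMRI-specific power methods cited in [59,60]; the effect size of 0.5 standard deviations is an arbitrary illustrative value.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

def approx_power(effect_size, n_per_group):
    # Approximate power of a two-sided, two-sample z-test at
    # alpha = 0.05; effect_size is the mean difference in SD units.
    z_crit = 1.96
    ncp = effect_size * sqrt(n_per_group / 2)  # noncentrality parameter
    return (1 - normal_cdf(z_crit - ncp)) + normal_cdf(-z_crit - ncp)

# Power (the probability of correctly rejecting a false null
# hypothesis) grows with the number of observations:
for n in (10, 20, 40, 80):
    print(n, round(approx_power(0.5, n), 3))
```

Doubling the number of observations at a fixed effect size raises power substantially, which is why adding scans or subjects is among the simplest remedies for type II error.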