Dynamic Human and Avatar Facial Expressions Elicit Differential Brain Responses

Lorena C. Kegel; Peter Brugger; Sascha Frühholz; Thomas Grunwald; Peter Hilfiker; Oona Kohnen; Miriam L. Loertscher; Dieter Mersch; Anton Rey; Teresa Sollfrank; Bettina K. Steiger; Joerg Sternagel; Michel Weber; Hennric Jokeit


Soc Cogn Affect Neurosci. 2020;15(3):303-317. 

Abstract and Introduction


Computer-generated characters, so-called avatars, are widely used in advertising, entertainment and human–computer interaction, and as research tools to investigate human emotion perception. However, brain responses to avatar and human faces have scarcely been studied to date. It therefore remains unclear whether dynamic facial expressions of avatars evoke different brain responses than dynamic facial expressions of humans. In this study, we designed anthropomorphic avatars animated with motion tracking and tested whether the human brain processes fearful and neutral expressions in human and avatar faces differently. Our fMRI results showed that fearful human expressions evoked stronger responses than fearful avatar expressions in the ventral anterior and posterior cingulate gyrus, the anterior insula, the anterior and posterior superior temporal sulcus, and the inferior frontal gyrus. Fearful expressions in human and avatar faces evoked similar responses in the amygdala. We did not find differential responses to neutral human and avatar expressions. Our results highlight differences, but also similarities, in the processing of fearful human and fearful avatar expressions, even when the avatars are designed to be highly anthropomorphic and animated with motion tracking. This has important consequences for research using dynamic avatars, especially when the processes under investigation involve both cortical and subcortical regions.


While we are becoming more experienced with computer-generated characters, or avatars, that are used in animated films, social media or as human–computer interfaces, we do not know how we react and adapt to interactions with our virtual counterparts (Kätsyri et al., 2017; Hsu, 2019). The use of avatars in entertainment and commercial settings is accompanied by an increasing use of avatars to investigate emotion perception, since avatars enable highly standardized experiments that resemble real-life social situations (Zaki and Ochsner, 2009; Crookes et al., 2015; de Borst and de Gelder, 2015). Indeed, facial expressions of avatars have been shown to influence human decision making and cooperative behavior, which is crucial for the commercial use of avatars (Choi et al., 2012; de Melo et al., 2011a, 2011b; Scherer and Von Wangenheim, 2014). Hence, it seems vital to investigate the underpinnings of human behavior and associated brain processes during interactions with human-like avatars (Cross, Hortensius, et al., 2019; Epley et al., 2008). In the present study, we investigated whether brain activation differs in response to dynamic human facial expressions and dynamic avatar facial expressions, and if so, which brain regions show activation differences.

Central to the processing of facial expressions and facial identity is a distributed network of brain regions that responds more strongly to human faces than to other visual information (Dricu and Frühholz, 2016; Fernández Dols and Russell, 2017). In this neural face perception network, the posterior and anterior superior temporal sulcus (STS) as well as the inferior frontal gyrus (IFG) form a dorsal pathway sensitive to dynamic features of faces like facial motion and gaze. Conversely, the inferior occipital gyrus, the fusiform gyrus (FG) and the anterior temporal lobe comprise the ventral pathway where invariant features of faces like form and configuration are processed (Haxby et al., 2000; Duchaine and Yovel, 2015).

Based on the functional characterization of these brain regions for face perception, one may hypothesize that differences in brain responses to dynamic human and avatar expressions depend on the specific functions of these pathways. Thus, the ventral pathway, which is tuned to invariant facial features, may respond equally to anthropomorphic avatar faces and their human counterparts. The FG in particular may show equivalent responses to human and avatar faces, given its importance in the holistic processing of facial form, independent of motion and emotion. The dorsal pathway, on the other hand, may be activated differently by dynamic human and avatar facial expressions. Computer-generated faces often lack subtle dynamic features, such as expression-related wrinkles. Since dorsal regions are mainly engaged in the processing of facial motion, it is plausible that the STS and the IFG show stronger responses to dynamic human expressions than to dynamic avatar expressions.

In addition to the cortical pathways discussed above, previous research has identified a subcortical route that is particularly involved in the processing of emotional facial expressions (Haxby et al., 2000; Vuilleumier, 2005). This subcortical face processing route is formed by the amygdala, together with the pulvinar and the superior colliculus and may precede responses to dynamic human expressions in the ventral temporal cortex (Johnson, 2005; Méndez-Bértolo et al., 2016). It has been proposed that this rapid subcortical processing is made possible by a magnocellular channel to the amygdala that is tuned to low-spatial frequency input. Typically, low-spatial frequency input provides information about coarse stimulus features like the configuration or form of a face. Conversely, a slower parvocellular channel to face-sensitive cortical regions is attuned to high-spatial frequency information in faces. This fine-grained parvocellular input thus provides slow but high-resolution information about local features of faces like expression-related wrinkles (Vuilleumier et al., 2003; Kumar and Srinivasan, 2011; Dima et al., 2018).

Although the exact functional role of the subcortical route in face perception remains controversial (Pessoa and Adolphs, 2010; McFadyen et al., 2017), it is assumed to enable the fast detection of fear- or threat-related environmental signals in the absence of slower cortical processing (LeDoux, 2000; Adolphs, 2001; Vuilleumier et al., 2003). To date, no study has investigated whether the subcortical route that conveys low-spatial frequency information to the amygdala also mediates the processing of dynamic human and avatar expressions. It is thus not known whether amygdala responses to dynamic human and avatar expressions differ from cortical responses. In general, we would assume that human and avatar faces both contain coarse low-spatial frequency information about face configuration and form, which activates the amygdala. However, the composition and range of the spatial frequency spectrum of avatar faces may depend on their level of elaboration (e.g. detectable wrinkles or not) and may thus differ from human faces, which have a broad spatial frequency spectrum.

Given the increasing use of avatars, several behavioral and imaging studies have investigated processing differences between human and avatar facial expressions. On a behavioral level, previous studies have shown that facial expressions are reliably recognized in both static and dynamic human and avatar faces (Dyck et al., 2008; Gutiérrez-Maldonado et al., 2014). At the neural level, however, ventral and dorsal regions of the face perception network (Haxby et al., 2000; Duchaine and Yovel, 2015) showed stronger responses to static human expressions than to static avatar expressions (Moser et al., 2007; James et al., 2015; Kätsyri et al., 2020). More precisely, the FG, the STS and the IFG were more strongly activated by static human than avatar expressions (Moser et al., 2007; James et al., 2015; Kätsyri et al., 2020). Previous results on amygdala responses to human and avatar facial expressions are mixed: whereas two studies comparing static emotional expressions in human and avatar faces found no significant differences in amygdalar responses (Moser et al., 2007; Kätsyri et al., 2020), another study using neutral pictures of human and cartoon faces showed a stronger amygdala response to human faces (James et al., 2015).

These results indicate that human and avatar facial expressions are not processed in the same way in the dorsal and ventral regions of the face perception network. Moreover, there is also a processing difference between cortical and subcortical regions that may be attributed to their differential sensitivity to certain ranges of spatial frequency. Yet, these results were obtained using static facial expressions, so virtually nothing is known about the differential processing of dynamic human and avatar facial expressions. To help close this gap, we assessed brain responses to dynamic facial expressions of actors and their customized avatar look-alikes, which were developed for this study. During the acquisition of functional MRI data, participants watched short videos of fearful and neutral expressions of the actors and avatars. By asking our participants to rate the intensity of the presented expressions within 2 weeks after the scanning session, we were able to investigate whether the intensity of the expressions also influences brain activation.

Based on previous results with static avatar expressions (Moser et al., 2007; James et al., 2015; Kätsyri et al., 2020) and the role of the dorsal pathway for dynamic information in face perception (Haxby et al., 2000; Duchaine and Yovel, 2015), we expected the STS and the IFG to show stronger responses to dynamic human facial expressions than to dynamic avatar facial expressions. Furthermore, we presumed that the processing difference between dynamic human and avatar faces should be larger for fearful expressions than for neutral expressions. We expected such an interaction effect to be present in the STS and the IFG because those regions are sensitive to dynamic features of facial expressions, which characterize fearful more than neutral expressions.