We examined the perceived strength and genuineness of static, dynamic morphed, and video-recorded facial expressions, given the rising trend of synthetic dynamics in face perception research. We created dynamic morphs and static images from video frames, yielding stimuli with identical neutral and peak expressions that varied only in the dynamics between them. We expected that video-recorded expressions would be perceived as stronger and more genuine than dynamic morphs and static images. Participants viewed videos (all emotions) as stronger than dynamic morphs, and they viewed happy videos as more genuine than happy dynamic morphs. Surprisingly, participants viewed static photographs and video recordings as equally strong for happy and fearful expressions, and equally genuine for sadness, anger, and fear. In this discussion, we examine the hypotheses relating to strength ratings, followed by genuineness ratings, before discussing the findings collectively and evaluating their implications.
Ratings of strength and genuineness were weakly correlated. Requiring participants to make both judgements for each stimulus may have contributed to this correlation in the current study, despite instructions that they were distinct concepts. However, previous research shows that perceived strength and genuineness are weakly correlated even when rated separately, especially for smiles (Dawel et al.,
2015). Thus, we will interpret these concepts separately.
The Perception of Strength
Participants viewed emotions in videos as stronger than in dynamic morphs. Where Korolkova (
2018b) showed that dynamic morphs and videos have different time-inversion effects, our findings demonstrate that social judgements of morphed emotions differ from those of videos, and that this difference can emerge at a temporal resolution as low as 25 fps. As with studies using 3D avatars (Wallraven et al.,
2008), we show that people perceive synchronous facial motion (i.e., all features moving at once, as is characteristic of dynamic morphs) as less intense when compared with more naturalistic asynchronous facial motion (i.e., video recordings). This insight may be worth considering for researchers who prefer dynamic morphs over 3D avatars to enhance ecological validity, particularly those interested in emotion intensity.
Participants viewed happiness and fear as similarly intense in photos and videos, while they viewed static anger and sadness as stronger than the original videos (though this effect was small). Our findings partly align with Kilts et al. (
2003), who reported no difference in strength ratings between static and video-recorded happy expressions. However, unlike our study, Kilts et al. also reported no difference for angry expressions. Although both studies utilised video recordings of trained actors exhibiting peak expressions, the stimuli in the Kilts et al. study depicted an actor displaying the emotion for 4 s and included head movements. It is possible that the brief neutral-to-peak transitions used in our study led to qualitative differences in static angry stimuli compared to those used in Kilts et al., resulting in a perception of greater strength in the static condition.
Other studies that have reported increased strength ratings for dynamic compared to static stimuli have primarily used dynamic morphs (e.g., Biele & Grabowska,
2006; Kamachi et al.,
2013), which may not be suitable for making predictions about video recorded stimuli. However, it is surprising that static photographs were rated as equal to or stronger than video recordings. This finding suggests that improved emotion recognition (Butcher & Lander,
2017; Butcher et al.,
2011) and unconscious emotion-congruent facial activity (Rymarczyk et al.,
2019) for video-recorded compared to static expressions may not be related to their perceived strength. The continuous display of peak expression in static stimuli could lead to an increased perception of strength without impacting facial responses or emotion categorisation abilities.
Some emotion recognition studies have found no dynamic advantage for video-recorded happy expressions (Ambadar et al.,
2009; Cunningham & Wallraven,
2009), but have shown an advantage for other expressions, such as sadness (Ambadar et al.,
2009; Cunningham & Wallraven,
2009), anger, and fear (Ambadar et al.,
2009). Our finding that static photographs of fear and anger were perceived as stronger than video recordings suggests the dynamic advantage for recognizing these emotions is not related to their perceived strength. Although we predicted that participants would view static expressions as weaker than video-recorded expressions, the fact that they were instead perceived as stronger still indicates their inadequacy. Video recordings, while not a substitute for a present and interactive human, still capture the expression’s temporal sequence as it happens. Static peak expressions should be perceived similarly to the video recordings from which they were derived if they are to be validated as suitable stimuli in face perception research.
In our study, static photographs were consistently perceived as stronger than dynamic morphs for each emotion and on average across all emotions. This finding is surprising, as two studies reported dynamic morphed expressions were stronger than static counterparts (Biele & Grabowska,
2006; Rymarczyk et al.,
2011). Methodological differences might explain these discrepancies. Biele and Grabowska (
2006) presented static and dynamic stimuli randomly within the same task, while Rymarczyk et al. (
2011) used blocks of the same kind of stimuli. In contrast, we presented static stimuli separately to avoid confusion between stimuli types. This separation could have led participants to “reset” their rationale for strength ratings between tasks.
Uono et al. (
2010) found that the final image (peak) of dynamic morphs appeared more emotionally exaggerated than static facial expressions. However, this study asked participants to rate the intensity of the initial (static or dynamic) stimulus while viewing a second static image of the peak expression presented subsequently, and it is unclear whether this influenced results. More in line with our findings, Kamachi et al. (2013) reported no increase in intensity for dynamic morphs compared to static photographs. These inconsistent findings underscore the need for further research to clarify the conditions under which dynamic morphs might be perceived as stronger or weaker than static photographs.
The dynamic morphs used in the current study transitioned from neutral to peak emotion. Some dynamic morphs are truncated, ending before the peak expression is reached (e.g., Calvo et al.,
2016). This allows researchers to generate more challenging emotion recognition tasks, which are typically too easy when the peak expression is perceived, leading to ceiling effects (Kamachi et al.,
2013). Truncated dynamic morphs are assumed to portray a lower strength/intensity version of the full expression. It is unclear whether such truncated expressions are similar to a low intensity video-recorded emotion. In any case, our findings suggest that dynamic morphs do not adequately portray expression strength, relative to both photos and videos, even when they end on a true photograph.
The Perception of Genuineness
Participants viewed happiness in videos and photos as more genuine than other emotions, and more genuine than happy morphs. They viewed anger, fear, and sadness as similarly genuine across display types. We measured genuineness due to its high social value (Zloteanu et al.,
2018) and to avoid making participants aware of artificially animated stimuli through questions about naturalism. However, in the video database used in the current study, actors were coached to portray emotions naturally and accurately (van der Schalk et al.,
2011). Hence, there is no “correct” answer, as we do not know how genuinely each emotion was felt by actors. Such posed expressions are in some sense disingenuous, and are viewed as less genuine than spontaneously induced emotions (Krumhuber & Manstead,
2009; Zloteanu et al.,
2018). This may account for similarities between stimulus types for anger, fear, and sadness, which may be differentiated for spontaneous expressions.
Smiles hold a unique position among emotional expressions, as they serve multiple purposes: they signify genuine happiness and convey many social cues, such as shared understanding (Martin et al., 2017). Duchenne smiles, which involve extra muscle movements that cause wrinkling near the eyes (crow’s feet), have been said to signify true spontaneous happiness (Ekman et al.,
1990), and are perceived differently to posed and false smiles (Gunnery & Ruben,
2016). However, Duchenne smiles occur in both spontaneous and posed conditions, and posed Duchenne smiles do not have to accompany positive feelings (Krumhuber & Manstead,
2009). It is therefore possible that compared to other emotions, happiness was more genuinely felt, or more convincingly faked by the ADFES actors used in the current study. In any case, unlike dynamic morphs, photos appear capable of conveying the perceived genuineness of smiles observed in videos. Again, it appears that adding computer-generated motion to photographs makes them
less similar to a video, perhaps because it removes our ability to imagine the naturalistic motion that generated the expression.
Studies which measure similar constructs, such as naturalness and realism, may provide insight into our findings. Oda and Isono (
2008) found that non-linear expression trajectories were perceived as more “natural” than linear expressions overall, consistent with our findings for happiness. As in our study, this overall effect was not present for sadness, which was perceived as highly realistic for both linear and S-shaped functions. While these findings do not explain our null results for anger and fear, they do indicate that social perceptions related to genuineness for linear and naturalistic facial motion depend on the emotional expression. This is also consistent with findings from McLellan et al. (
2010) who found that participants could reliably detect whether video-recorded facial expressions were genuinely felt or simulated, although the pattern was not consistent across emotions, suggesting emotion-specific sensitivity.
The unnatural motion of dynamic morphs may be particularly evident for expression changes that reveal previously hidden or less visible features, which are absent from the neutral frame and therefore cannot be interpolated cleanly. This could explain why morphs are not commonly used to display blinking, as opening the eyes uncovers the iris; similarly, toothy smiles expose the teeth. This limitation is inherent to creating neutral-to-peak morphed expressions (Calder et al.,
1996), even with precise landmark placement. This is a significant disadvantage for dynamic morphs, as studies that measure emotion recognition for partially obscured videos have highlighted the importance of the mouth region (Blais et al.,
2012), especially for happy expressions (Hoffmann et al.,
2013). Further, mouth openness can influence the perceived meaning of a smile (e.g., amused, nervous, or polite smiling; Ambadar et al.,
2009). While the literature generally shows differences in the perception of dynamic morphed and static emotion (e.g., Calvo et al.,
2016; Recio,
2013), our results indicate that these dynamic effects differ from those of naturalistic facial emotion.
Constraints on Generality and Future Directions
Our discussion of ecological validity has focused on the biological accuracy of stimulus dynamism. Experiments which contain complex, dynamic, naturalistic stimuli can nevertheless lack ecological validity if the experimental environment fails to emulate the real-world situation of interest (Shamay-Tsoory & Mendelsohn,
2019). It is possible that the effects of naturalistic facial dynamism are contingent on the ecological validity of the experimental task and environment (Risko et al.,
2012). Future research may assess the effects of realistic dynamism in naturalistic and interactive settings.
There is evidence that education level (Demenescu et al.,
2014) and cultural background (Engelmann & Pogosyan,
2013) influence face perception. While an effort was made to recruit a diverse range of participants, the recruitment posts that gained the most engagement were primarily in student groups at universities located in Australia, Europe, and the United States. This is reflected in our country of residence data and likely contributes to the overrepresentation of Western, educated, industrialized, rich, and democratic (WEIRD) populations in research (see Roberts et al.,
2020). Additionally, we showed actors of Northern European and Mediterranean ethnic backgrounds, which may not fully capture the diversity of facial expressions encountered in daily life, particularly in multicultural societies such as those from which our participants were drawn. Notably, perception of other-race and other-ethnicity faces may differ even for similar samples (Shriver et al., 2008). As the actors were culturally Dutch and emotional expressions can differ by culture (Srinivasan et al.,
2016), further research on non-western expressions is needed.