Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/134753
Title: Facial Emotion Recognition with Inter-Modality-Attention-Transformer-Based Self-Supervised Learning
Authors: Chaudhari, Aayushi
Bhatt, Chintan
Krishna, Achyut
Travieso González, Carlos Manuel 
UNESCO Classification: 120325 Sensor systems design
610603 Emotion
Keywords: Computer vision
Contextual emotion recognition
Depth of emotional dimensionality
Inter-modality attention transformer
Multimodality, et al
Issue Date: 2023
Journal: Electronics (Switzerland) 
Abstract: Emotion recognition is a challenging research field because individuals express cognitive–emotional cues in a wide variety of ways, including language, expressions, and speech. Video input provides a wealth of data for analyzing human emotions. In this research, we combine the text, audio (speech), and visual modalities using features derived from separately pretrained self-supervised learning models. Fusing features and representations is the biggest challenge in multimodal emotion classification research. Because self-supervised learning features have high dimensionality, we present a novel transformer- and attention-based fusion method for incorporating multimodal self-supervised learning features; it achieved an accuracy of 86.40% for multimodal emotion classification.
URI: http://hdl.handle.net/10553/134753
ISSN: 2079-9292
DOI: 10.3390/electronics12020288
Source: Electronics (Switzerland) [ISSN 2079-9292], v. 12 (2), 288, (January 2023)
Appears in Collections: Articles
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.