Attention-Based Multimodal Fusion for Salience-Aware Blended Emotion Recognition

Salas-Cáceres, José; Castrillón-Santana, Modesto; Santana, Oliverio J.; Hernández-Sosa, Daniel; Lorenzo-Navarro, Javier

Title:	Attention-Based Multimodal Fusion for Salience-Aware Blended Emotion Recognition
Authors:	Salas-Cáceres, José; Castrillón-Santana, Modesto; Santana, Oliverio J.; Hernández-Sosa, Daniel; Lorenzo-Navarro, Javier
UNESCO Classification:	2405 Biometrics
Keywords:	multimodal emotion recognition; Biometry; human–machine interaction; blended emotions; multimodal fusions
Issue Date:	2026
Project:	Interacción y Re-Identificación de Personas Mediante Machine Learning, Deep Learning y Análisis de Datos Multimodal: Hacia Una Comunicación Más Natural en la Robótica Social
Journal:	Multimodal Technologies and Interaction
Abstract:	Blended emotion recognition introduces the challenge of identifying not only which emotions are present in an expressive display but also their relative salience. The proposed methodology builds upon the pre-extracted features provided with the dataset and enhances performance through a combination of temporal modeling and multimodal fusion strategies. Unimodal experiments revealed that visual encoders consistently outperformed audio ones, with the multimodal HiCMAE encoder achieving the strongest single-encoder results with 34% presence accuracy and 18.23% salience accuracy. Multimodal fusion further improved performance, with the best validation results obtained using a combination of simple concatenation and attention-based fusion, reaching 47.86% in presence accuracy and 27.92% in salience accuracy. Overall, the proposed methodology surpasses the chosen baseline introduced in the original paper across a k-fold experiment, confirming the effectiveness of multimodal attention-based fusion for the accurate prediction of both emotion presence and salience in blended affective behaviour. The experimental results further indicate that multimodal expression recognition consistently outperforms unimodal approaches, highlighting the complementary nature of cross-modal information.
URI:	https://accedacris.ulpgc.es/jspui/handle/10553/167121
ISSN:	2414-4088
DOI:	10.3390/mti10050056
Source:	Multimodal Technologies and Interaction [EISSN 2414-4088], v. 10 (5), (May 2026)
Appears in Collections:	Articles
