Persistent identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/154925
Title: Multimodal Emotion Recognition via Multilevel Fusion of Visual, Audio, and Textual Data
Authors: Salas Cáceres, José Ignacio 
Lorenzo Navarro, José Javier 
Castrillón Santana, Modesto Fernando 
UNESCO classification: 120304 Artificial intelligence
Keywords: Multimodal data fusion
Emotion recognition
Biometry
Human-Machine Interaction
Publication date: 2026
Conference: 23rd International Conference on Image Analysis and Processing (ICIAP2025)
Abstract: Human–machine interactions are becoming increasingly common in society, making it important to improve their user experience. In this regard, an accurate emotion recognition system could substantially benefit the experience. This work presents a novel framework for multimodal emotion recognition that performs fusion at multiple levels, feature and score, to effectively combine visual, audio, and textual information. Modality-specific embeddings are extracted using VGGFace for visual data, a Wav2Vec2-Large-Robust model for audio, and BERT for text. These representations are unified via three different feature-level fusion strategies: concatenation, Embrace, and cross-attention. A subsequent score-level fusion employs an adaptive weighted sum to produce the final class probabilities. On the four-emotion classification task of the IEMOCAP dataset, our approach achieves an unweighted accuracy of 73.53%, which represents solid results comparable with some state-of-the-art baselines and demonstrates the added value of visual cues. Our experiments also analyze the impact of fusion and pooling choices, providing insights for future multimodal systems.
URI: https://accedacris.ulpgc.es/jspui/handle/10553/154925
ISBN: 978-3-032-10191-4
ISSN: 0302-9743
DOI: 10.1007/978-3-032-10192-1_45
Collection: Conference proceedings
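
The score-level fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the adaptive weights are obtained, so this example assumes one raw weight per fusion branch, normalized with a softmax; it also assumes the score-level stage combines the outputs of the three feature-level branches. The function names (`softmax`, `weighted_score_fusion`) and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_score_fusion(branch_probs, raw_weights):
    """Combine per-branch class probabilities with a normalized weighted sum.

    branch_probs: one probability vector per fusion branch
                  (assumed here: concatenation, Embrace, cross-attention).
    raw_weights:  one unnormalized weight per branch. In a trained system
                  these would be learned; here they are illustrative inputs.
    """
    if len(branch_probs) != len(raw_weights):
        raise ValueError("need exactly one weight per branch")
    weights = softmax(raw_weights)  # weights are positive and sum to 1
    n_classes = len(branch_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(n_classes)
    ]


# Illustrative (made-up) probabilities over four emotion classes.
branch_probs = [
    [0.7, 0.1, 0.1, 0.1],  # hypothetical concatenation branch
    [0.4, 0.3, 0.2, 0.1],  # hypothetical Embrace branch
    [0.1, 0.6, 0.2, 0.1],  # hypothetical cross-attention branch
]
fused = weighted_score_fusion(branch_probs, raw_weights=[0.0, 0.0, 0.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the normalized weights sum to one and each branch outputs a probability distribution, the fused vector is itself a valid distribution. With equal raw weights, as in the usage above, the result reduces to a plain average of the branch scores.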

Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.