Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/jspui/handle/10553/143106
DC Field: Value
dc.contributor.advisor: Hernández Tejera, Francisco Mario
dc.contributor.advisor: Hernández Cabrera, José Juan
dc.contributor.author: Cárdenes Pérez, Ricardo Juan
dc.date.accessioned: 2025-07-20T20:01:29Z
dc.date.available: 2025-07-20T20:01:29Z
dc.date.issued: 2025
dc.identifier.other: Gestión académica
dc.identifier.uri: https://accedacris.ulpgc.es/handle/10553/143106
dc.description.abstract: In recent years, there has been significant interest in integrating into deep learning models the ability to process and interpret multiple data modalities, such as text and images, jointly. To achieve this, many modern architectures project these data into a shared latent space, where representations of equivalent concepts, even when expressed in different forms, are expected to be semantically aligned. This cross-modal alignment is fundamental for tasks such as cross-generation and for obtaining a more comprehensive modeling of the world around us. This work specifically addresses the problem of aligning latent representations between the text and image modalities, proposing an approach based on variational autoencoders trained jointly. To this end, a reformulation of sequence-to-sequence models is proposed, allowing their integration within a variational framework by incorporating latent variables capable of modeling the inherent uncertainty of natural language. In particular, this is done for three recurrent architectures widely used in text modeling: LSTM, GRU, and xLSTM. These architectures are integrated into a joint model with image autoencoders, sharing a common latent space where representations from both modalities converge. Inspired by the theoretical framework of the MVAE, a similar notation and formulation is adopted to ensure conceptual compatibility, although an alternative architecture better suited for sequential tasks is proposed. As part of the work, an extensible experimentation framework has been developed, designed to facilitate scientific collaboration and a more practical, formal, and reproducible approach. This framework allows experiments to be defined and executed in a structured manner, accelerating the incorporation of new architectures and multimodal configurations. Experiments are conducted on the MNIST, FashionMNIST, and CelebA datasets.
The source code of the experiments, as well as the framework, is available at: https://github.com/ricardocardn/Re-MVAE.
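The core ingredients the abstract describes — per-modality variational encoders that emit a Gaussian over a shared latent space, sampled via the reparameterization trick and regularized by a KL term — can be sketched in a few lines. This is a minimal illustrative sketch only: the function names and the simple mean-squared alignment term are assumptions for exposition, not the thesis's or the Re-MVAE repository's actual formulation.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def alignment_loss(z_text, z_image):
    # Hypothetical cross-modal term: pull paired latents together in the
    # shared space (the actual objective in the thesis may differ).
    return np.mean((z_text - z_image) ** 2)

rng = np.random.default_rng(0)
latent_dim = 16
# Stand-ins for the outputs of a text encoder (e.g. LSTM/GRU/xLSTM) and an
# image encoder; in a real model these come from the networks.
mu_t, logvar_t = np.zeros(latent_dim), np.zeros(latent_dim)
mu_i, logvar_i = np.zeros(latent_dim), np.zeros(latent_dim)

z_text = reparameterize(mu_t, logvar_t, rng)
z_image = reparameterize(mu_i, logvar_i, rng)

# Joint objective (reconstruction terms omitted for brevity).
loss = (kl_to_standard_normal(mu_t, logvar_t)
        + kl_to_standard_normal(mu_i, logvar_i)
        + alignment_loss(z_text, z_image))
```

In a full model, each KL term would be weighted and added to the text and image reconstruction losses, and the alignment term would act on paired (caption, image) latents.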
dc.language: spa
dc.subject: 120317 Informática
dc.title: Estudio acerca del entrenamiento con alineamiento de representaciones multimodales en el espacio latente
dc.type: info:eu-repo/semantics/bachelorThesis
dc.type: BachelorThesis
dc.contributor.departamento: Departamento de Informática y Sistemas
dc.contributor.facultad: Escuela de Ingeniería Informática
dc.investigacion: Ingeniería y Arquitectura
dc.type2: Trabajo final de grado
dc.utils.revision:
dc.identifier.matricula: TFT-36361
dc.identifier.ulpgc:
dc.contributor.buulpgc: BU-INF
dc.contributor.titulacion: Grado en Ciencia e Ingeniería de Datos
item.fulltext: Sin texto completo
item.grantfulltext: none
crisitem.advisor.dept: GIR SIANI: Inteligencia Artificial, Redes Neuronales, Aprendizaje Automático e Ingeniería de Datos
crisitem.advisor.dept: IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería
crisitem.advisor.dept: Departamento de Informática y Sistemas
Collection: Trabajo final de grado
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.