Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/jspui/handle/10553/143106
DC Field: Value
dc.contributor.advisor: Hernández Tejera, Francisco Mario
dc.contributor.advisor: Hernández Cabrera, José Juan
dc.contributor.author: Cárdenes Pérez, Ricardo Juan
dc.date.accessioned: 2025-07-20T20:01:29Z
dc.date.available: 2025-07-20T20:01:29Z
dc.date.issued: 2025
dc.identifier.other: Gestión académica
dc.identifier.uri: https://accedacris.ulpgc.es/handle/10553/143106
dc.description.abstract: In recent years, there has been significant interest in integrating into deep learning models the ability to process and interpret multiple data modalities, such as text and images, jointly. To achieve this, many modern architectures project these data into a shared latent space, where representations of equivalent concepts, even when expressed in different forms, are expected to be semantically aligned. This cross-modal alignment is fundamental for tasks such as cross-generation and for obtaining a more comprehensive modeling of the world around us. This work specifically addresses the problem of aligning latent representations between the text and image modalities, proposing an approach based on variational autoencoders trained jointly. To this end, a reformulation of sequence-to-sequence models is proposed, allowing their integration within a variational framework by incorporating latent variables capable of modeling the inherent uncertainty of natural language. In particular, this is done for three recurrent architectures widely used in text modeling: LSTM, GRU, and xLSTM. These architectures are integrated into a joint model with image autoencoders, sharing a common latent space where representations from both modalities converge. Inspired by the theoretical framework of the MVAE, a similar notation and formulation is adopted to ensure conceptual compatibility, although an alternative architecture better suited for sequential tasks is proposed. As part of the work, an extensible experimentation framework has been developed, designed to facilitate scientific collaboration and a more practical, formal, and reproducible approach. This framework allows experiments to be defined and executed in a structured manner, accelerating the incorporation of new architectures and multimodal configurations. Experiments are conducted on the MNIST, FashionMNIST, and CelebA datasets.
The source code of the experiments, as well as the framework, is available at: https://github.com/ricardocardn/Re-MVAE.
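The core ingredients the abstract describes — per-modality variational encoders that emit a Gaussian over a shared latent space, sampled via the reparameterization trick and regularized by a KL term — can be sketched in a few lines. This is a minimal illustrative sketch only: the function names and the simple mean-squared alignment term are assumptions for exposition, not the thesis's or the Re-MVAE repository's actual formulation.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients can flow through the sampling step.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))

def alignment_loss(z_text, z_image):
    # Hypothetical cross-modal term: pull paired latents together in the
    # shared space (the actual objective in the thesis may differ).
    return np.mean((z_text - z_image) ** 2)

rng = np.random.default_rng(0)
latent_dim = 16
# Stand-ins for the outputs of a text encoder (e.g. LSTM/GRU/xLSTM) and an
# image encoder; in a real model these come from the networks.
mu_t, logvar_t = np.zeros(latent_dim), np.zeros(latent_dim)
mu_i, logvar_i = np.zeros(latent_dim), np.zeros(latent_dim)

z_text = reparameterize(mu_t, logvar_t, rng)
z_image = reparameterize(mu_i, logvar_i, rng)

# Joint objective (reconstruction terms omitted for brevity).
loss = (kl_to_standard_normal(mu_t, logvar_t)
        + kl_to_standard_normal(mu_i, logvar_i)
        + alignment_loss(z_text, z_image))
```

In a full model, each KL term would be weighted and added to the text and image reconstruction losses, and the alignment term would act on paired (caption, image) latents.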
dc.language: spa
dc.subject: 120317 Informática
dc.title: Estudio acerca del entrenamiento con alineamiento de representaciones multimodales en el espacio latente
dc.type: info:eu-repo/semantics/bachelorThesis
dc.type: BachelorThesis
dc.contributor.departamento: Departamento de Informática y Sistemas
dc.contributor.facultad: Escuela de Ingeniería Informática
dc.investigacion: Ingeniería y Arquitectura
dc.type2: Trabajo final de grado
dc.utils.revision:
dc.identifier.matricula: TFT-36361
dc.identifier.ulpgc:
dc.contributor.buulpgc: BU-INF
dc.contributor.titulacion: Grado en Ciencia e Ingeniería de Datos
item.fulltext: Sin texto completo
item.grantfulltext: none
crisitem.advisor.dept: GIR SIANI: Inteligencia Artificial, Redes Neuronales, Aprendizaje Automático e Ingeniería de Datos
crisitem.advisor.dept: IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería
crisitem.advisor.dept: Departamento de Informática y Sistemas
Collection: Trabajo final de grado
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.