Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/jspui/handle/10553/147255
DC Field | Value | Language
dc.contributor.author | Martín, Óscar A. | en_US
dc.contributor.author | Sánchez, Javier | en_US
dc.date.accessioned | 2025-09-19T17:23:23Z | -
dc.date.available | 2025-09-19T17:23:23Z | -
dc.date.issued | 2025 | en_US
dc.identifier.issn | 2331-8422 | en_US
dc.identifier.uri | https://accedacris.ulpgc.es/jspui/handle/10553/147255 | -
dc.description.abstract | Neural networks have become the standard technique for medical diagnostics, especially in cancer detection and classification. This work evaluates the performance of Vision Transformer architectures, including Swin Transformer and MaxViT, on several datasets of magnetic resonance imaging (MRI) and computed tomography (CT) scans. We used three training sets of images with brain, lung, and kidney tumors. Each dataset includes different classification labels, from brain gliomas and meningiomas to benign and malignant lung conditions and kidney anomalies such as cysts and cancers. This work aims to analyze the behavior of the neural networks on each dataset and the benefits of combining different image modalities and tumor classes. We designed several experiments by fine-tuning the models on combined and individual datasets. The results revealed that the Swin Transformer provided high accuracy, achieving up to 99% on average for the individual datasets and 99.4% accuracy for the combined dataset. This research highlights the adaptability of Transformer-based models to various image modalities and features. However, challenges persist, including limited annotated data and interpretability issues. Future work will expand this study by incorporating other image modalities and enhancing diagnostic capabilities. Integrating these models across diverse datasets could mark a significant advance in precision medicine, paving the way for more efficient and comprehensive healthcare solutions. | en_US
dc.language | eng | en_US
dc.relation.ispartof | ArXiv.org | en_US
dc.source | ArXiv.org [2331-8422], v. 2, 16 Jun 2025 | en_US
dc.subject | 120304 Artificial intelligence | en_US
dc.subject.other | Brain tumor | en_US
dc.subject.other | Lung tumor | en_US
dc.subject.other | Kidney tumor | en_US
dc.subject.other | Neural Networks | en_US
dc.subject.other | Vision Transformer | en_US
dc.subject.other | Swin Transformer | en_US
dc.subject.other | MaxViT | en_US
dc.title | Evaluation of Vision Transformers for Multimodal Image Classification: A Case Study on Brain, Lung, and Kidney Tumors | en_US
dc.identifier.doi | 10.48550/arXiv.2502.05517 | en_US
dc.relation.volume | 2 | en_US
dc.investigacion | Engineering and Architecture | en_US
dc.description.numberofpages | 19 | en_US
dc.utils.revision |  | en_US
dc.date.coverdate | Jun 2025 | en_US
dc.identifier.ulpgc |  | en_US
dc.contributor.buulpgc | BU-INF | en_US
item.grantfulltext | open | -
item.fulltext | With full text | -
crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | -
crisitem.author.dept | IU de Cibernética, Empresa y Sociedad (IUCES) | -
crisitem.author.dept | Departamento de Informática y Sistemas | -
crisitem.author.orcid | 0000-0001-8514-4350 | -
crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad (IUCES) | -
crisitem.author.fullName | Sánchez Pérez, Javier | -
Collection: Preprint
File: Adobe PDF (3.13 MB)
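The abstract above describes fine-tuning pretrained Vision Transformers (Swin Transformer, MaxViT) on MRI and CT tumor datasets. The following is a minimal, illustrative sketch of such a workflow, not the authors' code: it assumes a PyTorch/timm environment and a hypothetical ImageFolder-style dataset under data/train with one sub-directory per tumor class; the model variant, learning rate, batch size, and epoch count are placeholders.

# Illustrative fine-tuning sketch (not taken from the paper): a pretrained
# Swin Transformer classifier on an MRI/CT tumor dataset arranged in
# ImageFolder layout (data/train/<class_name>/*.png). Hyperparameters are
# placeholders. Requires: torch, torchvision, timm.
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Basic preprocessing; 224x224 matches the pretrained Swin input size.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    train_set = datasets.ImageFolder("data/train", transform=preprocess)
    loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)

    # Pretrained Swin-Tiny with a fresh classification head sized to the dataset.
    model = timm.create_model("swin_tiny_patch4_window7_224",
                              pretrained=True,
                              num_classes=len(train_set.classes)).to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    model.train()
    for epoch in range(5):  # placeholder epoch count
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")

if __name__ == "__main__":
    main()

Swapping the timm model name for a MaxViT variant would follow the same pattern, and the combined-dataset experiments mentioned in the abstract would only change which class folders are present under the training directory.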
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.