Please use this identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/147255
DC Field | Value | Language
dc.contributor.author | Martín, Óscar A. | en_US
dc.contributor.author | Sánchez, Javier | en_US
dc.date.accessioned | 2025-09-19T17:23:23Z | -
dc.date.available | 2025-09-19T17:23:23Z | -
dc.date.issued | 2025 | en_US
dc.identifier.issn | 2331-8422 | en_US
dc.identifier.uri | https://accedacris.ulpgc.es/jspui/handle/10553/147255 | -
dc.description.abstract | Neural networks have become the standard technique for medical diagnostics, especially in cancer detection and classification. This work evaluates the performance of Vision Transformer architectures, including the Swin Transformer and MaxViT, on several datasets of magnetic resonance imaging (MRI) and computed tomography (CT) scans. We used three training sets of images with brain, lung, and kidney tumors. Each dataset includes different classification labels, from brain gliomas and meningiomas to benign and malignant lung conditions and kidney anomalies such as cysts and cancers. This work aims to analyze the behavior of the neural networks on each dataset and the benefits of combining different image modalities and tumor classes. We designed several experiments by fine-tuning the models on the individual and combined datasets. The results revealed that the Swin Transformer provided high accuracy, achieving up to 99% on average for the individual datasets and 99.4% for the combined dataset. This research highlights the adaptability of Transformer-based models to various image modalities and features. However, challenges persist, including limited annotated data and interpretability issues. Future work will expand this study by incorporating other image modalities and enhancing diagnostic capabilities. Integrating these models across diverse datasets could mark a significant advance in precision medicine, paving the way for more efficient and comprehensive healthcare solutions. | en_US
dc.language | eng | en_US
dc.relation.ispartof | ArXiv.org | en_US
dc.source | ArXiv.org [2331-8422], v.2, 16 Jun 2025 | en_US
dc.subject | 120304 Artificial intelligence | en_US
dc.subject.other | Brain tumor | en_US
dc.subject.other | Lung tumor | en_US
dc.subject.other | Kidney tumor | en_US
dc.subject.other | Neural Networks | en_US
dc.subject.other | Vision Transformer | en_US
dc.subject.other | Swin Transformer | en_US
dc.subject.other | MaxViT | en_US
dc.title | Evaluation of Vision Transformers for Multimodal Image Classification: A Case Study on Brain, Lung, and Kidney Tumors | en_US
dc.identifier.doi | 10.48550/arXiv.2502.05517 | en_US
dc.relation.volume | 2 | en_US
dc.investigacion | Engineering and Architecture | en_US
dc.description.numberofpages | 19 | en_US
dc.utils.revision |  | en_US
dc.date.coverdate | Jun 2025 | en_US
dc.identifier.ulpgc |  | en_US
dc.contributor.buulpgc | BU-INF | en_US
item.grantfulltext | open | -
item.fulltext | With full text | -
crisitem.author.dept | GIR IUCES: Centro de Tecnologías de la Imagen | -
crisitem.author.dept | IU de Cibernética, Empresa y Sociedad (IUCES) | -
crisitem.author.dept | Departamento de Informática y Sistemas | -
crisitem.author.orcid | 0000-0001-8514-4350 | -
crisitem.author.parentorg | IU de Cibernética, Empresa y Sociedad (IUCES) | -
crisitem.author.fullName | Sánchez Pérez, Javier | -
Appears in Collections: Artículo preliminar
Adobe PDF (3,13 MB)
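
The abstract above describes fine-tuning pretrained Vision Transformer architectures (Swin Transformer and MaxViT) on individual and combined MRI/CT tumor datasets. Below is a minimal sketch of such a fine-tuning setup using PyTorch with the timm and torchvision libraries; the model variant, dataset path, class count, and hyperparameters are illustrative assumptions, not values reported in the paper.

    # Hedged sketch: fine-tuning a pretrained Swin Transformer for multi-class
    # tumor classification. Paths, class count, and hyperparameters are
    # assumptions for illustration only.
    import torch
    import timm
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    NUM_CLASSES = 10                    # assumption: size of the combined label set
    DATA_DIR = "data/combined_train"    # hypothetical ImageFolder layout (one folder per class)

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    train_set = datasets.ImageFolder(DATA_DIR, transform=transform)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

    # Pretrained Swin Transformer with a freshly initialized classification head.
    model = timm.create_model("swin_base_patch4_window7_224",
                              pretrained=True, num_classes=NUM_CLASSES)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Short fine-tuning loop; in practice one would add validation and
    # per-dataset evaluation as the study describes.
    model.train()
    for epoch in range(5):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

The same setup could be repeated for each individual dataset (brain, lung, kidney) by pointing DATA_DIR at that dataset and adjusting NUM_CLASSES, or by swapping the model name for a MaxViT variant available in timm.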
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.