Visual Question Answering Models for Zero-Shot Pedestrian Attribute Recognition: A Comparative Study

Sánchez-Nielsen, Elena; Castrillón-Santana, Modesto; Freire Obregón, David Sebastián; Santana Jaria, Oliverio Jesús; Hernández-Sosa, Daniel; Lorenzo-Navarro, Javier

Persistent identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/131973

DC Field	Value	Language
dc.contributor.author	Sánchez-Nielsen, Elena	en_US
dc.contributor.author	Castrillón-Santana, Modesto	en_US
dc.contributor.author	Freire Obregón, David Sebastián	en_US
dc.contributor.author	Santana Jaria, Oliverio Jesús	en_US
dc.contributor.author	Hernández-Sosa, Daniel	en_US
dc.contributor.author	Lorenzo-Navarro, Javier	en_US
dc.date.accessioned	2024-07-01T08:46:44Z	-
dc.date.available	2024-07-01T08:46:44Z	-
dc.date.issued	2024	en_US
dc.identifier.issn	2661-8907	en_US
dc.identifier.other	Scopus	-
dc.identifier.uri	https://accedacris.ulpgc.es/handle/10553/131973	-
dc.description.abstract	Pedestrian Attribute Recognition (PAR) poses a significant challenge in developing automatic systems that enhance visual surveillance and human interaction. In this study, we investigate using Visual Question Answering (VQA) models to address the zero-shot PAR problem. Inspired by the impressive results achieved by a zero-shot VQA strategy during the PAR Contest at the 20th International Conference on Computer Analysis of Images and Patterns in 2023, we conducted a comparative study across three state-of-the-art VQA models, two of them based on BLIP-2 and the third one based on the Plug-and-Play VQA framework. Our analysis focuses on performance, robustness, contextual question handling, processing time, and classification errors. Our findings demonstrate that both BLIP-2-based models are better suited for PAR, with nuances related to the adopted frozen Large Language Model. Specifically, the Open Pre-trained Transformers based model performs well in benchmark color estimation tasks, while FLANT5XL provides better results for the considered binary tasks. In summary, zero-shot PAR based on VQA models offers highly competitive results, with the advantage of avoiding training costs associated with multipurpose classifiers.	en_US
dc.language	eng	en_US
dc.relation.ispartof	SN Computer Science	en_US
dc.source	SN Computer Science [ISSN 2661-8907], v. 5, (680), (June 2024)	en_US
dc.subject	120304 Artificial intelligence	en_US
dc.subject.other	Pedestrian attribute recognition	en_US
dc.subject.other	Biometrics	en_US
dc.subject.other	Vision language models	en_US
dc.subject.other	Visual question answering	en_US
dc.title	Visual Question Answering Models for Zero-Shot Pedestrian Attribute Recognition: A Comparative Study	en_US
dc.type	info:eu-repo/semantics/article	en_US
dc.type	Article	en_US
dc.identifier.doi	10.1007/s42979-024-02985-0	en_US
dc.identifier.scopus	85197373236	-
dc.contributor.orcid	0000-0002-8673-2725	-
dc.contributor.orcid	NO DATA	-
dc.contributor.orcid	NO DATA	-
dc.contributor.orcid	NO DATA	-
dc.contributor.orcid	NO DATA	-
dc.contributor.orcid	NO DATA	-
dc.contributor.authorscopusid	57218418238	-
dc.contributor.authorscopusid	13105159100	-
dc.contributor.authorscopusid	23396618800	-
dc.contributor.authorscopusid	7003605046	-
dc.contributor.authorscopusid	6507124168	-
dc.contributor.authorscopusid	15042453800	-
dc.identifier.eissn	2661-8907	-
dc.identifier.issue	6	-
dc.relation.volume	5	en_US
dc.investigacion	Engineering and Architecture	en_US
dc.type2	Article	en_US
dc.description.numberofpages	13	en_US
dc.utils.revision	Yes	en_US
dc.date.coverdate	June 2024	en_US
dc.identifier.ulpgc	Yes	en_US
dc.contributor.buulpgc	BU-INF	en_US
dc.description.sjr	0.565	-
dc.description.sjrq	Q2	-
item.grantfulltext	open	-
item.fulltext	With full text	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.dept	Departamento de Informática y Sistemas	-
crisitem.author.orcid	0000-0002-8673-2725	-
crisitem.author.orcid	0000-0003-2378-4277	-
crisitem.author.orcid	0000-0001-7511-5783	-
crisitem.author.orcid	0000-0003-3022-7698	-
crisitem.author.orcid	0000-0002-2834-2067	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.fullName	Castrillón Santana, Modesto Fernando	-
crisitem.author.fullName	Freire Obregón, David Sebastián	-
crisitem.author.fullName	Santana Jaria, Oliverio Jesús	-
crisitem.author.fullName	Hernández Sosa, José Daniel	-
crisitem.author.fullName	Lorenzo Navarro, José Javier	-
Collection:	Articles

Adobe PDF (1.73 MB)
