Leveraging Generalist VQA Models to Improve Zero-Shot Pedestrian Attribute Recognition

Salas Cáceres, José Ignacio

Persistent identifier to cite or link to this item: https://accedacris.ulpgc.es/jspui/handle/10553/149206

DC Field	Value	Language
dc.contributor.author	Salas Cáceres, José Ignacio	-
dc.date.accessioned	2025-10-02T17:48:43Z	-
dc.date.available	2025-10-02T17:48:43Z	-
dc.date.issued	2025	-
dc.identifier.isbn	978-3-032-04967-4	-
dc.identifier.issn	0302-9743	-
dc.identifier.other	Scopus	-
dc.identifier.uri	https://accedacris.ulpgc.es/jspui/handle/10553/149206	-
dc.description.abstract	Pedestrian Attribute Recognition (PAR) plays a key role in surveillance scenarios where classical biometric traits, such as facial features, are often unavailable due to low image quality, occlusions, or variable conditions. By extracting soft biometric attributes, such as gender, clothing type, and carried objects, PAR provides essential contextual information that can support tasks like person re-identification and behavior analysis. In this work, a novel approach is proposed based on Visual Question Answering (VQA) models, which avoids the limitations of supervised learning methods by leveraging general-purpose models without the need for additional training. The approach extends the PAR2023-winning strategy by introducing two state-of-the-art models, PaliGemma 1 and PaliGemma 2, along with a refined set of attribute-specific questions and an innovative fusion mechanism that combines both models’ strengths. Experimental results on the PAR2025 dataset show that the proposed system achieves a mean accuracy of 95.4% on the private set, outranking previous approaches on this task.	-
dc.language	eng	-
dc.publisher	Springer	-
dc.relation	Interacción y Re-Identificación de Personas Mediante Machine Learning, Deep Learning y Análisis de Datos Multimodal: Hacia Una Comunicación Más Natural en la Robótica Social	-
dc.relation.ispartof	Lecture Notes in Computer Science	-
dc.source	Computer Analysis of Images and Patterns. CAIP 2025. Lecture Notes in Computer Science, vol. 15621, p. 16–26. Springer, Cham.	-
dc.subject	120304 Artificial intelligence	-
dc.subject.other	Contest	-
dc.subject.other	Pedestrian Attribute Recognition	-
dc.subject.other	Vision Language Model	-
dc.subject.other	Visual Question Answering	-
dc.title	Leveraging Generalist VQA Models to Improve Zero-Shot Pedestrian Attribute Recognition	-
dc.type	book_content	-
dc.relation.conference	21st International Conference on Computer Analysis of Images and Patterns (CAIP 2025)	-
dc.identifier.doi	10.1007/978-3-032-04968-1_2	-
dc.identifier.scopus	105017376735	-
dc.contributor.orcid	0009-0004-7543-3385	-
dc.contributor.authorscopusid	58745737800	-
dc.identifier.eissn	1611-3349	-
dc.description.lastpage	26	-
dc.description.firstpage	16	-
dc.relation.volume	15621	-
dc.investigacion	Engineering and Architecture	-
dc.type2	Conference proceedings	-
dc.identifier.eisbn	978-3-032-04968-1	-
dc.utils.revision	Yes	-
dc.date.coverdate	September 2025	-
dc.identifier.conferenceid	events156046	-
dc.identifier.ulpgc	Yes	-
dc.contributor.buulpgc	BU-INF	-
dc.description.sjr	0.352	-
dc.description.sjrq	Q2	-
dc.description.miaricds	10.0	-
item.fulltext	No full text	-
item.grantfulltext	none	-
crisitem.event.eventsstartdate	25-08-2025	-
crisitem.event.eventsenddate	29-08-2025	-
crisitem.author.dept	GIR SIANI: Inteligencia Artificial, Robótica y Oceanografía Computacional	-
crisitem.author.dept	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.orcid	0009-0004-7543-3385	-
crisitem.author.parentorg	IU de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería	-
crisitem.author.fullName	Salas Cáceres, José Ignacio	-
crisitem.project.principalinvestigator	Castrillón Santana, Modesto Fernando	-
Collection:	Conference proceedings
