Persistent identifier to cite or link to this item: http://hdl.handle.net/10553/131973
Title: Visual Question Answering Models for Zero-Shot Pedestrian Attribute Recognition: A Comparative Study
Authors: Sánchez-Nielsen, Elena
Castrillón-Santana, Modesto
Freire Obregón, David Sebastián
Santana Jaria, Oliverio Jesús
Hernández-Sosa, Daniel
Lorenzo-Navarro, Javier
UNESCO classification: 120304 Artificial intelligence
Keywords: Pedestrian attribute recognition
Biometrics
Vision language models
Visual question answering
Publication date: 2024
Journal: SN Computer Science
Abstract: Pedestrian Attribute Recognition (PAR) poses a significant challenge in developing automatic systems that enhance visual surveillance and human interaction. In this study, we investigate using Visual Question Answering (VQA) models to address the zero-shot PAR problem. Inspired by the impressive results achieved by a zero-shot VQA strategy during the PAR Contest at the 20th International Conference on Computer Analysis of Images and Patterns in 2023, we conducted a comparative study across three state-of-the-art VQA models, two of them based on BLIP-2 and the third one based on the Plug-and-Play VQA framework. Our analysis focuses on performance, robustness, contextual question handling, processing time, and classification errors. Our findings demonstrate that both BLIP-2-based models are better suited for PAR, with nuances related to the adopted frozen Large Language Model. Specifically, the Open Pre-trained Transformers based model performs well in benchmark color estimation tasks, while FLANT5XL provides better results for the considered binary tasks. In summary, zero-shot PAR based on VQA models offers highly competitive results, with the advantage of avoiding training costs associated with multipurpose classifiers.
URI: http://hdl.handle.net/10553/131973
ISSN: 2661-8907
DOI: 10.1007/s42979-024-02985-0
Source: SN Computer Science [ISSN 2661-8907], v. 5, art. 680 (June 2024)
Collection: Articles
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.