Persistent identifier to cite or link this item: https://accedacris.ulpgc.es/handle/10553/131973
Title: Visual Question Answering Models for Zero-Shot Pedestrian Attribute Recognition: A Comparative Study
Authors: Sánchez-Nielsen, Elena
Castrillón-Santana, Modesto 
Freire Obregón, David Sebastián 
Santana Jaria, Oliverio Jesús 
Hernández-Sosa, Daniel 
Lorenzo-Navarro, Javier 
UNESCO Classification: 120304 Artificial intelligence
Keywords: Pedestrian attribute recognition
Biometrics
Vision language models
Visual question answering
Issue Date: 2024
Journal: SN Computer Science 
Abstract: Pedestrian Attribute Recognition (PAR) poses a significant challenge in developing automatic systems that enhance visual surveillance and human interaction. In this study, we investigate using Visual Question Answering (VQA) models to address the zero-shot PAR problem. Inspired by the impressive results achieved by a zero-shot VQA strategy during the PAR Contest at the 20th International Conference on Computer Analysis of Images and Patterns in 2023, we conducted a comparative study across three state-of-the-art VQA models, two of them based on BLIP-2 and the third one based on the Plug-and-Play VQA framework. Our analysis focuses on performance, robustness, contextual question handling, processing time, and classification errors. Our findings demonstrate that both BLIP-2-based models are better suited for PAR, with nuances related to the adopted frozen Large Language Model. Specifically, the Open Pre-trained Transformers based model performs well in benchmark color estimation tasks, while FLANT5XL provides better results for the considered binary tasks. In summary, zero-shot PAR based on VQA models offers highly competitive results, with the advantage of avoiding training costs associated with multipurpose classifiers.
URI: https://accedacris.ulpgc.es/handle/10553/131973
ISSN: 2661-8907
DOI: 10.1007/s42979-024-02985-0
Source: SN Computer Science [ISSN 2661-8907], v. 5, (680), (June 2024)
Appears in Collections: Articles

Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.