Evaluation of a visual question answering architecture for pedestrian attribute recognition

Castrillón Santana, Modesto Fernando; Sánchez Nielsen, Maria Elena; Freire Obregón, David Sebastián; Santana Jaria, Oliverio Jesús; Hernández Sosa, José Daniel; Lorenzo Navarro, José Javier

Title:	Evaluation of a visual question answering architecture for pedestrian attribute recognition
Authors:	Castrillón Santana, Modesto Fernando; Sánchez Nielsen, Maria Elena; Freire Obregón, David Sebastián; Santana Jaria, Oliverio Jesús; Hernández Sosa, José Daniel; Lorenzo Navarro, José Javier
UNESCO Classification:	120304 Artificial intelligence
Keywords:	Pedestrian attribute recognition; Vision language models; Visual question answering
Issue Date:	2023
Publisher:	Springer
Project:	Interacción y Re-Identificación de Personas Mediante Machine Learning, Deep Learning y Análisis de Datos Multimodal: Hacia Una Comunicación Más Natural en la Robótica Social
Journal:	Lecture Notes in Computer Science
Conference:	20th International Conference on Computer Analysis of Images and Patterns (CAIP 2023)
Abstract:	Pedestrian attribute recognition (PAR) supports public safety and security. By automatically detecting attributes such as clothing color, accessories, and hairstyles, surveillance systems can provide valuable information for criminal investigations, aiding in identifying suspects based on their appearance. Additionally, in crowd management scenarios, PAR enables monitoring of specific groups, such as individuals wearing safety gear at construction sites, and identification of potential threats in sensitive areas. Real-time attribute recognition enhances situational awareness and facilitates rapid response during emergencies, thereby contributing to the overall safety and security of public spaces. This work proposes applying the BLIP-2 Visual Question Answering (VQA) framework to address the PAR problem. By employing Large Language Models (LLMs), we have achieved an accuracy rate of 92% on the private set. This combination of VQA and LLMs makes it possible to effectively analyze visual information and answer questions related to pedestrian attributes, improving the accuracy and performance of PAR systems.
URI:	https://accedacris.ulpgc.es/handle/10553/124523
ISBN:	978-3-031-44236-0
ISSN:	0302-9743
DOI:	10.1007/978-3-031-44237-7_2
Source:	Computer Analysis of Images and Patterns. CAIP 2023 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) [ISSN 0302-9743], v. 14184 LNCS, p. 13-22, (January 2023)
Appears in Collections:	Conference proceedings
