Please use this identifier to cite or link to this item: http://hdl.handle.net/10553/110725
Title: Improving user verification in human-robot interaction from audio or image inputs through sample quality assessment
Authors: Freire Obregón, David Sebastián 
Rosales-Santana, Kevin
Marín Reyes, Pedro Antonio 
Peñate Sánchez, Adrián 
Lorenzo Navarro, José Javier 
Castrillón Santana, Modesto Fernando 
UNESCO Classification: 120304 Artificial intelligence
2405 Biometrics
Keywords: Biometric verification
Audiovisual verification
Human-robot interaction
Issue Date: 2021
Project: ULPGC2018-08
Automatic speaker identification in parliamentary sessions using audiovisual features (Identificación automática de oradores en sesiones parlamentarias usando características audiovisuales)
RTI2018-093337-B-I00
Multimodal re-identification of participants in sports competitions (Re-identificación mUltimodal de participaNtes en competiciones dEpoRtivaS)
Journal: Pattern Recognition Letters 
Abstract: In this paper, we tackle the task of improving biometric verification in the context of Human-Robot Interaction (HRI). A robot that needs to identify a specific person to provide a service can do so either through image verification or, if lighting conditions are not favourable, through voice verification. Our approach exploits a robot's ability to keep gathering data until it is confident of the person's identity. The key contribution is that we select, from both the image and audio signals, the parts with the highest confidence. For images, we use a system that looks at each person's face and selects high-confidence frames, keeping those frames separated in time to avoid using very similar facial appearances. For audio, our approach segments the signal to find the parts that contain a person talking, avoiding those in which noise is present. Once the parts of interest are found, each input is described with an independent deep learning architecture that obtains a descriptor for each kind of input (face/voice). We also present fusion methods that improve performance by combining the face and voice features; validation results are shown for each individual input and for the fusion methods.
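The fusion step described in the abstract can be illustrated, for example, as score-level fusion: each modality's descriptor yields a similarity score against the enrolled identity, and the two scores are combined with a weighted average. This is only a minimal sketch under stated assumptions; the function names, the cosine-similarity choice, and the mixing weight `alpha` are illustrative and not taken from the paper:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fused_verification_score(enrolled_face, enrolled_voice,
                             probe_face, probe_voice, alpha=0.5):
    """Hypothetical weighted score-level fusion of face and voice.

    alpha is an assumed mixing weight between the two modalities;
    the paper's actual fusion strategy may differ.
    """
    face_score = cosine_similarity(enrolled_face, probe_face)
    voice_score = cosine_similarity(enrolled_voice, probe_voice)
    return alpha * face_score + (1.0 - alpha) * voice_score
```

A probe whose face and voice descriptors both match the enrolled ones yields a fused score near 1, while a mismatched modality pulls the score down in proportion to its weight.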
URI: http://hdl.handle.net/10553/110725
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2021.06.014
Source: Pattern Recognition Letters, [ISSN 0167-8655] v. 149, p. 179-184, (September 2021)
Appears in Collections:Artículos
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.