Persistent identifier to cite or link to this item:
http://hdl.handle.net/10553/70579
Title: | Deep multi-biometric fusion for audio-visual user re-identification and verification | Authors: | Marras, Mirko; Marín-Reyes, Pedro A.; Lorenzo-Navarro, Javier; Castrillón-Santana, Modesto; Fenu, Gianni |
UNESCO classification: | 120304 Artificial intelligence | Keywords: | Audio-visual learning; Cross-modal biometrics; Deep biometric fusion; Multi-biometric system; Re-identification, et al. |
Publication date: | 2020 | Publisher: | Springer | Projects: | Identificación Automática de Oradores en Sesiones Parlamentarias Usando Características Audiovisuales. | Series: | Lecture Notes in Computer Science | Conference: | 8th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2019 | Abstract: | From border controls to personal devices, from online exam proctoring to human-robot interaction, biometric technologies are empowering individuals and organizations with convenient and secure authentication and identification services. However, most biometric systems leverage only a single modality, and may face challenges related to acquisition distance, environmental conditions, data quality, and computational resources. Combining evidence from multiple sources at a certain level (e.g., sensor, feature, score, or decision) of the recognition pipeline may mitigate some limitations of the common uni-biometric systems. Such a fusion has been rarely investigated at intermediate level, i.e., when uni-biometric model parameters are jointly optimized during training. In this chapter, we propose a multi-biometric model training strategy that digests face and voice traits in parallel, and we explore how it helps to improve recognition performance in re-identification and verification scenarios. To this end, we design a neural architecture for jointly embedding face and voice data, and we experiment with several training losses and audio-visual datasets. The idea is to exploit the relation between voice characteristics and facial morphology, so that face and voice uni-biometric models help each other to recognize people when trained jointly. Extensive experiments on four real-world datasets show that the biometric feature representation of a uni-biometric model jointly trained performs better than the one computed by the same uni-biometric model trained alone. Moreover, the recognition results are further improved by embedding face and voice data into a single shared representation of the two modalities. The proposed fusion strategy generalizes well on unseen and unheard users, and should be considered as a feasible solution that improves model performance. We expect that this chapter will support the biometric community to shape the research on deep audio-visual fusion in real-world contexts. | URI: | http://hdl.handle.net/10553/70579 | ISBN: | 978-3-030-40013-2 | ISSN: | 0302-9743 | DOI: | 10.1007/978-3-030-40014-9_7 | Source: | Pattern Recognition Applications and Methods. ICPRAM 2019. Lecture Notes in Computer Science, v. 11996, p. 136-157 |
Collection: | Book chapter |
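
The abstract names several levels at which evidence from two modalities can be combined (sensor, feature, score, decision). As a rough illustration of two of them, the sketch below contrasts feature-level fusion (concatenating normalized face and voice embeddings) with score-level fusion (averaging per-modality similarity scores). It is not the chapter's method: the intermediate-level fusion proposed there jointly trains the face and voice models, which requires a trainable network and is not reproduced here. All function names and the toy vectors are hypothetical, chosen only for this example.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def fuse_features(face: Sequence[float], voice: Sequence[float]) -> List[float]:
    """Feature-level fusion (assumed variant): concatenate the two
    unit-normalized embeddings into one vector."""
    return l2_normalize(face) + l2_normalize(voice)


def fuse_scores(face_score: float, voice_score: float,
                face_weight: float = 0.5) -> float:
    """Score-level fusion (assumed variant): weighted average of the
    per-modality similarity scores."""
    return face_weight * face_score + (1.0 - face_weight) * voice_score


# Toy embeddings for an enrolled user and a probe (made-up values).
enrolled_face, enrolled_voice = [1.0, 0.0], [1.0, 2.0]
probe_face, probe_voice = [0.0, 1.0], [2.0, 4.0]

feature_level = cosine_similarity(
    fuse_features(enrolled_face, enrolled_voice),
    fuse_features(probe_face, probe_voice),
)
score_level = fuse_scores(
    cosine_similarity(enrolled_face, probe_face),
    cosine_similarity(enrolled_voice, probe_voice),
)
```

With unit-normalized embeddings and equal weights, the two variants yield the same score, because the dot product of the concatenated vectors is the sum of the per-modality cosines and each concatenated vector has norm √2. They differ once the weights change or a learned layer is placed on top of the concatenation.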
Scopus™ citations: 9 (updated 10-Nov-2024)
Web of Science™ citations: 2 (updated 10-Nov-2024)
Page views: 207 (updated 01-Nov-2024)
Items in ULPGC accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.