Please use this identifier to cite or link to this item:
http://hdl.handle.net/10553/70579
Title: Deep multi-biometric fusion for audio-visual user re-identification and verification
Authors: Marras, Mirko; Marín-Reyes, Pedro A.; Lorenzo-Navarro, Javier; Castrillón-Santana, Modesto; Fenu, Gianni
UNESCO Classification: 120304 Artificial intelligence
Keywords: Audio-visual learning; Cross-modal biometrics; Deep biometric fusion; Multi-biometric system; Re-identification, et al.
Issue Date: 2020
Publisher: Springer
Project: Identificación Automática de Oradores en Sesiones Parlamentarias Usando Características Audiovisuales (Automatic Identification of Speakers in Parliamentary Sessions Using Audio-Visual Features)
Journal: Lecture Notes in Computer Science
Conference: 8th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2019
Abstract: From border controls to personal devices, from online exam proctoring to human-robot interaction, biometric technologies are empowering individuals and organizations with convenient and secure authentication and identification services. However, most biometric systems leverage only a single modality and may face challenges related to acquisition distance, environmental conditions, data quality, and computational resources. Combining evidence from multiple sources at a certain level of the recognition pipeline (e.g., sensor, feature, score, or decision) may mitigate some limitations of common uni-biometric systems. Such fusion has rarely been investigated at the intermediate level, i.e., when uni-biometric model parameters are jointly optimized during training. In this chapter, we propose a multi-biometric model training strategy that digests face and voice traits in parallel, and we explore how it helps improve recognition performance in re-identification and verification scenarios. To this end, we design a neural architecture for jointly embedding face and voice data, and we experiment with several training losses and audio-visual datasets. The idea is to exploit the relation between voice characteristics and facial morphology, so that the face and voice uni-biometric models help each other recognize people when trained jointly. Extensive experiments on four real-world datasets show that the biometric feature representation of a jointly trained uni-biometric model performs better than the one computed by the same model trained alone. Moreover, the recognition results are further improved by embedding face and voice data into a single shared representation of the two modalities. The proposed fusion strategy generalizes well to unseen and unheard users and should be considered a feasible solution for improving model performance. We expect that this chapter will help the biometric community shape research on deep audio-visual fusion in real-world contexts.
URI: http://hdl.handle.net/10553/70579
ISBN: 978-3-030-40013-2
ISSN: 0302-9743
DOI: 10.1007/978-3-030-40014-9_7
Source: Pattern Recognition Applications and Methods. ICPRAM 2019. Lecture Notes in Computer Science, v. 11996, p. 136-157
Appears in Collections: Capítulo de libro (book chapter)
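Note: The abstract describes intermediate (feature-level) fusion, in which face and voice networks are optimized jointly and then projected into a single shared embedding. The following is a minimal, illustrative PyTorch sketch of that idea only, not the chapter's actual architecture: the layer sizes, the MLP stand-ins for the face and voice backbones, and the triplet loss are all assumptions chosen for brevity (the chapter experiments with several losses).

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointFaceVoiceEmbedder(nn.Module):
    """Sketch of a joint face+voice embedder with intermediate fusion."""
    def __init__(self, face_dim=512, voice_dim=512, embed_dim=256):
        super().__init__()
        # Stand-ins for pretrained uni-biometric backbones (e.g., a face CNN
        # and a speaker-embedding network); simple MLPs keep this runnable.
        self.face_branch = nn.Sequential(
            nn.Linear(face_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))
        self.voice_branch = nn.Sequential(
            nn.Linear(voice_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))
        # Intermediate fusion: concatenate both embeddings and project them
        # into one shared audio-visual representation.
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, face_feat, voice_feat):
        f = F.normalize(self.face_branch(face_feat), dim=-1)
        v = F.normalize(self.voice_branch(voice_feat), dim=-1)
        shared = F.normalize(self.fusion(torch.cat([f, v], dim=-1)), dim=-1)
        return f, v, shared

# One joint training step: both branches receive gradients through the shared
# embedding, so each modality can benefit from the other. The triplet loss and
# random placeholder features below are assumptions for illustration.
model = JointFaceVoiceEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

anchor_face, anchor_voice = torch.randn(8, 512), torch.randn(8, 512)
pos_face, pos_voice = torch.randn(8, 512), torch.randn(8, 512)  # same identity
neg_face, neg_voice = torch.randn(8, 512), torch.randn(8, 512)  # other identity

_, _, anchor = model(anchor_face, anchor_voice)
_, _, positive = model(pos_face, pos_voice)
_, _, negative = model(neg_face, neg_voice)

optimizer.zero_grad()
loss = criterion(anchor, positive, negative)
loss.backward()
optimizer.step()

At inference time, either uni-modal embedding (f or v) or the shared embedding can be compared by cosine similarity for verification or ranked for re-identification; the abstract reports that the shared representation gives the best results.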
Scopus™ citations: 9 (checked on Nov 10, 2024)
Web of Science™ citations: 2 (checked on Nov 10, 2024)
Page views: 207 (checked on Nov 1, 2024)
Items in accedaCRIS are protected by copyright, with all rights reserved, unless otherwise indicated.