{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T15:42:49Z","timestamp":1740152569978,"version":"3.37.3"},"reference-count":63,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T00:00:00Z","timestamp":1674086400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"In this work we present a bimodal multitask network for audiovisual biometric recognition. The proposed network performs the fusion of features extracted from face and speech data through a weighted sum to jointly optimize the contribution of each modality, aiming for the identification of a client. The extracted speech features are simultaneously used in a speech recognition task with random digit sequences. Text prompted verification is performed by fusing the scores obtained from the matching of bimodal embeddings with the Word Error Rate (WER) metric calculated from the accuracy of the transcriptions. The score fusion outputs a value that can be compared with a threshold to accept or reject the identity of a client. Training and evaluation were carried out using our proprietary database BIOMEX-DB and the VidTIMIT audiovisual database. Our network achieved an accuracy of 100% and an Equal Error Rate (EER) of 0.44% for identification and verification, respectively, in the best case. 
To the best of our knowledge, this is the first system that combines the mutually related tasks previously described for biometric recognition.","DOI":"10.3390\/a16020066","type":"journal-article","created":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T10:06:14Z","timestamp":1674122774000},"page":"66","source":"Crossref","is-referenced-by-count":1,"title":["Audiovisual Biometric Network with Deep Feature Fusion for Identification and Text Prompted Verification"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8663-8130","authenticated-orcid":false,"given":"Juan","family":"Atenco","sequence":"first","affiliation":[{"name":"Department of Electronics, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1, Sta Mar\u00eda Tonanzintla, San Andr\u00e9s Cholula, Puebla 72840, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9935-0075","authenticated-orcid":false,"given":"Juan","family":"Moreno","sequence":"additional","affiliation":[{"name":"Department of Electronics, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1, Sta Mar\u00eda Tonanzintla, San Andr\u00e9s Cholula, Puebla 72840, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8515-2489","authenticated-orcid":false,"given":"Juan","family":"Ramirez","sequence":"additional","affiliation":[{"name":"Department of Electronics, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1, Sta Mar\u00eda Tonanzintla, San Andr\u00e9s Cholula, Puebla 72840, Mexico"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,19]]},"reference":[{"key":"ref_1","unstructured":"Minaee, S., Abdolrashidi, A., Su, H., Bennamoun, M., and Zhang, D. (2019). Biometrics recognition using deep learning: A survey. 
arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/j.inffus.2018.11.018","article-title":"Multibiometric fusion strategy and its applications: A review","volume":"49","author":"Modak","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_3","first-page":"2276","article-title":"A comprehensive survey on various biometric systems","volume":"13","author":"Sabhanayagam","year":"2018","journal-title":"Int. J. Appl. Eng. Res."},{"key":"ref_4","first-page":"25","article-title":"Multimodal biometric system: A review","volume":"4","author":"Dahea","year":"2018","journal-title":"Int. J. Res. Adv. Eng. Technol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6247","DOI":"10.1109\/ACCESS.2017.2694050","article-title":"The fall of one, the rise of many: A survey on multi-biometric fusion methods","volume":"5","author":"Dinca","year":"2017","journal-title":"IEEE Access"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.inffus.2017.12.003","article-title":"Multiple classifiers in biometrics. part 1: Fundamentals and review","volume":"44","author":"Fierrez","year":"2018","journal-title":"Inf. Fusion"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.inffus.2018.12.003","article-title":"A comprehensive overview of biometric fusion","volume":"52","author":"Singh","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mar\u00edn-Jim\u00e9nez, M.J., Castro, F.M., Guil, N., De la Torre, F., and Medina-Carnicer, R. (2017, January 17\u201320). Deep multi-task learning for gait-based biometrics. 
Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296252"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"7907","DOI":"10.1109\/ACCESS.2020.2964048","article-title":"Joint decision of anti-spoofing and automatic speaker verification by multi-task learning with contrastive loss","volume":"8","author":"Li","year":"2020","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"102581","DOI":"10.1016\/j.adhoc.2021.102581","article-title":"Robust deep identification using ECG and multimodal biometrics for industrial internet of things","volume":"121","author":"Yeun","year":"2021","journal-title":"Ad. Hoc. Netw."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TMM.2020.2975922","article-title":"End-to-end audiovisual speech recognition system with multitask learning","volume":"23","author":"Tao","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kortli, Y., Jridi, M., Al Falou, A., and Atri, M. (2020). Face recognition systems: A survey. Sensors, 20.","DOI":"10.3390\/s20020342"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"99112","DOI":"10.1109\/ACCESS.2021.3096136","article-title":"Recent advances in deep learning techniques for face recognition","volume":"9","author":"Fuad","year":"2021","journal-title":"IEEE Access"},{"key":"ref_14","unstructured":"Kalaiarasi, P., and Esther Rani, P. (2021). Advances in Smart System Technologies, Springer."},{"key":"ref_15","first-page":"5488","article-title":"Face recognition for presence system by using residual networks-50 architecture","volume":"11","author":"Pratama","year":"2021","journal-title":"Int. J. Electr. Comput. Eng."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"William, I., Rachmawanto, E.H., Santoso, H.A., and Sari, C.A. (2019, January 16\u201317). 
Face recognition using facenet (survey, performance test, and comparison). Proceedings of the 2019 fourth international conference on informatics and computing (ICIC), Semarang, Indonesia.","DOI":"10.1109\/ICIC47613.2019.8985786"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Nandy, A. (2019, January 27\u201328). A densenet based robust face detection framework. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00229"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gwyn, T., Roy, K., and Atay, M. (2021). Face recognition using popular deep net architectures: A brief comparative study. Future Internet, 13.","DOI":"10.3390\/fi13070164"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1-1","DOI":"10.1002\/cpe.5851","article-title":"Feature extraction based on deep-convolutional neural network for face recognition","volume":"32","author":"Li","year":"2020","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Pei, Z., Xu, H., Zhang, Y., Guo, M., and Yang, Y.H. (2019). Face recognition via deep learning using data augmentation based on orthogonal experiments. Electronics, 8.","DOI":"10.3390\/electronics8101088"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/j.csl.2017.07.010","article-title":"Incorporating pass-phrase dependent background models for text-dependent speaker verification","volume":"47","author":"Sarkar","year":"2018","journal-title":"Comput. Speech Lang."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1109\/TASL.2010.2064307","article-title":"Front-end factor analysis for speaker verification","volume":"19","author":"Dehak","year":"2010","journal-title":"IEEE Trans. Audio, Speech, Lang. 
Process."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Y., He, L., Tian, Y., Chen, Z., Liu, J., and Johnson, M.T. (2017, January 16\u201320). Comparison of multiple features and modeling methods for text-dependent speaker verification. Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan.","DOI":"10.1109\/ASRU.2017.8268995"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Novoselov, S., Kudashev, O., Shchemelinin, V., Kremnev, I., and Lavrentyeva, G. (2018, January 15\u201320). Deep cnn based feature extractor for text-prompted speaker recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462358"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., and Khudanpur, S. (2018, January 15\u201320). X-vectors: Robust dnn embeddings for speaker recognition. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Jung, J., Heo, H., Yang, I., Yoon, S., Shim, H., and Yu, H. (2017, January 2\u20133). D-vector based speaker verification system using Raw Waveform CNN. Proceedings of the 2017 International Seminar on Artificial Intelligence, Networking and Information Technology (Anit 2017), Bangkok, Thailand.","DOI":"10.2991\/anit-17.2018.21"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Muckenhirn, H., Doss, M.M., and Marcell, S. (2018, January 15\u201320). Towards directly modeling raw speech signal for speaker verification using CNNs. 
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462165"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ravanelli, M., and Bengio, Y. (2018, January 18\u201321). Speaker recognition from raw waveform with sincnet. Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece.","DOI":"10.1109\/SLT.2018.8639585"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tripathi, M., Singh, D., and Susan, S. (2020, January 12\u201314). Speaker recognition using SincNet and X-vector fusion. Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland.","DOI":"10.1007\/978-3-030-61401-0_24"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chowdhury, L., Zunair, H., and Mohammed, N. (2020). Robust deep speaker recognition: Learning latent representation with joint angular margin loss. Appl. Sci., 10.","DOI":"10.3390\/app10217522"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.neunet.2021.03.004","article-title":"Speaker recognition based on deep learning: An overview","volume":"140","author":"Bai","year":"2021","journal-title":"Neural Netw."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"37431","DOI":"10.1109\/ACCESS.2021.3063031","article-title":"Audio-visual biometric recognition and presentation attack detection: A comprehensive survey","volume":"9","author":"Mandalapu","year":"2021","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"34541","DOI":"10.1109\/ACCESS.2021.3061589","article-title":"Continuous multimodal biometric authentication schemes: A systematic review","volume":"9","author":"Ryu","year":"2021","journal-title":"IEEE Access"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Talreja, V., Valenti, M.C., and Nasrabadi, N.M. 
(2017, January 14\u201316). Multibiometric secure system based on deep learning. Proceedings of the 2017 IEEE Global conference on Signal and Information Processing (globalSIP), Montreal, QC, Canada.","DOI":"10.1109\/GlobalSIP.2017.8308652"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"21418","DOI":"10.1109\/ACCESS.2018.2815540","article-title":"Multimodal feature-level fusion for biometrics identification system on IoMT platform","volume":"6","author":"Xin","year":"2018","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Olazabal, O., Gofman, M., Bai, Y., Choi, Y., Sandico, N., Mitra, S., and Pham, K. (2019, January 7\u20139). Multimodal biometrics for enhanced iot security. Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA.","DOI":"10.1109\/CCWC.2019.8666599"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1109\/TIFS.2019.2944058","article-title":"LVID: A multimodal biometrics authentication system on smartphones","volume":"15","author":"Wu","year":"2019","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Alay, N., and Al-Baity, H.H. (2020). Deep learning approach for multimodal biometric recognition system based on fusion of iris, face, and finger vein traits. Sensors, 20.","DOI":"10.3390\/s20195523"},{"key":"ref_39","first-page":"6","article-title":"Multimodal biometrics recognition from facial video with missing modalities using deep learning","volume":"16","author":"Maity","year":"2020","journal-title":"J. Inf. Process. 
Syst."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"e248","DOI":"10.7717\/peerj-cs.248","article-title":"Convolutional neural networks approach for multimodal biometric identification system using the fusion of fingerprint, finger-vein and face images","volume":"6","author":"Alaoui","year":"2020","journal-title":"PeerJ Comput. Sci."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"102757","DOI":"10.1109\/ACCESS.2020.2999115","article-title":"An efficient android-based multimodal biometric authentication system with face and voice","volume":"8","author":"Zhang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Leghari, M., Memon, S., Dhomeja, L.D., Jalbani, A.H., and Chandio, A.A. (2021). Deep feature fusion of fingerprint and online signature for multimodal biometrics. Computers, 10.","DOI":"10.3390\/computers10020021"},{"key":"ref_43","unstructured":"Liu, M., Wang, L., Lee, K.A., Zhang, H., Zeng, C., and Dang, J. (2021). Exploring Deep Learning for Joint Audio-Visual Lip Biometrics. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1109\/LSP.2021.3079850","article-title":"A deep feature fusion network based on multiple attention mechanisms for joint iris-periocular biometric recognition","volume":"28","author":"Luo","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"7914","DOI":"10.1109\/ACCESS.2022.3143433","article-title":"Multimodal Biometric Recognition Based on 3D Ultrasound Palmprint-Hand Geometry Fusion","volume":"10","author":"Iula","year":"2022","journal-title":"IEEE Access"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-04652-3","article-title":"Enhanced multimodal biometric recognition approach for smart cities based on an optimized fuzzy genetic algorithm","volume":"12","author":"Rajasekar","year":"2022","journal-title":"Sci. 
Rep."},{"key":"ref_47","first-page":"102707","article-title":"Deep belief network-based hybrid model for multimodal biometric system for futuristic security applications","volume":"58","author":"Vijay","year":"2021","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1109\/TIFS.2018.2833033","article-title":"Deep feature fusion for iris and periocular biometrics on mobile devices","volume":"13","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"111267","DOI":"10.1109\/ACCESS.2021.3100035","article-title":"BIOMEX-DB: A Cognitive Audiovisual Dataset for Unimodal and Multimodal Biometric Systems","volume":"9","year":"2021","journal-title":"IEEE Access"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sanderson, C., and Lovell, B.C. (2009, January 2\u20135). Multi-region probabilistic histograms for robust and scalable identity inference. Proceedings of the International Conference on Biometrics, Alghero, Italy.","DOI":"10.1007\/978-3-642-01793-3_21"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Ko, T., Peddinti, V., Povey, D., Seltzer, M.L., and Khudanpur, S. (2017, January 5\u20139). A study on data augmentation of reverberant speech for robust speech recognition. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953152"},{"key":"ref_52","unstructured":"Snyder, D., Chen, G., and Povey, D. (2015). Musan: A music, speech, and noise corpus. arXiv."},{"key":"ref_53","first-page":"6","article-title":"Speech recognition based on convolutional neural networks and MFCC algorithm","volume":"1","author":"Mahmood","year":"2021","journal-title":"Adv. Artif. Intell. 
Res."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"e453","DOI":"10.7717\/peerj.453","article-title":"scikit-image: Image processing in Python","volume":"2","author":"Boulogne","year":"2014","journal-title":"PeerJ"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"15503","DOI":"10.1007\/s00521-020-04748-3","article-title":"A survey on face data augmentation for the training of deep neural networks","volume":"32","author":"Wang","year":"2020","journal-title":"Neural Comput. Appl."},{"key":"ref_56","unstructured":"Jung, A.B., Wada, K., Crall, J., Tanaka, S., Graving, J., Reinders, C., Yadav, S., Banerjee, J., Vecsei, G., and Kraft, A. (2020, February 01). Imgaug. Available online: https:\/\/github.com\/aleju\/imgaug."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Graves, A., Fern\u00e1ndez, S., Gomez, F., and Schmidhuber, J. (2006, January 25\u201329). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. Proceedings of the 23rd international conference on Machine learning, Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143891"},{"key":"ref_58","unstructured":"Amodei, D., Ananthanarayanan, S., Anubhai, R., Bai, J., Battenberg, E., Case, C., Casper, J., Catanzaro, B., Cheng, Q., and Chen, G. (2016, January 20\u201322). Deep speech 2: End-to-end speech recognition in english and mandarin. Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Zenkel, T., Sanabria, R., Metze, F., Niehues, J., Sperber, M., St\u00fcker, S., and Waibel, A. (2017). Comparison of decoding strategies for ctc acoustic models. arXiv.","DOI":"10.21437\/Interspeech.2017-1683"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. 
Lett."},{"key":"ref_61","unstructured":"Cheng, J.M., and Wang, H.C. (2006, January 13\u201316). A method of estimating the equal error rate for automatic speaker verification. Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, Singapore."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_63","first-page":"30","article-title":"A Comparative Study of Eigenface and Fisherface Algorithms Based on OpenCV and Sci-kit Libraries Implementations","volume":"14","author":"Aliyu","year":"2022","journal-title":"Int. J. Inf. Eng. Electron. Bus."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/66\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T17:08:42Z","timestamp":1705943322000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/66"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,19]]},"references-count":63,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["a16020066"],"URL":"https:\/\/doi.org\/10.3390\/a16020066","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2023,1,19]]}}}