Serial Multimodal Biometrics Authentication and Liveness Detection Using Speech Recognition with Normalized Longest Word Subsequence Method

Rafi Andrian - Bina Nusantara, Jakarta, Indonesia, 11480
Gede Putra Kusuma - Bina Nusantara, Jakarta, Indonesia, 11480


DOI: http://dx.doi.org/10.62527/joiv.8.3.2247

Abstract


Biometric authentication aims to verify whether an entity matches a claimed identity based on biometric data. Despite its advantages, it remains vulnerable, particularly to spoofing. Efforts to mitigate these vulnerabilities include multimodal approaches and liveness detection; however, these strategies can increase the resource requirements of the authentication process. This paper proposes a multimodal authentication process that combines voice and facial recognition, with liveness detection applied to the voice data through speech recognition. It introduces the Normalized Longest Word Subsequence (NLWS), a combination of Intersection over Union (IoU) and the longest common subsequence, to compare the sentence prompted by the system with the sentence spoken by the user during speech recognition. Unlike the Word Error Rate (WER), which has no fixed upper bound, NLWS is bounded between 0 and 1. Furthermore, the paper introduces decision-level fusion in the multimodal approach, employing two threshold levels in voice authentication. This approach aims to reduce resource requirements while enhancing the overall security of the authentication process. Cosine similarity, Euclidean distance, random forest, and extreme gradient boosting (XGBoost) are used to measure distance or similarity. The results show that the proposed method is more accurate than unimodal approaches, achieving accuracies of 98.44%, 98.83%, 97.46%, and 99.22% using cosine similarity, Euclidean distance, random forest, and XGBoost, respectively. The proposed method also reduces resource usage across the four distance or similarity measurements: from 5.19 MB to 0.792 MB, from 7.3294 MB to 1.9437 MB, from 6.6512 MB to 1.3284 MB, and from 7.8632 MB to 2.1517 MB.
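
The abstract describes NLWS only as a combination of IoU and the longest common subsequence; it does not state the formula. The sketch below is one plausible reading, not the authors' definition. It assumes whitespace tokenization, case-insensitive matching, the word-level LCS length as the "intersection", and |prompt| + |spoken| - |LCS| as the "union". The helper names (`lcs_length`, `nlws`, `word_error_rate`) are hypothetical. A word-level WER is included for contrast, to show why WER is not bounded by 1.

```python
from __future__ import annotations


def _words(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace; punctuation is not handled.
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two word lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlws(prompt: str, spoken: str) -> float:
    """Hypothetical NLWS: |LCS| / (|prompt| + |spoken| - |LCS|), always in [0, 1]."""
    a, b = _words(prompt), _words(spoken)
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    inter = lcs_length(a, b)
    union = len(a) + len(b) - inter  # >= max(len(a), len(b)) > 0 here
    return inter / union


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    r, h = _words(reference), _words(hypothesis)
    if not r:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        curr = [i]
        for j, hw in enumerate(h, start=1):
            cost = 0 if rw == hw else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(r)


print(nlws("the quick brown fox", "the quick fox"))  # 0.75
print(word_error_rate("yes", "no no no"))            # 3.0
```

For example, the prompt "the quick brown fox" against the transcript "the quick fox" shares an LCS of 3 words over a union of 4, giving 0.75. By contrast, a one-word prompt answered with three wrong words yields a WER of 3.0, because insertions are counted against a one-word reference.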

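The abstract states that voice authentication uses two thresholds within a decision-level serial fusion, but not exactly how they are applied. The following is a hypothetical sketch of one common serial scheme. An NLWS liveness gate comes first. A voice score is then compared against an upper and a lower threshold, and face recognition is invoked only when the voice score falls between them. All threshold values, the names `Thresholds` and `serial_authenticate`, and the use of cosine similarity on raw embedding vectors are illustrative assumptions, not the paper's settings.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values for illustration only; not taken from the paper.
    liveness: float = 0.8     # minimum NLWS for the spoken prompt
    voice_high: float = 0.85  # at or above: accept on voice alone
    voice_low: float = 0.5    # below: reject on voice alone
    face: float = 0.7         # face score required when the voice score is ambiguous


def serial_authenticate(
    nlws_score: float,
    voice_enrolled: Sequence[float],
    voice_probe: Sequence[float],
    face_score: Callable[[], float],
    t: Thresholds = Thresholds(),
) -> tuple[bool, str]:
    """Hypothetical two-threshold serial fusion; face_score is called only if needed."""
    if nlws_score < t.liveness:
        return False, "liveness failed"
    voice = cosine_similarity(voice_enrolled, voice_probe)
    if voice >= t.voice_high:
        return True, "accepted by voice"
    if voice < t.voice_low:
        return False, "rejected by voice"
    # Ambiguous voice score: only now run the (assumed more expensive) face matcher.
    return face_score() >= t.face, "decided by face"


print(serial_authenticate(0.9, [1.0, 0.0], [1.0, 1.0], lambda: 0.92))  # (True, 'decided by face')
```

The face matcher is passed as a callable so that it runs only for ambiguous voice scores. Under these assumptions, this lazy evaluation is one way a serial cascade could save resources compared with always running both modalities.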

Keywords


Multimodal Biometric; Voice and Face Authentication; Serial Fusion; Liveness Detection; Speech Recognition
