A Multi-Feature Fusion Approach for Dialect Identification using 1D CNN
DOI: http://dx.doi.org/10.62527/joiv.8.3.2146
Abstract
The phonological variety of Kurdish, a language with several dialects, poses a distinct challenge for automatic dialect identification. This study examines and evaluates several acoustic features for identifying three Kurdish dialects: Badini, Hawrami, and Sorani. We used a dataset of 6,000 samples and a model combining 1D convolutional neural network (CNN) layers with fully connected layers to perform the identification task. Our aim was to assess how effectively different acoustic features discriminate between the dialects. We extracted Mel-frequency cepstral coefficients (MFCC) together with additional features: the Mel spectrogram, spectral contrast, and polynomial features. We trained and tested our models on individual features as well as on a combination of all features. The identification task achieved high accuracy across configurations: MFCC combined with the Mel spectrogram reached 95.75%, MFCC with spectral contrast reached 91.42%, and MFCC with polynomial features reached 90.83%. Accuracy peaked at 96.5% when all features were combined.
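The multi-feature fusion described above can be sketched as concatenating fixed-length summaries of each feature stream into one input vector for the classifier. The sketch below uses random arrays as placeholders for librosa-style MFCC, Mel-spectrogram, spectral-contrast, and polynomial-feature outputs; the feature dimensions and the mean-pooling step are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 120  # frames per audio clip (assumed)

# Placeholder per-frame features with librosa-like (coefficients x frames) shapes.
mfcc = rng.standard_normal((20, n_frames))      # 20 MFCCs (assumed)
mel = rng.standard_normal((128, n_frames))      # 128 Mel bands (assumed)
contrast = rng.standard_normal((7, n_frames))   # 7 spectral-contrast bands (assumed)
poly = rng.standard_normal((2, n_frames))       # order-1 polynomial features (assumed)

def pool(feat):
    # Mean-pool over time so clips of any length map to a fixed-size vector.
    return feat.mean(axis=1)

# Fuse all streams into one vector that a 1D CNN + dense classifier could consume.
fused = np.concatenate([pool(f) for f in (mfcc, mel, contrast, poly)])
print(fused.shape)  # (157,) = 20 + 128 + 7 + 2
```

Training on a single feature versus the fused vector then reduces to choosing which streams enter the concatenation.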