Implementing K-Nearest Neighbors (k-NN) Algorithm and Backward Elimination on Cardiotocography Datasets
DOI: http://dx.doi.org/10.62527/joiv.8.3.1996
Abstract
Having a healthy baby is every mother's hope. Unfortunately, high maternal and fetal mortality remains a serious problem that calls for early risk detection in pregnant women. A cardiotocograph examination helps monitor maternal and fetal health, and classification is one way to support early risk detection from its results. This research analyzes the optimal k values and distance measures for the k-NN method, and is expected to serve as a primary reference for other studies that examine the same dataset or develop k-NN. Feature selection is needed to optimize the classification method, particularly to improve accuracy. This study used the cardiotocography dataset, obtained from cardiotocograph examinations of fetal condition, from the UCI Machine Learning Repository. The dataset consists of 2,126 records with 22 features and three classes: normal, suspect, and pathological. The study employed the K-Nearest Neighbor (k-NN) method and backward elimination feature selection with ordinary least squares regression. The tests covered three distance calculations (Euclidean, Manhattan, and Minkowski) and four k values. Each scenario was evaluated by accuracy, derived from the confusion matrix, and by execution time. The study compared k-NN combined with backward elimination against k-NN without feature selection. The best accuracy of k-NN with backward elimination was 91%, the same as k-NN without feature selection; both reached it with k = 3 and Manhattan distance. Backward elimination reduced the number of features from 22 to 14.
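The test scenarios described above (three distance measures crossed with four k values) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the toy data, the choice of p = 3 for the Minkowski case, and the k values 1, 3, 5, 7 are all assumptions made for illustration, since the abstract does not state them.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k, p):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: minkowski(pair[0], query, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k, p):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k, p) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Made-up two-feature toy data; the real dataset has 22 features.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
           (9.0, 1.0), (9.1, 1.2), (8.8, 0.9), (9.2, 0.8)]
train_y = ["normal"] * 4 + ["suspect"] * 4 + ["pathological"] * 4
test_X = [(1.1, 1.0), (5.1, 5.0), (9.0, 1.1)]
test_y = ["normal", "suspect", "pathological"]

# Assumed settings: the abstract names the distances but not p or the k values.
DISTANCES = {"manhattan": 1, "euclidean": 2, "minkowski_p3": 3}
K_VALUES = [1, 3, 5, 7]

# One accuracy score per (distance, k) scenario.
results = {
    (name, k): accuracy(train_X, train_y, test_X, test_y, k, p)
    for name, p in DISTANCES.items()
    for k in K_VALUES
}

for (name, k), acc in results.items():
    print(f"{name:13s} k={k}: accuracy={acc:.2f}")
```

On real data, the same grid would be run once on all 22 features and once on the 14 features retained by backward elimination, so that the two configurations can be compared scenario by scenario.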
Meanwhile, k-NN with backward elimination ran faster, averaging 26.54, 19.23, and 68.09 seconds for the three distances, compared with 26.83, 19.39, and 68.84 seconds for k-NN without feature selection. In conclusion, backward elimination did not increase accuracy, since both configurations yielded the same result. However, k-NN with backward elimination was faster, by 0.29, 0.16, and 0.75 seconds, respectively.
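As a quick arithmetic check, the average execution times reported above imply the following absolute and relative gaps. The short sketch below only re-derives numbers from the abstract; the variable names are illustrative, and the three entries are kept in the order in which the abstract lists them.

```python
# Average execution times in seconds, in the order given in the abstract.
with_backward_elimination = [26.54, 19.23, 68.09]
without_feature_selection = [26.83, 19.39, 68.84]

# Absolute time saved per distance measure, in seconds.
abs_diffs = [
    slow - fast
    for fast, slow in zip(with_backward_elimination, without_feature_selection)
]

# Time saved as a percentage of the run without feature selection.
rel_diffs = [
    100.0 * diff / slow
    for diff, slow in zip(abs_diffs, without_feature_selection)
]

for diff, rel in zip(abs_diffs, rel_diffs):
    print(f"saved {diff:.2f} s ({rel:.2f}% of the baseline time)")
```

The absolute gaps are all under one second, and each is a small fraction of the corresponding baseline run time.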