Comparative Analysis of Machine Learning Algorithms for Cross-Site Scripting (XSS) Attack Detection

Khairatun Hisan Hamzah - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
Mohd Zamri Osman - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
Tumusiime Anthony - Faculty of Computing, Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
Mohd Arfian Ismail - Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
Zubaile Abdullah - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
Alde Alanda - Department of Information Technology, Politeknik Negeri Padang, Padang, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.3-2.3451

Abstract


Cross-Site Scripting (XSS) attacks pose a significant cybersecurity threat: they exploit vulnerabilities in web applications to inject malicious scripts, enabling unauthorized access and the execution of malicious code. Traditional XSS detection systems often struggle to identify increasingly complex XSS payloads. To address this issue, this research evaluates the efficacy of Machine Learning algorithms in detecting XSS threats in web applications. The study conducts a comparative analysis of XSS attack detection using four prominent Machine Learning algorithms: Extreme Gradient Boosting (XGBoost), Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The algorithms are assessed using confusion-matrix metrics, 10-fold cross-validation, and training time. By exploring dataset characteristics and evaluating the performance of each algorithm, the study identifies the most robust Machine Learning solution for XSS detection. Results indicate that Random Forest is the top performer, achieving 99.93% accuracy and balanced metrics across all criteria evaluated. These findings can help strengthen web application security by informing the choice of reliable defenses against evolving XSS threats.
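The evaluation protocol named in the abstract (confusion-matrix metrics, 10-fold cross-validation, and training-time measurement) can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the token-matching classifier, the `TOKENS` list, and the toy `SAMPLES` are hypothetical stand-ins for the XGBoost, RF, KNN, and SVM models and the dataset used in the study.

```python
import random
import time


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 = XSS and 0 = benign."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=10, seed=0):
    """Pool confusion counts over k folds and accumulate training time."""
    counts = [0, 0, 0, 0]
    train_seconds = 0.0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        start = time.perf_counter()
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        train_seconds += time.perf_counter() - start
        preds = predict(model, [X[i] for i in test_idx])
        fold = confusion_counts([y[i] for i in test_idx], preds)
        counts = [a + b for a, b in zip(counts, fold)]
    return tuple(counts), summarize(*counts), train_seconds


# Hypothetical token list and classifier, used only to exercise the protocol.
TOKENS = ("<script", "onerror", "javascript:", "alert(")


def fit_tokens(X, y):
    """Keep tokens seen in malicious training samples but never in benign ones."""
    bad = {t for x, label in zip(X, y) if label == 1 for t in TOKENS if t in x.lower()}
    good = {t for x, label in zip(X, y) if label == 0 for t in TOKENS if t in x.lower()}
    return bad - good


def predict_tokens(model, X):
    """Flag a sample as XSS (1) if it contains any learned token."""
    return [1 if any(t in x.lower() for t in model) else 0 for x in X]


# Toy data (assumed for illustration; not the dataset used in the study).
SAMPLES = [f"<script>alert({i})</script>" for i in range(5)]
SAMPLES += [f"<img src=x onerror=alert({i})>" for i in range(5)]
SAMPLES += [f"search?q=shoes&page={i}" for i in range(10)]
LABELS = [1] * 10 + [0] * 10

counts, scores, seconds = cross_validate(fit_tokens, predict_tokens, SAMPLES, LABELS, k=10)
print("confusion (tp, fp, fn, tn):", counts)
print("scores:", scores)
print("total training time (s):", round(seconds, 6))
```

To compare real models in this way, the `fit` and `predict` callables would be swapped for thin wrappers around each classifier, with the same folds reused so that the metrics and training times are directly comparable.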
