Machine Learning-Driven Stroke Prediction Using Independent Dataset

Fatin Natasha Binti Zahari - Multimedia University, 63100 Cyberjaya, Malaysia
Kannan Ramakrishnan - Multimedia University, 63100 Cyberjaya, Malaysia

Stroke incidence has risen rapidly worldwide, affecting not only the elderly but individuals across all age groups. Accurate prediction of stroke occurrence demands extensive data pre-processing, and automating early stroke forecasting is crucial for preventing stroke at an early stage. In this study, stroke prediction models are evaluated to estimate the likelihood of stroke from risk factors such as age, gender, pre-existing medical conditions, and social variables. The machine learning techniques employed are the Linear Support Vector Classifier, Extreme Gradient Boosting Classifier, Multilayer Perceptron, Adaptive Boosting Classifier, Bootstrap Aggregating Classifier, and Light Gradient-Boosting Machine. The purpose of this paper is to optimize the hyperparameters of these machine learning approaches when developing stroke prediction models. This was achieved through a comprehensive comparison of three sampling techniques for handling imbalanced datasets, with performance evaluated using various metrics. The most effective model was the Adaptive Boosting Classifier combined with Tomek Links, which achieved a cross-dataset accuracy of 99% and demonstrated reliable performance and generalization, as evidenced by high cross-validation scores and accuracy on an independent dataset. The next stage of this work is to investigate ways of forecasting the development of other dangerous diseases, such as breast cancer and skin disorders. In the long run, the aim is to build a powerful toolset that is accessible to all medical practitioners, allowing for the pre-emptive diagnosis of potentially hazardous illnesses.
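
The abstract names Tomek Links as the sampling technique paired with the best-performing classifier. As a rough illustration only, the sketch below shows the general idea of Tomek-link undersampling on toy data: two samples of opposite classes that are each other's nearest neighbour form a Tomek link, and the majority-class member of each link is removed. The function names, the toy points, and the label encoding (0 = no stroke, 1 = stroke) are assumptions made for this example. It is not the authors' implementation, which is not given on this page.

```python
import math


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    best, best_d = None, math.inf
    for j, p in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], p)  # Euclidean distance
        if d < best_d:
            best, best_d = j, d
    return best


def tomek_link_undersample(points, labels, majority_label):
    """Drop majority-class samples that take part in a Tomek link."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # A Tomek link: mutual nearest neighbours with different labels.
        if j is not None and nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority_label else j)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 0 = no stroke (majority), 1 = stroke (minority).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
y = [0, 0, 1, 0, 0]

X_res, y_res = tomek_link_undersample(X, y, majority_label=0)
print(y_res)
```

Here the majority-class point (1.05, 1.0) sits next to the only minority-class point, so it is the one sample removed. In practice a library implementation would normally be used on the real feature matrix before training the classifier.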


Stroke; Machine Learning; Classification; Multilayer Perceptron
