Comparative Analysis of Machine Learning Algorithms for Health Insurance Pricing

Yoon-Teck Bau - Multimedia University, Persiaran Multimedia, Cyberjaya, 63100, Malaysia
Shuhail Azri Md Hanif - Multimedia University, Persiaran Multimedia, Cyberjaya, 63100, Malaysia

Insurance is an effective way to guard against potential loss, and risk management is employed primarily to protect against financial loss. Risk and uncertainty are inevitable parts of life, and the accelerating pace of life has increased both. Since the coronavirus pandemic, health insurance pricing has emerged as an essential field of study. The outcomes of this study are intended to ensure that an insurance company's health insurance packages are priced within a profitable range, so that the company can choose the most cost-effective course of action. The study uses the US Health Insurance dataset and compares four regression-based machine learning algorithms for health insurance price prediction: multiple linear regression, ridge regression, XGBoost regression, and random forest regression. Model performance is assessed using four evaluation metrics: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the R2 score. Random forest regression outperforms the other three algorithms on all four metrics. This best-performing algorithm is then further enhanced with hyperparameter tuning; the tuned random forest improves on three of the four metrics, the exception being MAE. To gain further insight, data visualizations show feature importance and the differences between actual and predicted prices for all data points.
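As an illustration of the four evaluation metrics named above, the following minimal Python sketch computes MAE, MSE, RMSE, and the R2 score from their standard definitions. The function name `regression_metrics` and the sample prices are hypothetical and are not taken from the study or its dataset.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences of prices."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are identical")
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted prices, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because MSE and RMSE square each error, they weight large errors more heavily than MAE does. A model that reduces a few large errors while slightly increasing many small ones can therefore improve MSE, RMSE, and R2 without improving MAE, which is one way a pattern like the one reported for the tuned random forest could arise.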


Health Insurance Pricing; Machine Learning Algorithms; Regression; Multiple Linear Regression; Ridge Regression; XGBoost Regression; Random Forest Regression; MAE; MSE; RMSE; R2 Score; Hyperparameter Tuning
