High-Resolution Downscaling with Interpretable Relevant Vector Machine: Rainfall Prediction for Case Study in Selangor

Raghdah Rasyidah Abdul Rashid - Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
Shazlyn Milleana Shaharudin - Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia and Department of Statistics, Columbia University, New York, United States
Nurul Ainina Filza Sulaiman - Kolej Vokasional Besut, Terengganu, Malaysia
Nurul Hila Zainuddin - Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
Hairulnizam Mahdin - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Summayah Aimi Mohd Najib - Universiti Pendidikan Sultan Idris, Perak, Malaysia
Rahmat Hidayat - Department of Information Technology, Politeknik Negeri Padang, Padang, Indonesia


DOI: http://dx.doi.org/10.62527/joiv.8.2.2700


Because the resolution of existing global climate model output is coarser than the resolution decision-makers require, there is a persistent need for climate downscaling. This study examined whether the Relevance Vector Machine (RVM), a machine learning approach, can outperform existing statistical methods in downscaling historical rainfall data over the complex terrain of Selangor, Malaysia. Although machine learning removes the need for manual feature selection when extracting significant information from predictor fields, several pivotal factors must still be considered: identifying the atmospheric features that contribute to rainfall, addressing missing data, and developing a sound model to predict daily rainfall intensity using appropriate machine learning techniques. Principal Component Analysis (PCA) was employed to select relevant environmental variables as input for the machine learning model, and imputation methods such as mean imputation and the k-nearest neighbors (KNN) algorithm were used to handle missing data. The RVM model was employed to determine a predictive association between the predictand variables and the predictors. To assess the performance of the RVM-based rainfall model, we used a dataset collected from the Department of Irrigation and Drainage Malaysia, with Nash-Sutcliffe Efficiency (NSE) and Root Mean Square Error (RMSE) as evaluation metrics. The study concluded that RVM models are suitable for forecasting future rainfall, since they can accommodate large rainfall extremes and generate reliable daily rainfall estimates in the presence of those extremes.
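As an illustration of the two evaluation metrics named above, the following minimal sketch computes RMSE and NSE using their standard textbook definitions. The function names and the sample rainfall values are hypothetical and are not taken from the study's dataset or code.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from
    their mean. A value of 1 is a perfect fit; 0 means the model is no
    better than always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Hypothetical daily rainfall values in mm (illustrative only).
obs = [0.0, 5.0, 12.0, 3.0]
sim = [1.0, 4.0, 10.0, 5.0]

print("RMSE:", rmse(obs, sim))
print("NSE:", nse(obs, sim))
```

A lower RMSE and an NSE closer to 1 both indicate closer agreement between simulated and observed rainfall. RMSE is expressed in the units of the data, whereas NSE is dimensionless.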


Statistical downscaling; missing data; PCA; RVM; forecasting.



