Comparing Support Vector Regression and Random Forests Modelling for Predicting Malaria Incidence in Mozambique

Orlando Pedro Zacarias

Abstract: Accurate prediction of malaria incidence is essential for managing several activities of the Ministry of Health in Mozambique. This study compares support vector machines (SVMs) and random forests (RFs) for this purpose. A dataset of malaria case records covering the period 1999-2008 was used to evaluate predictive models on the last year, with models developed from one up to nine years of historical data. Mean squared error (MSE) was used as the performance metric. The scheme for estimating variable importance commonly employed for RFs was also adopted for SVMs. SVMs developed from two years of historical data obtained the best prediction accuracy. Hence, if the goal is to predict the actual number of malaria cases, the SVM model should be chosen. In the variable importance analysis, Indoor Residual Spraying (IRS), the districts of Manhiça and Matola, and the month of January turned out to be the most important predictors in both the SVM and RF models.
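The variable-importance scheme usually used with RFs is permutation-based, which is why it can be reused for SVMs: it only needs a model's predictions, not its internals. A minimal sketch follows, assuming each record is a dictionary with a hypothetical `year` field and that a fitted SVR or RF is exposed as a plain predict function. The helper names (`training_window`, `permutation_importance`) and the toy predictors `irs` and `noise` are illustrative assumptions, not the author's code or data. The sketch also shows how a training window of `n_years` preceding the test year can be selected, mirroring the one-to-nine-year setup.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted case counts."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def training_window(rows: List[Row], test_year: int, n_years: int) -> List[Row]:
    """Keep only rows from the n_years immediately preceding test_year.

    Assumes a hypothetical 'year' field on every row.
    """
    return [r for r in rows if test_year - n_years <= r["year"] < test_year]


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    rows: List[Row],
    y: List[float],
    feature: str,
    n_repeats: int = 10,
    seed: int = 0,
) -> float:
    """Average increase in MSE when one feature is shuffled across rows.

    Works for any model that exposes predictions, e.g. an SVR or an RF.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(rows))
    increases = []
    for _ in range(n_repeats):
        values = [r[feature] for r in rows]
        rng.shuffle(values)  # break the link between this feature and the target
        permuted = [{**r, feature: v} for r, v in zip(rows, values)]
        increases.append(mse(y, predict(permuted)) - baseline)
    return sum(increases) / n_repeats


# Toy usage with hypothetical data covering 1999-2008.
data_rng = random.Random(1)
rows = [
    {"year": 1999 + i % 10, "irs": float((i // 10) % 2), "noise": data_rng.random()}
    for i in range(200)
]


def toy_model(rs: List[Row]) -> List[float]:
    # Stands in for a fitted SVR or RF; it ignores 'noise' entirely.
    return [100.0 - 60.0 * r["irs"] for r in rs]


test_rows = [r for r in rows if r["year"] == 2008]
y_test = [100.0 - 60.0 * r["irs"] for r in test_rows]
train_rows = training_window(rows, test_year=2008, n_years=2)  # 2006 and 2007
print("training rows:", len(train_rows))
for feat in ("irs", "noise"):
    # 'noise' scores 0.0 because the toy model never uses it.
    print(feat, permutation_importance(toy_model, test_rows, y_test, feat))
```

In practice, scikit-learn's `sklearn.inspection.permutation_importance` implements the same idea for any fitted estimator, including `SVR` and `RandomForestRegressor`, and can score with negative MSE.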
