Performance Comparison Between Support Vector Regression and Artificial Neural Network for Prediction of Oil Palm Production

Riau, the largest oil palm producing region in Indonesia, has an important role in improving the welfare of society and the economy. Oil palm production in Riau Province has increased significantly in every period. To determine how production will develop over the next few years, given the functions and benefits of oil palm, production was predicted from time series data covering the last 8 years (2005-2013). The prediction was carried out by comparing the performance of the Support Vector Regression (SVR) method and the Artificial Neural Network (ANN). In the experiments, SVR produced a better model than ANN. This is indicated by a determination coefficient ($R^2$) of 95% and an MSE of 6% with the Radial Basis Function (RBF) kernel, whereas ANN produced only 74% for $R^2$ and 9% for MSE in the 8th experiment, with 20 hidden neurons and a learning rate of 0.1. The SVR model generates predictions for the next 3 years that increase by 3% to 6% relative to the actual data and the RBF model predictions.


Introduction
Riau is a province in central Sumatra, Indonesia, with an area of 8.91 million hectares. Riau consists of 12 districts and 142 sub-districts. In 2013, Riau was recorded as the province with the largest oil palm area in Indonesia, at 2.26 million hectares. The average oil palm production in Riau is 6.93 million tons per year, spread across 10 sub-districts [1]. Both the production and the plantation area of oil palm in Riau increase every year. However, information released by the Riau Central Bureau of Statistics showed decreasing values in certain areas. This was due to the conversion and replanting of oil palms that had reached the end of their productive age.
The amount of oil palm production in Riau illustrates its benefit to the prosperity level of the region [2]. In addition, oil palm contributes to the sustainability of three main industries. First, the production of Crude Palm Oil (CPO) [3]. Second, downstream industries derived from waste oil [3]. Last, and most important for Riau, it is used as a raw material for the development of renewable energy to overcome the electricity crisis, with a waste composition that has been prescribed for each part, such as shells, fibers, and empty oil palm bunches [4][5][6][7].
A broad view of oil palm production was also used as a decision-making reference for Steam Power Plant development in Riau, with a simulated extraction calculation using 50% of oil palm waste [8][9]. On the other hand, oil palm, which has now spread throughout Riau, has become a phenomenon among investors in terms of both production and waste. The local government is also seeking ways to develop energy that uses oil palm as a raw material, as an alternative to fossil energy. This is in accordance with the mandate of Law No. 30/2007, chapter 20 verse 4, which states that the provision and utilization of new and renewable energy should be enhanced by the central and local governments in accordance with their authority. One of the renewable energy sources mentioned in the law is biomass, which can be made from oil palm [10]. The problem is the long-term condition of oil palms in Riau: whether production can always supply the raw material for alternative energy or not. This depends on local government policy.
Several studies have discussed topics related to forecasting oil palm production, based either on production statistics or on past data. In 2009, Hermantoro predicted oil palm production [11]. By comparing determining parameters, he concluded that oil palm production would increase. The study used a machine learning technique called Artificial Neural Network (ANN). However, he did not report the accuracy of the prediction results. Mustakim [12] studied another prediction of oil palm production using a different method called Support Vector Regression (SVR), based on time series data for Riau from 2005 to 2013. That research concluded that the best SVR model reached an accuracy of 95% with an error of 6% using the Radial Basis Function (RBF) kernel. Therefore, this study discusses the performance comparison between the best SVR model and the best ANN model for predicting oil palm production in Riau, using data from the last 8 years (2005-2013). SVR is used to overcome over-fitting in the data set. This study is expected to provide conclusions about the best model for predicting oil palm production in the coming years.

Methods
This research was conducted in multiple steps: data collection, data selection, SVR modelling, ANN modelling, and analysis of the performance comparison between SVR and ANN. Several studies that compare SVR and ANN conclude that SVR is better than ANN. This research will also test the claim that SVR modelling outperforms ANN modelling. The methodology is shown in detail in Figure 1.

Data Collection
The data used in this research are the production and productivity of oil palm. The data originated from the Central Bureau of Statistics and the Department of Estate Crops in Riau, 2013. The data consist of 32 data points recorded from 2005 to 2013. The data were filtered to 74 sub-districts based on the Production Minimum Standard (PMS). SVR is able to overcome some over-fitting in a data set. According to Christodoulos's research, the minimum amount of data required for prediction is 16 to 20 data points [13].

Data Selection
Data selection was done by pre-processing all of the data. Several companies and departments determined that the PMS used as a target should be 1,000 tons/period, or an average minimum production of 1,000 tons/year. Only 74 of the 142 sub-districts fulfil this Production Minimum Standard (PMS). After establishing the data points to be used for prediction, the next step is to divide the data into two parts: a training dataset and a testing dataset. The division was based on k-fold cross validation, which randomly divides the data into k subsets so that all of the data are used as both testing data and training data [14]. All of the data are also normalized so that all data attributes carry the same weight and show less variation. In other words, no attribute is more dominant or considered more important than the others as a result of its weighting [15].
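The normalization and k-fold division described above can be sketched as follows. This is a minimal illustration only: the text does not state which normalization was used, so min-max scaling to [0, 1] is an assumption, and the function names, the sample values, and the choice of k = 4 are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale a list of numbers linearly to the range [0, 1] (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random division into k subsets
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical production figures in tons, not taken from the study's data.
production = [1200.0, 1500.0, 1000.0, 2400.0, 1800.0, 2100.0, 1350.0, 1650.0]
scaled = min_max_normalize(production)
splits = list(k_fold_indices(len(scaled), k=4))
```

Each data point appears in exactly one test fold and in the training set of the remaining folds, which is what allows all of the data to serve as both training and testing data.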

Support Vector Regression (SVR)
SVR is the application of the Support Vector Machine (SVM) to regression. In regression, the output is a real or continuous number. SVR is a method that can overcome over-fitting. Therefore, it produces good performance [16] and supports conclusions about the superiority and accuracy of results [17].
It can also be applied to various cases with continuous data [18]. In 2003, Smola and Schölkopf explained SVR using the example of a training dataset $(x_i, y_i)$ with $i = 1, 2, \ldots, \ell$, where the input is $x = \{x_1, x_2, x_3\} \subseteq \mathbb{R}^N$ and the corresponding output is $y = \{y_i, \ldots, y_\ell\} \subseteq \mathbb{R}$. Using SVR, a function $f(x)$ will be found.

Mustakim, et al., Performance Comparison Between Support Vector Regression 3
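The largest deviation of this function from the actual target is $\varepsilon$ for all training data. When the value of $\varepsilon$ is equal to 0, a perfect regression is obtained. Based on the data, SVR seeks a regression function $f(x)$ that approximates the output to the actual target with an error tolerance of $\varepsilon$ and minimal complexity. The regression function $f(x)$ can be stated by the following formula [19]:

$$f(x) = w^{T}\varphi(x) + b \qquad (1)$$

where $\varphi(x)$ indicates a point in a higher-dimensional feature space that results from mapping the input vector $x$ from a lower-dimensional space. Coefficients $w$ and $b$ are estimated by minimizing the risk function defined in equations (2) and (3):

$$\min_{w,b}\; \frac{1}{2}\lVert w \rVert^{2} + C\,\frac{1}{\ell}\sum_{i=1}^{\ell} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr) \qquad (2)$$

$$L_{\varepsilon}\bigl(y_i, f(x_i)\bigr) = \begin{cases} \lvert y_i - f(x_i) \rvert - \varepsilon, & \lvert y_i - f(x_i) \rvert \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $C$ is the penalty constant, $\varepsilon$ is the error tolerance, and $\ell$ is the number of training points. There are three kernel functions in SVR models: Linear, Polynomial and Radial Basis Function (RBF). These three kernel functions are available in LIBSVM [20].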
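For reference, the three kernels are commonly written in the following standard forms, which match the kernel types offered by LIBSVM. The symbols $r$ (LIBSVM's `coef0`) and $d$ (the polynomial degree) are not named in the text and are introduced here only for illustration; the values used in this study are not stated.

```latex
\begin{align}
K_{\text{linear}}(x_i, x_j) &= x_i^{\top} x_j \\
K_{\text{poly}}(x_i, x_j)   &= \left(\gamma\, x_i^{\top} x_j + r\right)^{d} \\
K_{\text{RBF}}(x_i, x_j)    &= \exp\!\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)
\end{align}
```

The linear kernel has no kernel parameter of its own, which is why only $C$ needs tuning for it, while the polynomial and RBF kernels both depend on $\gamma$.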

Artificial Neural Network (ANN)
ANN is a network of small processing units modelled on human neural tissue. An ANN is an adaptive system that can change its structure to solve problems based on external or internal information that flows through the network [21]. ANN architectures are divided into two groups: Single Layer Networks and Multiple Layer Networks [22]. Backpropagation is one model in the Multiple Layer Network category [23]. Backpropagation trains a network to balance its ability to recognize the patterns used during training with its ability to respond correctly to input patterns that are similar (but not identical) to the training patterns [24]. A backpropagation network has 3 phases: the forward phase, the backward phase, and the weight modification phase, which reduces the error that occurs [25]. The backpropagation architecture consists of an input layer, a hidden layer and an output layer. Each layer consists of one or more artificial neurons. The network architecture used in this research can be seen in Figure 2.
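The three backpropagation phases can be illustrated with a small sketch. This is not the network used in this research: the sigmoid hidden layer, the linear output neuron, the 3 hidden neurons, the toy data set, and the function name `train_backprop` are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(samples, n_hidden=3, learning_rate=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer network; return the mean squared error per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each hidden neuron stores its input weights followed by a bias term.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    # The output neuron stores one weight per hidden neuron followed by a bias.
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in samples:
            # Phase 1 (forward): propagate the input through the network.
            hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                      for ws in w_hidden]
            output = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
            # Phase 2 (backward): propagate the output error to the hidden layer.
            delta_out = output - target
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Phase 3 (weight modification): gradient descent step on all weights.
            for j in range(n_hidden):
                w_out[j] -= learning_rate * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= learning_rate * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= learning_rate * delta_hidden[j]
            w_out[-1] -= learning_rate * delta_out
            squared_error += delta_out ** 2
        history.append(squared_error / len(samples))
    return history


# Toy data (y = x^2 on [0, 1]), purely illustrative.
samples = [([0.0], 0.0), ([0.25], 0.0625), ([0.5], 0.25), ([0.75], 0.5625), ([1.0], 1.0)]
history = train_backprop(samples)
```

The per-epoch error in `history` should fall as training proceeds, which is the effect the weight modification phase is meant to achieve.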

Comparative Analysis of SVR and ANN
This analysis compares the best SVR and ANN models based on the error size and the determination coefficient. If $y_i$ is the predicted value for the $i$-th data point, $\hat{y}_i$ is the actual output value of the $i$-th data point, and $m$ is the number of data points, then the commonly used error measure is the Mean Squared Error (MSE): $\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$.
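The two comparison measures can be computed as in the sketch below. The sample values are hypothetical, and the determination coefficient is computed here as $1 - SS_{res}/SS_{tot}$, which is an assumption: the text does not give its formula, and some tools instead report the squared correlation coefficient, which can differ slightly.

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences between predicted and actual values."""
    m = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / m


def determination_coefficient(predicted, actual):
    """R^2 as 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized values, not taken from the study's data.
actual = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
mse = mean_squared_error(predicted, actual)
r2 = determination_coefficient(predicted, actual)
```

A better model yields an MSE closer to 0 and a determination coefficient closer to 1.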

SVR Experiment
SVR requires appropriate kernel parameters for training. To obtain the optimum kernel, optimization was done using grid search during training. Two parameters are optimized with grid search: parameter $C$ and parameter $\gamma$. Polynomial parameter is part of α.
Parameter $C$ is the penalty value applied to the error of the SVR model, whereas parameter $\gamma$ is an input to the kernel function used. The RBF and polynomial kernels require parameters $C$ and $\gamma$, whereas the linear kernel only requires parameter $C$ [26]. To search for the optimum values of $C$ and $\gamma$, a combination of training and testing experiments was conducted 220 times for RBF. 55 combination experiments were run for the linear kernel, and 220 experiments were run for the polynomial kernel with various values of $C$ and $\gamma$, so that an optimal model was produced. Besides parameters $C$ and $\gamma$, testing applied the nu-SVR parameter with a value of 4. The performance of a kernel function model is measured by the correlation coefficient (R) and the MSE. The best model has the largest R (approaching 1) and the smallest MSE (close to 0). R and MSE are simple measures that are often used and have been verified for measuring errors. Simulations were performed to find the best accuracy with the RBF and polynomial kernels, combining $C$ between $2^{-6}$ and $2^{5}$ with $\gamma$ between $2^{-1}$ and $2^{4}$. Likewise, for the linear kernel, $C$ ranged between $2^{-6}$ and $2^{5}$. This kind of combination was also used by Hendra Gunawan [27] to find the best accuracy for rice production prediction in 2012, which resulted in an accuracy above 95%.
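A grid search over powers of two can be sketched as follows. The step size of one exponent and the helper names are assumptions, and `toy_evaluate` is a hypothetical stand-in for training an SVR model with a given parameter pair and returning its cross-validated MSE; it does not reproduce the experiments reported here.

```python
import itertools


def power_of_two_grid(low_exp, high_exp):
    """Return [2**low_exp, ..., 2**high_exp] in steps of one exponent (assumed)."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1)]


def grid_search(evaluate, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_mse, best_c, best_gamma)."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        mse = evaluate(c, gamma)
        if best is None or mse < best[0]:
            best = (mse, c, gamma)
    return best


def toy_evaluate(c, gamma):
    # Hypothetical error surface with its minimum at C = 4 and gamma = 2.
    return (c - 4.0) ** 2 + (gamma - 2.0) ** 2


c_values = power_of_two_grid(-6, 5)      # C between 2^-6 and 2^5
gamma_values = power_of_two_grid(-1, 4)  # gamma between 2^-1 and 2^4
best_mse, best_c, best_gamma = grid_search(toy_evaluate, c_values, gamma_values)
```

In practice `evaluate` would wrap the training and k-fold testing of one SVR model, so the search cost grows with the product of the two grid sizes.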
Some phases and steps carried out for the linear kernel were optimized in parameter $C$, in accordance with previous studies. The linear kernel is the simplest compared to the other kernels. Experiment combinations ranging from $2^{-6}$ to $2^{5}$ produced a minimum MSE of 0.053308 (5%) with a maximum $R^2$ value of 0.921253 (92%). The RBF kernel optimizes the value of $\gamma$ over the range $2^{-1}$ to $2^{4}$. With parameter $C$ in the same range as the linear kernel, it obtains the smallest error value of 1.4% on fold 2. The largest determination coefficient, 95%, is obtained on fold 1. Similar to RBF, the polynomial kernel optimizes the values of $C$ and $\gamma$ over the same ranges as RBF.
The best polynomial experiment, with an error value of 18% and a determination coefficient of 62%, is in fold 1. The parameters $C$ and $\gamma$ range between $2^{-1}$ and $2^{0}$. The error values and determination coefficients of the three kernels can be seen in Table 1. The relationship between observation and prediction for the three kernels is shown in Figures 3, 4 and 5.
For the polynomial kernel, the experiments show an inverse curve between actual and predicted values; the result was chosen based on the experiment with the smallest error, without considering other aspects. For the linear kernel, the conclusion of the experiment series was also based on the smallest error value. Comparing the linear and polynomial kernels, the linear kernel is more optimal. Some of these results were affected by over-fitting of the data. Across the experiments, the prediction model with the highest correlation and the lowest error was the one built with the RBF kernel. This is consistent with the SVM guide, which states that the RBF kernel is superior in many machine learning cases [28].

Experiment of ANN
The ANN experiment used the same 32 data points and compared hidden neuron counts and learning rates under 4-fold cross validation. The experiments comprised 12 ANN models. Tables 2 and 3 show the characteristics and specifications used for the ANN architecture and the best experiment results, respectively.
Table 3 shows the best model among the ANN experiments. The 8th experiment has a determination coefficient of 74% and an error value of 9%. The second experiment has the lowest error value among all experiments, 8%. However, the second experiment only has a determination coefficient of 43%. Therefore, considering both the best $R^2$ and the best MSE, it can be concluded that the 8th experiment, with 20 hidden neurons and a learning rate of 0.1, was the best of the models produced to predict the relationship between observed data and

Performance Comparison between SVR and ANN
In the experiments, the method that produced the best model for oil palm production is SVR. The model has a determination coefficient of 95% and an error value of 6%. These percentages show that the two methods differ greatly in their $R^2$ values.

Prediction of Best Model
From the best SVR model, the prediction obtained for the next three years can be described based on the estimated prediction of the actual data and the predicted oil palm production in the following years. Figure 9 shows that the average increase for each recording period is 3%-6% in normalized form. The same pattern will hold for 2017 if the data pattern remains the same as the actual data and the prediction results. Natural factors are not used as a reference in this study; the only reference is the final data.

Conclusion
From the conducted research, it can be concluded that the SVR model is better than the ANN model for predicting oil palm production in Riau. The best ANN model reached a determination coefficient ($R^2$) of 74% with an error of 9% in the 8th experiment, while SVR with the RBF kernel produced a smaller error of 6% and a larger $R^2$ of 95%. This large difference in the determination coefficient indicates that, for this time series data, the SVR model is superior to the ANN model. The predictions for the next three years increase gradually by 3%-6% in normalized form. The predictions do not take into account natural or other field factors that could affect production in each period.
Figure 9. Comparison graph between actual data, the RBF model, and oil palm production prediction for 3 years ahead