A COMPARATIVE STUDY OF BACKPROPAGATION, SUPPORT VECTOR MACHINE, AND EXTREME LEARNING MACHINE FOR BIOINFORMATICS DATA

A successful understanding of how to make computers learn would open up many new uses of computers and new levels of competence and customization. A detailed understanding of information-processing algorithms for machine learning might lead to a better understanding of human learning abilities and disabilities. Many types of machine learning are known, including Backpropagation (BP), Extreme Learning Machine (ELM), and Support Vector Machine (SVM). This research uses five datasets with several characteristics. The results show that all three investigated models offer comparable classification accuracies. This research draws three conclusions for bioinformatics data: BP performs best in accuracy, SVM performs best in stability, and ELM performs best in CPU time.


Introduction
A successful understanding of how to make computers learn would open up many new uses of computers and new levels of competence and customization. A detailed understanding of information-processing algorithms for machine learning might also lead to a better understanding of human learning abilities (and disabilities) [1]. Many types of machine learning are known, among them Backpropagation (BP), Extreme Learning Machine (ELM), and Support Vector Machine (SVM).
The first machine learning method is Backpropagation. Backpropagation was initially formulated by Werbos in 1974 and later modified by Rumelhart and McClelland [2]. Backpropagation is a gradient-descent-type algorithm that adjusts its connection parameters at each step or iteration. The algorithm balances the network's capability to recognize the patterns used for training against its capability to respond correctly to input patterns that are similar (but not equal) to the training patterns. The classification accuracy of Backpropagation is nevertheless limited: whenever the output value differs from the target value, an error is calculated and then propagated from the output layer back to the input layer (the backpropagation process).
Until now, many researchers have developed and implemented the BP algorithm [3][4][5][6][7]. BP has been used for classification in many problem domains, such as bioinformatics, biomedicine, chemistry, art, and the environment. Several studies use this algorithm because BP achieves a minimal error and is effective for some problems, in particular the classification problem considered in this study.
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information), Volume 8, Issue 1, February 2015
The next powerful machine learning method is the Support Vector Machine (SVM). Support vector machines are a group of supervised learning methods that can be applied to classification or regression. This method differs from machine learning with ANNs: during training, an SVM does not train on all the data but only on the support vectors. The basic idea of the algorithm is to optimize the margin of the separating hyperplane. Although SVM needs a large amount of memory and a long processing time, many researchers use it to solve various problems [8][9][10][11] because of its high performance.
Extreme Learning Machine (ELM) is a relatively new learning algorithm for neural networks that is built on the Single-hidden Layer Feed-forward Network (SLFN). ELM learns very fast and achieves a small training error [12]. ELM was first introduced by Huang in 2004 for SLFNs. It was designed to overcome the main weakness of feed-forward neural networks, namely their learning speed. Traditionally, feed-forward neural networks are trained with gradient-based learning algorithms, and all parameters (input weights and hidden biases) are determined iteratively. To solve that problem, ELM uses the minimum-norm least-squares (LS) solution of SLFNs.
Unlike traditional function approximation theories, which require the input weights and hidden layer biases to be adjusted, ELM allows the input weights and hidden layer biases to be assigned randomly, provided that the activation function is infinitely differentiable [13]. ELM can therefore be faster than earlier neural network algorithms. ELM has been applied and developed in various fields [14][15][16][17][18]. The algorithm is still under development; the main reason many studies use it is that it is simple and faster than several other ANN algorithms.

Backpropagation
Backpropagation is one of many supervised machine learning methods. It is a gradient-descent-type algorithm that processes data in two phases. In the first phase, the input vector is presented to the input layer and passed on to the hidden layer, and the output value is then computed in the output layer. In the second phase, if the output value differs from the target value, an error is calculated and propagated from the output layer back to the input layer (the backpropagation process) [19].
In BP, the transfer function must fulfill several conditions: it must be continuous, differentiable, and non-decreasing. In this paper we use the binary sigmoid function defined in equation (1) and its derivative in equation (2).
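The two phases described above, with the binary sigmoid as transfer function, can be sketched as follows. This is only an illustrative sketch in Python (the simulations in this study were run in MATLAB). The network size, learning rate, number of epochs, bias terms, mean squared error, and the toy OR dataset are all assumptions made for the example, not settings taken from this study.

```python
import numpy as np

def sigmoid(x):
    """Binary sigmoid: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(fx):
    """Derivative written through the sigmoid output: f'(x) = f(x) * (1 - f(x))."""
    return fx * (1.0 - fx)

def train_bp(X, T, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with batch gradient descent.

    All hyperparameters here are hypothetical defaults for the sketch.
    """
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # input -> hidden weights
    b_v = np.zeros(n_hidden)                             # hidden biases
    W = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))   # hidden -> output weights
    b_w = np.zeros(T.shape[1])                           # output biases
    errors = []
    for _ in range(epochs):
        # Phase 1: forward propagation through hidden and output layers.
        Z = sigmoid(X @ V + b_v)
        Y = sigmoid(Z @ W + b_w)
        errors.append(float(np.mean((T - Y) ** 2)))
        # Phase 2: error factors, from the output layer back to the hidden layer.
        delta_out = (T - Y) * sigmoid_derivative(Y)
        delta_hid = (delta_out @ W.T) * sigmoid_derivative(Z)
        # Weight update: new weight = old weight + correction term.
        W += lr * Z.T @ delta_out
        b_w += lr * delta_out.sum(axis=0)
        V += lr * X.T @ delta_hid
        b_v += lr * delta_hid.sum(axis=0)
    return (V, b_v, W, b_w), errors

# Toy data (logical OR), chosen only because it is small and easy to learn.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T_toy = np.array([[0.0], [1.0], [1.0], [1.0]])
params, errors = train_bp(X_toy, T_toy)
print("first error:", errors[0], "last error:", errors[-1])
```

The error recorded in `errors` should shrink over the epochs, which is the behaviour the backward phase is meant to produce.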
Backpropagation training follows these steps [20]:
a) Initialize the neuron weights with random numbers.
Forward propagation
b) Each input layer neuron receives its input and passes it to the connected hidden layer neurons.
c) Compute the output value of every hidden layer neuron, using the weights that connect the input layer and the hidden layer, with equation (3) and equation (4). Once the hidden layer has been processed, continue to the output layer.
d) Compute all output values of the output layer with equation (5) and equation (6).
Backpropagation
e) Compute the error factor of the output layer from the error in each output layer neuron using equation (7) and equation (8).
f) Compute the error factor of the hidden layer from the error in each hidden layer neuron using equation (9) to equation (11).
$w(t+1) = w(t) + \Delta w$ (13)
Figure 1 shows the architecture of the Backpropagation algorithm [19]. The architecture comprises two processes, forward propagation and backpropagation, as explained above.

Support Vector Machine
Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. Originally, SVM was a training algorithm for linear classification. For the non-linear case, SVM maps the data from the input space into a higher-dimensional feature space, where a linear, large-margin learning algorithm is then applied; the mapping can be done by kernel functions. In the high-dimensional feature space, linear hyperplane classifiers with a maximal margin between the classes can be obtained [21]. The steps of the support vector machine algorithm are as follows. Given a training data set D with q features per data point, there is a mapping Φ as formulated in equation (14) and equation (15).
Here r is the new feature set that results from mapping D, and x is the training data, where $x_1, x_2, \ldots, x_n \in \mathbb{R}^q$ is the dataset that is mapped to r dimensions.
The training data is defined using equation (16).
The mapping process in SVM needs a dot product operation, denoted $\Phi(x_i) \cdot \Phi(x_j)$. This dot product can be computed without knowing the transform function Φ, a computation technique called the kernel trick. Several types of kernel used in SVM are shown in Table 1.
The explanation above can be summarized in an illustration of the SVM hyperplane. Figure 2 shows the margin hyperplanes in SVM: the positive margin, the separating hyperplane, and the negative margin.
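The kernel trick can be checked numerically on a small case. The sketch below uses a homogeneous polynomial kernel of degree 2 on two-dimensional inputs; this kernel and the explicit map `phi` are chosen here only for illustration and are not claimed to be the kernel used in this study. The point is that the kernel value equals the dot product in the mapped space, without the mapped vectors ever being needed.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(y)))  # dot product computed in feature space
implicit = poly2_kernel(x, y)             # same quantity computed in input space

print(explicit, implicit)
```

Both numbers agree up to floating-point rounding, while `poly2_kernel` only ever touches the original two-dimensional vectors.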

Extreme Learning Machine
Extreme Learning Machine (ELM) is, like BP and SVM, a supervised learning method. It is a relatively new learning algorithm for neural networks that is built on the Single-hidden Layer Feed-forward Network (SLFN). ELM is a simple algorithm with very fast learning and a small training error [12]. ELM was first introduced by Huang in 2004 for SLFNs. It was designed to overcome the main weakness of feed-forward neural networks, namely their learning speed. Traditionally, feed-forward neural networks are trained with gradient-based learning algorithms, and all parameters (input weights and hidden biases) are determined iteratively. To solve that problem, ELM uses the minimum-norm least-squares (LS) solution of SLFNs.
A standard single-hidden-layer feedforward neural network with n hidden neurons and activation function g(x) can be mathematically modeled as in equation (17) and equation (18).
Note that H is the output matrix of the hidden layer. The ELM algorithm is thus derived from the minimum-norm least-squares solution of SLFNs.
Although ELM is a "generalized" SLFN, the hidden layer (feature mapping) of ELM does not need to be tuned. The main concept of ELM, as presented by Huang (2006), is the Moore-Penrose generalized inverse: for a matrix $A \in M_{m \times n}(\mathbb{C})$ there exists a unique generalized inverse $A^{\dagger}$. The activation function in ELM must be infinitely differentiable (e.g. sigmoid, RBF, sine, cosine, or exponential functions). The number of hidden nodes $\tilde{N}$ is bounded by the number of training samples $N$ ($\tilde{N} \le N$).
Several methods can be used to compute the Moore-Penrose Generalized Inverse of H, which are orthogonal projection, orthogonalization method, iterative method, and singular value decomposition (SVD).
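A minimal sketch of ELM training under these ideas is given below. It assumes a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and NumPy's `numpy.linalg.pinv`, which computes the Moore-Penrose generalized inverse through singular value decomposition. The function names, the toy data, and the number of hidden nodes are hypothetical and are not the configuration used in this study.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden-layer output matrix H for a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden, seed=0):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random input weights, never tuned
    b = rng.uniform(-1.0, 1.0, n_hidden)                # random hidden biases, never tuned
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T  # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output for the rows of X."""
    return hidden_output(X, W, b) @ beta

# Toy binary problem (hypothetical): the label is 1 when the first two features sum to > 0.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 3))
T_demo = (X_demo[:, :1] + X_demo[:, 1:2] > 0).astype(float)

W, b, beta = elm_train(X_demo, T_demo, n_hidden=10)
labels = (elm_predict(X_demo, W, b, beta) >= 0.5).astype(int)
print("training accuracy:", float((labels == T_demo.astype(int)).mean()))
```

Training involves no iteration: the only fitted quantity is `beta`, obtained in a single pseudo-inverse step, which is why ELM is expected to be fast compared with gradient-based training.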
The research procedure of this study is presented in the flow chart in Figure 3.
The first step in this research is data preprocessing, for which Z-score normalization is used. The Z-score is computed as in equation (21). [...] is used as given by equation (22). The threshold used in this research is 0.5.
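The preprocessing and thresholding steps can be sketched as follows. The sketch assumes the usual Z-score definition, (value - mean) / standard deviation, applied per attribute with the population standard deviation, and it assumes that an output equal to the threshold is assigned to the positive class. Both choices are assumptions of this example and may differ from the formulas used in this study.

```python
import numpy as np

def zscore(X):
    """Normalize each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns at zero instead of dividing by 0
    return (X - mean) / std

def apply_threshold(scores, threshold=0.5):
    """Turn continuous outputs into class labels 0 or 1."""
    return (np.asarray(scores) >= threshold).astype(int)

X_raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_norm = zscore(X_raw)
predicted = apply_threshold([0.2, 0.5, 0.9])
print(X_norm)
print(predicted)
```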
This research compares the Backpropagation algorithm, the Support Vector Machine with a linear kernel, and the standard Extreme Learning Machine. They are compared on six classification datasets, all of which are binary classification problems.
The datasets are taken from the UCI Machine Learning Repository. Some of the six selected datasets contain records with missing attribute values; those records were deleted.
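Deleting incomplete records can be sketched as below. The sketch assumes that missing attribute values have already been encoded as NaN when the data were loaded; the encoding actually used in the source files is not stated here, and the function name is hypothetical.

```python
import numpy as np

def drop_incomplete_rows(X, y):
    """Remove every record that has at least one missing (NaN) attribute."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

X_with_gaps = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y_with_gaps = np.array([0, 1, 1])
X_clean, y_clean = drop_incomplete_rows(X_with_gaps, y_with_gaps)
print(X_clean, y_clean)
```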
Accuracy, precision, recall, CPU time, and the number of misclassified data were used as evaluation measures to compare the three algorithms, because the datasets include not only balanced data but also imbalanced data. The formulas for accuracy, precision, and recall are given from equation (23) onward.
The datasets have been split into two categories based on data volume [21] and data balance [22]. All simulations were carried out in the MATLAB 2012b environment running on an AMD E-350 processor at 1.60 GHz.
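The evaluation measures named above can be sketched from the counts of a binary confusion matrix. The sketch uses the standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN)) and treats class 1 as the positive class, which is an assumption of this example. It is written in Python for illustration, whereas the simulations themselves were run in MATLAB, and the labels are made up.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and number of misclassified records."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "misclassified": fp + fn,
    }

# Made-up imbalanced example: 3 positive and 5 negative records.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

On imbalanced data accuracy alone can look favourable even when the minority class is poorly recognized, which is why precision and recall are reported next to it.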

Results and Analysis
In this section we discuss the results of this research, namely a comparison of the performance of BP, SVM, and ELM in terms of accuracy, precision, recall, CPU time, and misclassification. First, we analyze accuracy, which is shown in Figure 4.
From the data above, the accuracy of ELM and SVM depends not on whether the dataset is small or large but on how balanced it is, whereas the BP algorithm always achieves the highest accuracy. However, SVM and ELM have higher precision than BP: the values for SVM and ELM are 1 and 0.93, whereas for BP it is 0.89. The accuracy of BP, SVM, and ELM over all data is 0.90 ± 0.072, 0.90 ± 0.093, and 0.82 ± 0.079, respectively.
Second, we analyze the precision of each algorithm on each dataset. Figure 5 shows that precision is high when the data are balanced, because the system or machine learns that [...] (Figure 6). Recall, accuracy, and precision of BP and ELM depend not only on data volume or data balance but also on the choice of threshold; in these results we use 0.5 for all data. The recall of BP, SVM, and ELM over all data is 0.87 ± 0.12, 0.88 ± 0.11, and 0.82 ± 0.16, respectively.
The three figures above show that the accuracy, precision, and recall of BP on the promoter dataset are always below those of ELM and SVM. This occurs because SVM and ELM perform best on balanced datasets. Because the promoter dataset is balanced, ELM and SVM can surpass the performance of BP.
Finally, we show diagrams of the CPU time and the number of misclassified data for each dataset and each algorithm.
Figure 7 shows the CPU performance. With BP, the CPU time on the Parkinson data is shorter than on the promoter data, even though the Parkinson dataset is larger than the promoter dataset and the promoter data are balanced. This may be because the promoter data have more attributes than the Parkinson data.
CPU time is reported in seconds. The results show that the CPU times of ELM and SVM are almost always similar, with an average difference of 5 seconds, whereas the CPU time of BP is always high, especially for large and imbalanced data.
Most misclassified data occur in the large and imbalanced datasets, because the system learns more from the majority data than from the minority data. Besides the characteristics of the data, the choice of threshold also influences the number of misclassified data.

Conclusion
This paper discusses and compares three classification models on seven datasets that have two characteristics. All three investigated models offer comparable classification accuracies. ELM performs well on balanced data, so imbalanced data should first be balanced, using under-sampling or another technique, before ELM is applied. SVM performs very well but requires considerable memory in the training process. BP always has the best performance in accuracy, precision, and recall, but its weakness is computation time. We therefore draw three conclusions: BP performs best in accuracy, SVM performs best in stability, and ELM performs best in CPU time.

Figure 8. Diagram of the number of misclassification.