The Stock Exchange Prediction using Machine Learning Techniques: A Comprehensive and Systematic Literature Review

This literature review identifies and analyzes research topic trends, types of data sets, learning algorithms, method improvements, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction, published between January 2015 and June 2020 and meeting the inclusion and exclusion criteria, were investigated. The literature review methodology is carried out in three major phases (review planning, implementation, and report preparation) and nine steps, from defining the systematic review requirements to presenting the results. The main studies reveal five main focuses in stock prediction research: estimation or regression, clustering, association, classification, and preprocessing analysis of data sets. Of the related studies, estimation accounts for 56.79%, classification for 35.80%, and data analytics for 4.94%, while clustering and association account for the remaining 1.23% each. Furthermore, 74.07% of studies use technical indicator data sets; the rest use combinations of data sets. To develop stock prediction models, 48 different methods have been applied, of which the 9 most widely applied were identified. The best-performing methods in terms of accuracy and low error rate include SVM, DNN, CNN, RNN, LSTM, bagging ensembles such as RF, boosting ensembles such as XGBoost, majority-vote ensembles, and the meta-learner approach of stacking ensembles. Several techniques are proposed to improve prediction accuracy: combining several methods, using boosting algorithms, adding feature selection, and optimizing parameters and hyper-parameters.


Introduction
Research on financial time series has developed rapidly in the last five years, driven by intelligent learning algorithms for tasks such as regression forecasting [1] and classification [2]. This progress has attracted widespread attention from economists, investors, investment managers, and data scientists. For example, [3], [4], [5] proposed algorithms to predict the stock exchange using decision trees in a random forest, that is, a bagging ensemble of decision trees. [6], [7], [8], [9] studied stock trading through deep learning methods such as deep neural networks, long short-term memory, recurrent neural networks, and convolutional neural networks.
For superior results, several studies have used one or a combination of comprehensive data set types, including economics, politics, stock trading time series, company fundamentals, news sentiment, social media, and commodity trading movements. For example, [10] used technical indicator data to predict stocks on the Shanghai Stock Exchange in China, and [11] proposed stock predictions using a combination of technical and macroeconomic indicator data sets. Other studies combined technical indicators with news sentiment obtained through text mining techniques [12], [13], [14], [4], [15], while [16], [3], [17] found that company fundamentals had a positive impact on changes in the company's stock price.
The many types of data sets, methodologies, method modifications, combinations of methods, and stock market prediction frameworks that have been published are diverse and complex, which makes it hard to obtain a clear high-level view of the state of existing stock research. In addition, data scientists who want to develop stock prediction models find it quite difficult to see the gaps in recent research comprehensively. For example, the systematic literature review in [18] only explains modeling techniques, and the types of input data sets it considers are limited to technical analysis and fundamental analysis. The systematic literature review in [19] only describes methods, types of data sets, and performance evaluation metrics, and only for classification topics. The systematic literature review in [20] only describes the types of data sets, input variables, methods, and performance evaluation metrics. These systematic literature reviews therefore do not yet meet the needs of data scientists or future researchers for gap analysis and for developing stock predictions.
issue 2, June 2021
Therefore, it is important to present a complete systematic literature review of a kind that has not been done before, covering all potential topics in stock prediction: clustering, association, classification, regression, forecasting, and data set analysis. In addition, the data sets used recently are not limited to technical and fundamental indicators but also include news, social media, and macroeconomic data, as well as their combinations. More importantly, the improvements and modifications proposed in the studies are very important for developing predictions and filling research gaps.
The purpose of this paper is thus to contribute a complete systematic literature review that covers the most significant journals, the most active and qualified researchers, research topics, types of data sets, types of methods, the most frequently used methods, the best-performing methods, method improvements and modifications, and types of frameworks. The rest of this paper is organized as follows. Section 1, this introduction, explains the background and importance of the research. Section 2 explains the research methodology. Section 3 presents the results and answers to the research questions. Section 4, the last section, concludes the paper with a summary and outlines potential future work.

Review Method
The literature related to stock prediction was selected with a systematic approach. Within information science, the systematic literature review (SLR) is an important method that should be carried out. An SLR is defined as a process that collects, identifies, and assesses all research evidence with the aim of answering specific research questions [21]. This literature review was conducted as a systematic and comprehensive literature review based on the guidelines proposed by Kitchenham [21]. The review methodology, stages, and some of the illustrations in this section are also inspired by the work in [22], [23].
The three main phases of SLR preparation are illustrated in Figure 1: review planning, implementation, and reporting of results. The first step is to identify the needs and requirements for a comprehensive review (step 1). The purpose of the literature review has been described in the introduction. Existing systematic reviews of stock prediction are then identified and reviewed. Because researcher bias may exist, a protocol is designed to guide the conduct of the review and reduce bias (step 2). This protocol defines the research questions, the paper search strategy, the inclusion and exclusion criteria for the study selection process, the quality assessment, the data extraction process from the main studies, and finally the data synthesis. The protocol was developed, evaluated, and iteratively improved during the implementation and reporting stages of the review.

Research Questions
To keep the research focused, it is necessary to determine the research questions. The structure of the research questions is built by adopting the PICOC approach [21], which stands for population, intervention, comparison, outcome, and context. Table 1 explains the PICOC structure of the research questions (RQ). To answer RQ4 to RQ9, we extracted stock prediction algorithms, method developments, data set types, and prediction frameworks from the main studies. The extraction results are then analyzed to determine which are and which are not significant for stock market prediction. Table 2 shows the research questions and motivations discussed in this literature review.
Motivation: Identify the researchers who have contributed the most to the field of stock market prediction research.
RQ 3: What are the trending research topics studied in the field of stock market prediction? Motivation: Analyze the trends and research topics that emerged from stock prediction research.
RQ 4: What kinds of data sets and features, and how many, are used to predict stock trading? Motivation: Identify the types of data sets and data combinations that have a high probability of accuracy for stock trading predictions.
RQ 5: What are the different methods for solving stock market prediction challenges? Motivation: Identify research gaps related to stock prediction methods.
RQ 6: What type of method is most frequently adopted by researchers for stock prediction? Motivation: Identify the development potential and high performance of frequently used methods.
RQ 7: What are the current high-performing methods for stock prediction? Motivation: Identify new methods as potential developments and combinations of methods for better results.
RQ 8: What improvements and modifications have been made to improve stock prediction performance? Motivation: Identify research gaps and differences that could potentially improve predictive performance.
RQ 9: For stock prediction, what kind of framework is available in the papers proposed by researchers? Motivation: Identify gaps between the frameworks proposed by different researchers.
RQ4 to RQ9 are the main research questions, while RQ1 to RQ3 are general questions that help evaluate the context of the main studies. RQ1 to RQ3 provide summaries and synopses of specific research areas in the field of stock exchange prediction.
A basic mind map, shown in Figure 2, was compiled to give a comprehensive picture of the study. It identifies the types of data sets, frameworks, learning algorithms, and method improvements, which are the main objectives of this paper.

Search Strategy
The search for papers (step 4) involved several processes: selecting the digital libraries, determining the search string, running a search based on the string, refining the search string, and retrieving from the digital libraries an initial list of studies that matched the search criteria.
Before the search begins, several screening settings should be defined to increase the chances of finding suitable papers. The most popular digital library databases for scientific literature were searched, since broad and extensive coverage of the literature requires a broad perspective. The following digital databases were searched: (1) SpringerLink (springerlink.com), (2) ScienceDirect (sciencedirect.com), (3) IEEE Xplore (ieeexplore.ieee.org), (4) Emerald (emerald.com), and (5) Taylor & Francis (tandfonline.com). The search string was developed using the following steps: (1) identifying appropriate search keywords from PICOC, with a focus on terms from the intervention and population sections, (2) identifying search terms from the research questions, (3) identifying relevant keywords from search terms in titles and abstracts, (4) identifying synonyms, alternative vocabulary, and alternative spellings of the search terms, and (5) constructing advanced search strings from the identified search terms using Boolean AND and OR.
The following search string was finally used:
(stock OR shares OR exchange) AND (market OR price OR return) AND (forecast* OR fundamental OR technical OR predict* OR probability OR assess* OR estimate* OR classificat*)
Adjustments to the search string were tried, but the original was retained because the adjustments increased the number of irrelevant studies. Each database was searched by title, abstract, and keywords. Searches were limited to the 2015 to 2020 publication period. The search string was tailored to the specific requirements of each database. Only papers written in English are included; these are journal papers and, as an exception, conference proceedings.
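As a rough illustration of how this Boolean search string filters records, the sketch below applies the same three AND-ed groups of OR-ed terms to a title or abstract. The function names and sample titles are hypothetical, and real digital libraries apply their own query syntax and stemming, so this only approximates what each database does.

```python
import re

# The three OR-groups of the search string; a trailing * is a prefix wildcard.
GROUPS = [
    ["stock", "shares", "exchange"],
    ["market", "price", "return"],
    ["forecast*", "fundamental", "technical", "predict*", "probability",
     "assess*", "estimate*", "classificat*"],
]

def term_to_regex(term):
    """Turn a search term into a whole-word regex; '*' matches any word ending."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"

def matches_search_string(text):
    """Return True if every group has at least one term present in the text."""
    lowered = text.lower()
    return all(
        any(re.search(term_to_regex(term), lowered) for term in group)
        for group in GROUPS
    )

# Hypothetical titles used only to exercise the matcher.
print(matches_search_string("Predicting stock price movements with LSTM"))
print(matches_search_string("A survey of image classification"))
```

Note that terms without a wildcard are matched as whole words in this sketch, so "return" does not match "returns"; an actual database may behave differently.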

Study Selection
Inclusion and exclusion criteria were used to select the main studies. Table 3 shows these criteria.
Mendeley Desktop was used to store and manage the papers returned by the search. Figure 3 shows each phase of the search process and the number of studies identified in detail. The two-step study selection process (step 5) shown in Figure 3 consists of (1) exclusion of studies based on title and abstract, and (2) exclusion based on the full text of the paper. In addition, literature review studies and other studies that did not include experimental results were excluded. Studies were included according to their degree of similarity to research on predicting stock price movements, returns, and stock prices. A total of 81 main studies resulted from the first phase of the search process. The full text of these 81 main studies was then analyzed, taking into account relevance to the research questions and the quality of the research. An appendix at the end of this paper presents the complete list of the 81 selected studies (Table 6).

Data Extraction
The main studies are extracted to collect data that contribute to answering the research questions. To complete the data extraction, each of the 81 main studies was analyzed and recorded in a form designed to collect the data needed to answer the research questions (step 6). Table 4 shows the 6 properties that were identified through the research questions and used to answer them. Data extraction was carried out iteratively.

Study Quality Assessment and Data Synthesis
The goal of data synthesis is to gather scientific evidence from the screened studies to answer the research questions. The quality of the studies is assessed (step 8) to determine the strength of the conclusions drawn. Combining several pieces of scientific evidence yields stronger information than one or two pieces of evidence that may be weak on their own. In this review, the extracted data include both qualitative and quantitative data. Different strategies were used to synthesize the extracted data, depending on the type of research question.
In general, the narrative synthesis method is used. Data are tabulated in a way that fits the questions. Several types of visualization, including histograms, bar charts, pie charts, and tables, are used to improve the presentation of the distributions of research trends, learning algorithms, method improvements, and model performance.

Threats to Validity
This SLR aims to analyze studies related to stock prediction based on machine learning algorithms, method improvements, types of stock prediction data sets, and frameworks. Bias in the individual studies may exist, but assessing it is outside the scope of this study. The published studies were not selected by screening every paper published in each journal, but from the related research journals found through the digital libraries above. In practice, some conference proceedings or journal papers may not have been detected if they fall outside the scope of this search, for example stock prediction studies in finance and management journals or accounting journals.
Studies from conference proceedings are not excluded from this SLR, as many studies are published in conference proceedings. Because reviewing them significantly increases the workload, some SLR papers, such as [24], did not use conference proceedings. In contrast, [18], [19] included conference proceedings as main studies in their systematic literature reviews.

Significant Published Journal
A total of 81 main studies analyzing stock prediction performance are featured in this literature review. Figure 4 shows the distribution of studies over the last five years from the various digital libraries. It indicates that the interest of data science researchers in stock prediction has increased over the years. The number of studies published in 2019 increased significantly, showing a sharp rise in interest. Figure 4 also shows that, although 2020 is covered only until June, when this article was written, the number of publications is already higher than in previous years, which means that research on stock market prediction is still very relevant and receives strong attention. Figure 5 shows the international journals in which the selected main studies were published. Conference proceedings are excluded from this figure.
Figure 5 shows that Expert Systems with Applications is at the top because it published the largest share of the stock prediction research papers. Expert Systems with Applications is a highly reputable international journal that publishes work on expert systems and intelligent algorithms from research groups around the world, including academic research in universities, industry, and business. In computer and information science, the journal covers a wide range of topics, including knowledge management, data science, data analytics, business analytics algorithms, machine learning, deep neural networks, big data analytics, data mining, text mining, genetic algorithms, and heuristic optimization.
In addition to its coverage of these fields, the journal has a good track record in terms of managing editors and reviewers. It scores 1.49 in the Scimago Journal Rank (SJR) and is in the Q1 category in Artificial Intelligence. Expert Systems with Applications is therefore a very prestigious first choice for researchers publishing their work. The second most significant journal is IEEE Access, which maintains special sections highlighting specific topics of IEEE interest. IEEE Access is published by the Institute of Electrical and Electronics Engineers (IEEE) and concentrates on the application of science, technology, engineering, and mathematics [26]. IEEE Access has a score of 0.78 and is in the Q1 category in Computer Science.
The journal Soft Computing has the third highest number of publications on stock prediction research. It is very popular in informatics engineering and information science, and the work it publishes includes software engineering studies, methodologies, data science, data analytics, algorithms and optimization, and soft computing foundations. A total of 36 reputable international journals are presented quantitatively in Figure 5, and their ranking scores are shown in Table 5.
It is hoped that this summary and analysis of journals can serve as a reference that helps future researchers find research gaps, innovations, and contributions for their own work more easily. Table 5 shows the Scimago Journal Rank and quartile category (Q1 to Q4) of the journals of the main studies. To facilitate analysis, the journals are ordered by SJR score from highest to lowest. Further research should focus on the top five journals when searching for scientific papers on stock prediction, because many more papers may have been published in those journals after June 2020, the cut-off of the search for this paper.

The Most Contributing Researchers
It is useful to identify the researchers who made the most influential contributions to stock prediction research, so that they can serve as role models and their scientific work can be followed. The most active and most contributing researchers are shown in Figure 6. First and non-first authors were distinguished based on the order in which they are listed on the paper. All researchers were listed according to the main studies.

Research Topics in the Field of Stock Market Predictions
Stock market prediction is a significant research topic in data mining, and it increasingly emphasizes machine learning techniques because they show a broad ability to model more complex problems [3]. A comprehensive analysis of the main studies reveals that recent stock prediction research focuses on five topics:
(1) Estimating stock price movements and returns in trading time series, using estimation or regression forecasting algorithms (Estimation/Forecasting/Regression).
(2) Finding relationships between the occurrence of bullish and bearish signal indicators for stock price movements, using association rule algorithms (Association).
(3) Classifying stock price movements, usually into two or three classes such as "Up" and "Down" or "Buy", "Sell", and "Hold", using classification learning algorithms (Classification).
(4) Grouping stocks according to investment decision-making criteria, using clustering algorithms (Clustering).
(5) Analyzing and pre-processing stock market data sets (Data Set Analysis).
Estimation is the first type of work. It uses regression or statistical approaches [10], linear regression [27], or artificial neural networks [28] to estimate stock prices and returns in trading time series on the stock exchange. Estimation or forecasting results are an important tool for contributing knowledge in academic and financial settings [8], and can support investors' decisions in selecting issuers that can provide short-term and long-term returns [29].
The second type of work (association) uses the Associated Network algorithm to expose association models in stock market prediction [30]. This association method can be used to find relationships between the signals that appear in technical indicators for stocks that are predicted to be bullish.
The third type of work (classification) classifies indicator data for a stock as "Buy", "Hold", or "Sell" using deep learning and neural network based classifiers [31], [32], [33], [34], [16]. Prediction results can also represent an "Up" or "Down" trend that helps investors decide on entry positions, for example by applying two single non-linear classifiers (ANN and SVM) and one ensemble approach (RF) to predict the direction of the next day's movement [35], [5], [36].
Clustering is the fourth type of work. It uses unsupervised machine learning algorithms to group stocks based on certain input criteria, such as the K-Means algorithm proposed by Ying Xu, Cuijuan Yang, Shaoliang Peng, and Yusuke Nojima (2020) to predict stock price movements [37]. Unsupervised learning methods such as clustering can be used to predict stock price movements, especially when merging financial news and social media data sets to classify positive and negative sentiment.
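To make the classification targets concrete, the sketch below derives next-day "Up"/"Down" labels and three-class "Buy"/"Hold"/"Sell" labels from a series of closing prices. The 1% threshold, the function names, and the sample prices are illustrative assumptions and are not taken from the reviewed studies.

```python
def up_down_labels(closes):
    """Label each day "Up" if the next day's close is higher, otherwise "Down"."""
    return ["Up" if nxt > cur else "Down" for cur, nxt in zip(closes, closes[1:])]

def buy_hold_sell_labels(closes, threshold=0.01):
    """Three-class labels from the next-day return; threshold is an assumed cut-off."""
    labels = []
    for cur, nxt in zip(closes, closes[1:]):
        next_day_return = (nxt - cur) / cur
        if next_day_return > threshold:
            labels.append("Buy")
        elif next_day_return < -threshold:
            labels.append("Sell")
        else:
            labels.append("Hold")
    return labels

closes = [100.0, 102.0, 101.5, 99.0, 99.5]  # made-up closing prices
print(up_down_labels(closes))
print(buy_hold_sell_labels(closes))
```

Labels like these would then serve as the target column for a classifier trained on technical indicators computed up to the current day.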
The main focus of the fifth type of work (data set analysis) is how to deal with data problems and the pre-processing of data sets on indicators of stock price movements. Some researchers pre-process the data sets using several methods, while others analyze the stock indicator data sets from various points of view. [38] demonstrates and explains why dozens of technical indicator data sets require significant pre-processing to improve predictive accuracy. They combined the random forest method with a method for treating the problem of unbalanced data distribution. Finding the best combination of features is still an NP-hard problem [38], and they use the forward sequential method to select candidate features.
The distribution of stock prediction research topics from January 2015 to June 2020 is shown in Figure 7. Of the main studies, 56.79% are on estimation topics, 35.80% focus on classification techniques, and 4.94% are related to data set analysis. Clustering and association are minority research topics, at only 1.23% each. Classification and estimation thus attract the greatest interest in stock prediction, and there is room to improve performance on these topics in the future. There are three possible reasons why researchers pay so much attention to these topics: (1) estimation and classification match the needs of industry, which requires algorithms that predict which indicator signals are most likely to give better returns; (2) most technical and fundamental indicator data sets are ready to be used with classification and estimation methods; and (3) the performance of clustering and association methods is less satisfactory than that of stock price estimation and classification methods, so very few studies on clustering and association are published.
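The reported shares can be sanity-checked against the 81 main studies. The study counts below are back-computed from the percentages and are therefore an assumption, since the text reports only percentages; reading 1.23% as applying to clustering and to association separately follows from requiring the shares to cover all 81 studies.

```python
TOTAL_STUDIES = 81

# Assumed counts, inferred so that count / 81 reproduces the reported percentages.
topic_counts = {
    "estimation": 46,
    "classification": 29,
    "data set analysis": 4,
    "clustering": 1,
    "association": 1,
}

# Percentage share of each topic, rounded to two decimals as in the text.
shares = {
    topic: round(100 * count / TOTAL_STUDIES, 2)
    for topic, count in topic_counts.items()
}

for topic, share in shares.items():
    print(f"{topic}: {share:.2f}%")
print("studies covered:", sum(topic_counts.values()))
```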

Dataset Types Used for Stock Predictions
For learning development purposes, certain data sets are used [39]. Different types of data sets yield different performance. More importantly, the numerical results of model evaluation are very sensitive to how the data set is treated. Therefore, both the collection of data that represents the object of research and its treatment before the modeling stage are very important and must be considered. Figure 8 shows the distribution of data set types from January 2015 to June 2020, based on the 81 main studies. A total of 74.07% of research studies use technical indicator data sets, the largest share among the 7 data set combinations that we found. Furthermore, 12.35% of research studies use a combination of technical indicators and news. These data sets are mostly located in public repositories of stock exchanges, such as those of China, Korea, Germany, the Netherlands, Great Britain, Japan, Hong Kong, Canada, France, India, and America, and are freely distributed.

Methods Used in Stock Market Predictions
Since 2015, 48 methods have been adopted and proposed as the best algorithm for stock prediction, as shown in Figure 9. These methods are applied across the five main topics of stock prediction.

The Most Often Used Method in Stock Prediction
Of the 48 methods, the 9 most widely adopted classification and estimation methods were identified: ANN, LSTM, CNN, MLP, LR, RF, SVM, k-NN, and NB.
SVM, RF, LSTM, and ANN together are used in 65% of stock prediction research, which means that these methods still have the potential to be applied to stock prediction topics, will continue to be developed, and will receive special attention from researchers.

Best Performing Method for Stock Prediction
While many individual studies on predicting stock price movements report the performance of their proposed modeling methods, there is no strong consensus as to which method is best. Lee et al. [40] and Li et al. [41] concluded that SVM performs very well compared to other machine learning algorithms such as DT and neural network based methods (MLP and LSTM NN), producing higher accuracy and returns. However, Hall et al. highlight that SVM underperformed in some studies. It may perform below expectations because it requires parameter optimization for best performance [42].
RF seems to be the best performing method used in the field of stock prediction [4]. Likewise, Khan et al. [5] found that the random forest method provides high accuracy with a combination of technical indicator data sets and related company news sentiment.
Several studies on stock market prediction show that neural networks achieve very good RMSE values as predictors in regression problems [7], [43]. Some studies also adopt a Genetic Algorithm to find the best neural network hyperparameters. NN has proven to be more adequate for dealing with the complex and non-linear relationships between stock exchange metrics and stock indicators that fluctuate over time [33]. On the other hand, choosing appropriate parameters for the neural network architecture, including the learning rate, number of hidden neurons, momentum, and training cycles, is still a laborious task [44].
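For reference, RMSE is the square root of the mean squared difference between actual and predicted values, so lower values indicate a better regression fit. A minimal sketch follows; the function name and the prices are made up purely for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [10.0, 12.0, 11.0, 13.0]     # illustrative closing prices
predicted = [11.0, 12.0, 9.0, 13.0]   # illustrative model outputs
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.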
Finally, it can be concluded that the best performance is obtained from the right method and the right pre-processing for the right data sets [4]. No single method performs best on all data. Table 6 shows the methods, types of data sets, and performance results of each method that performed well.
However, although various stock prediction methods and improvement techniques have been proposed, none has been shown to perform consistently well. Consistently high-performing predictive methods remain a major challenge for economists, investment managers, and data scientists. There is a crucial need for a consistently high-performing stock prediction framework that is more robust against class imbalance, noise, and other data set issues.

Ensemble Machine Learning
Ensemble learning trains multiple learning machines, combines their outputs, and makes a final predictive decision from the combined outputs through weighting, majority voting, or a meta-learner algorithm, treating the multiple learning machines like a decision-making "committee" [39]. Several empirical and theoretical studies show that models built using ensemble methods achieve higher accuracy than single models. In [11], one type of ensemble method outperformed state-of-the-art decision tree based and deep learning methods. The decisions of the ensemble members can be combined with several methods, such as majority vote, averaging, probabilistic combination, and weighting. Most types of ensemble methods can be applied very well to various types of learning tasks.
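Two of the combination rules mentioned above, majority voting and weighting, can be sketched in a few lines. The function names, member outputs, and weights are hypothetical; in practice the weights might come from each member's validation accuracy.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Return the label predicted by the most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across members."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

member_outputs = ["Up", "Down", "Up"]  # e.g. outputs of three trained models
print(majority_vote(member_outputs))
print(weighted_vote(member_outputs, [0.2, 0.7, 0.1]))
```

In this example the two rules disagree: two of three members vote "Up", but the single "Down" member carries more weight than the other two combined.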
Bagging and Boosting are among the most effective and popular ensemble methods in machine learning. A further type of ensemble is Stacking, which uses the meta-learner concept. Bagging and Boosting combine models of the same type, while Stacking can combine different types of models. Stacking performs well when several different models are combined: when one model gives poor results at a point but other models give good results at that point, the stacked combination can correct the error and improve performance. In several studies, a meta-learner built with the Stacking method surpassed several decision tree based ensembles and deep learning models.
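The idea of a meta-learner correcting a base model can be shown with a toy stacking sketch in pure Python. The one-feature threshold rules used as base learners, the lookup-table meta-learner, and the data are simplifying assumptions chosen for brevity; they are not the models used in the reviewed studies.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Base learner: predict 1 when X[feature] exceeds the best training threshold."""
    best_threshold, best_hits = None, -1
    for t in sorted({row[feature] for row in X}):
        hits = sum(int(row[feature] > t) == label for row, label in zip(X, y))
        if hits > best_hits:
            best_threshold, best_hits = t, hits
    return lambda row: int(row[feature] > best_threshold)

def fit_stacking(X_base, y_base, X_meta, y_meta):
    """Train one stump per feature, then a meta-learner on the stumps' outputs."""
    base_models = [fit_stump(X_base, y_base, f) for f in range(len(X_base[0]))]

    # Meta-learner: for each combination of base outputs, remember the majority
    # label seen on data that the base models were not trained on.
    votes = defaultdict(Counter)
    for row, label in zip(X_meta, y_meta):
        votes[tuple(m(row) for m in base_models)][label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}
    fallback = Counter(y_meta).most_common(1)[0][0]

    def predict(row):
        return table.get(tuple(m(row) for m in base_models), fallback)
    return base_models, predict

# Made-up data: label 1 only when both features are high.
X_base = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9), (0.8, 0.1)]
y_base = [0, 1, 0, 0]
X_meta = [(0.95, 0.9), (0.85, 0.1), (0.1, 0.9), (0.1, 0.1)]
y_meta = [1, 0, 0, 0]

base_models, stacked = fit_stacking(X_base, y_base, X_meta, y_meta)
point = (0.85, 0.15)
print("first stump alone:", base_models[0](point))
print("stacked model:", stacked(point))
```

Here the first stump alone predicts 1 for the sample point, while the stacked model predicts 0 because the meta-learner has seen that a high first feature with a low second feature goes with label 0.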

Feature Selection
Feature selection is the study of reducing the dimension of the data in order to improve the performance of machine learning techniques. Given a data set with f features and d dimensions, feature selection aims to reduce d to d', where d' < d [39]. Feature selection is the most frequently used approach to reducing dimension. Feature extraction is another approach that can be quite effective, although experiments are needed to obtain the best results. To illustrate the difference, assume six features f1, f2, f3, f4, f5, and f6 that are to be reduced to 3 features. With feature selection, the 3 selected features are a subset of the original 6 features (for example f1, f3, f5), whereas with feature extraction, the 3 extracted features are combinations of the 6 original features.
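The six-feature example can be written out directly. In the sketch below the feature values and the combination weights are arbitrary placeholders, and the names g1 to g3 for the extracted features are hypothetical.

```python
# One observation with six original features f1..f6 (values are illustrative).
row = {"f1": 2.0, "f2": 0.5, "f3": 1.0, "f4": 4.0, "f5": 3.0, "f6": 0.0}

# Feature selection: keep a subset of the original features unchanged.
selected = {name: row[name] for name in ("f1", "f3", "f5")}

# Feature extraction: each new feature is a weighted combination of originals.
# These weights are placeholders, not learned from data.
weights = {
    "g1": {"f1": 0.5, "f2": 0.5},
    "g2": {"f3": 0.5, "f4": 0.5},
    "g3": {"f5": 0.5, "f6": 0.5},
}
extracted = {
    new_name: sum(w * row[old_name] for old_name, w in combo.items())
    for new_name, combo in weights.items()
}

print(selected)
print(extracted)
```

Both results have three dimensions, but only the selected features keep their original meaning; each extracted feature mixes information from several originals.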
The advantages of feature selection include improving the performance of learning algorithms, accelerating computation, and making data collection more efficient. Feature selection retains only relevant features and eliminates irrelevant features, commonly called noise, without harming learning performance. Redundant features are also treated as irrelevant.
Filter, wrapper, and embedded methods are three popular feature selection methods. The filter method evaluates features independently of the learner, ranks them by statistical scores, and selects the best. The wrapper method is known to be very expensive, because deciding whether a feature should be selected or removed requires an evaluation by training a learner. The embedded method selects features as part of model training, as in decision tree induction, where a feature must be selected at each branching point. The filter and wrapper methods are the most widely used when feature selection is carried out during data pre-processing.
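A filter method can be sketched as scoring each feature against the target and keeping the top-ranked ones. The example below uses the absolute Pearson correlation as the statistical score, which is one common choice and an assumption here; the feature names and values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def filter_select(features, target, k):
    """Rank features by |correlation with target| and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

features = {
    "momentum": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise":    [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant": [2.0, 2.0, 2.0, 2.0, 2.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
print(filter_select(features, target, k=2))
```

No model is trained at any point, which is what makes the filter approach cheap compared with a wrapper.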
In addition, the following techniques can be used to find the best combination of features when reducing dimensions. (1) Subset Selection (feature ablation) is a time-consuming brute-force method that tries all combinations of features. (2) Principal Components Analysis (PCA), or dimensionality reduction, improves on techniques that focus on eliminating features, since elimination can remove potentially useful features and leave the model unable to capture the complexity of the problem. PCA transforms data from one space to another in which the data is represented with fewer dimensions. Note that the lower-dimensional data must still represent the characteristics of the data in the original dimensions, so one feature in the new space may contain information about several features in the original space.
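The cost of brute-force Subset Selection can be made concrete with a short sketch. The scoring function `toy_score` and the `USEFULNESS` values are hypothetical stand-ins for the validation performance of a model trained on each subset.

```python
from itertools import combinations

def all_subsets(features):
    """Yield every non-empty subset of the features (2**d - 1 in total)."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)

def best_subset(features, score):
    """Brute force: evaluate every subset and return the best-scoring one."""
    return max(all_subsets(features), key=score)

FEATURES = ["f1", "f2", "f3", "f4", "f5", "f6"]

# Hypothetical per-feature usefulness; a real score would train and
# validate a model on each candidate subset.
USEFULNESS = {"f1": 0.30, "f2": 0.02, "f3": 0.25, "f4": 0.01, "f5": 0.20, "f6": 0.03}

def toy_score(subset):
    # Reward useful features and charge a fixed cost per feature kept.
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)

n_subsets = sum(1 for _ in all_subsets(FEATURES))
best = best_subset(FEATURES, toy_score)
print(n_subsets, best)
```

With six features there are already 63 candidate subsets, and the count doubles with every additional feature, which is why this technique is considered time-consuming.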

Parameter and Hyper-parameter Optimization
In machine learning, certain variables make a learning algorithm much more flexible than a "Seeming Stupid Model", in the sense that it is able to map more data from the input domain to the output domain correctly. These variables are called parameters, weights, or coefficients. A model also has hyper-parameters, which affect the model output. Parameters and hyper-parameters that are set at random may not fit the training data or yield the best output, so optimization or tuning is necessary.
Several studies have shown that optimized parameters and hyper-parameters improve model performance compared with models without optimization. In [36], the hyper-parameters of the XGBoost algorithm were optimized using a Genetic Algorithm; in [51], LSTM neural network parameters were optimized using Random Search; in [5], the hyper-parameters of the Random Forest algorithm were optimized using Grid Search; [52] experimented with four algorithms, namely ANN, SVM, RF, and LR, which were optimized using Grid Search; and in [43], the Deep Neural Network was optimized using Grid Search.
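Grid Search, the most frequently mentioned of these methods, can be sketched as follows. The model (a one-dimensional k-nearest-neighbour regressor), the toy data, and the grid values are assumptions chosen to keep the example self-contained; they are not taken from the cited studies.

```python
from itertools import product

def knn_predict(train, x, k, weighting):
    """Predict y at x from the k nearest one-dimensional training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if weighting == "uniform":
        return sum(y for _, y in nearest) / len(nearest)
    # Distance weighting: closer neighbours count more.
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def mse(train, valid, **params):
    """Mean squared error on the validation set for one hyper-parameter setting."""
    errors = [(knn_predict(train, x, **params) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)

def grid_search(train, valid, grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((mse(train, valid, **params), params))
    return min(results, key=lambda r: r[0]), results

# Hypothetical data: a noisy linear trend for training, clean points for validation.
NOISE = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
train = [(float(x), 2.0 * x + n) for x, n in zip(range(10), NOISE)]
valid = [(x + 0.5, 2.0 * (x + 0.5)) for x in range(9)]

GRID = {"k": [1, 3, 5], "weighting": ["uniform", "distance"]}
(best_error, best_params), results = grid_search(train, valid, GRID)
print(best_params, best_error)
```

Random Search would replace the exhaustive `product` loop with a fixed number of randomly sampled settings, which scales better when the grid is large.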

Proposed Frameworks for Stock Prediction
Ten frameworks that introduce innovations and cover the complete data mining stages, published in the last five years in the field of stock market prediction, are described in the following subsections.

Markovic et al.'s Framework
The results of their research [53] reveal that the Support Vector Machine classifier has an average predictive performance of 60 percent for three stock markets (BELEX15, S&P500, FTSE100) after selecting attributes using the Analytical Hierarchy Process (AHP) method. The feature weights generated by AHP are used for ranking and feature selection, and are used with LS-SVM through a weighted kernel. The performance measurements show that the proposed method performs better than the ANN and RF methods.

Weng et al.'s Framework
Three stock prediction methods are compared in this study [55]. Their research reveals that the Support Vector Machine classifier has an average prediction accuracy of 82 percent and an average precision of 80 percent after attribute selection using the Recursive Feature Elimination (RFE) algorithm. SVM-RFE significantly outperforms the Neural Network and Decision Tree prediction methods. Figure 15 shows the predictive framework of Weng et al.

Weng et al.'s Framework
Weng et al. [13] also follow up on the results of [53] and on their own previous research [55] on stock price prediction. In this development, the data preprocessing is described in more detail and includes cleaning the data of missing values and outliers, and transforming the data. The main difference between this framework and the previous one lies in the selection of attributes using a PCA selector, and in comparing the performance of four ensemble classification techniques to predict stock price movements.

Guo et al.'s Framework
Unlike other studies, [10] uses Particle Swarm Optimization (PSO) to select features, and then compares the performance of two classification algorithms for predicting stock prices.

Zhang et al.'s Framework
Zhang et al. [38] published a study that focuses on a learning scheme for stock price prediction consisting of two sections: scheme validation and stock price forecasting. Scheme validation evaluates the performance of the learning scheme, while stock price forecasting focuses on making the final forecast using a time series data set according to the methodology. The predictors are then used to forecast new stock price trends from historical patterns.
The learning method is composed of (1) data preprocessing, (2) attribute selection, (3) class distribution balancing, and (4) learning algorithms. The main differences from its predecessor framework lie in the data preprocessing, which uses pattern duration, model duration, and test duration. Feature selection in this study uses the forward sequential search (FSS) method, selecting only one of all candidates for all existing sections. [38] also addresses the class imbalance problem in classification by adding an embedded RF algorithm for the imbalanced class distribution.
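A generic greedy forward search can be sketched as below. This is an illustration of the general technique under stated assumptions, not necessarily the exact procedure used in [38]: the feature names, the `GAIN` and `REDUNDANT` tables, and the scoring function `toy_score` are hypothetical, and a real implementation would score each candidate subset by validating a trained model.

```python
def forward_selection(candidates, score):
    """Greedy forward search: add the single best feature per round and
    stop as soon as no remaining candidate improves the score."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_scores = {f: score(selected + [f]) for f in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best

# Hypothetical individual gains and a redundancy penalty for one pair.
GAIN = {"rsi": 0.30, "macd": 0.28, "sma": 0.10, "volume": 0.02}
REDUNDANT = {frozenset({"rsi", "macd"}): 0.25}

def toy_score(subset):
    chosen = set(subset)
    total = sum(GAIN[f] for f in chosen) - 0.05 * len(chosen)
    # Penalise pairs that carry largely the same information.
    total -= sum(p for pair, p in REDUNDANT.items() if pair <= chosen)
    return total

selected, best = forward_selection(list(GAIN), toy_score)
print(selected, best)
```

Unlike an exhaustive search over all subsets, the greedy search evaluates only a few candidates per round, at the cost of possibly missing the globally best subset.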

Alsubaie et al.'s Framework
Research by Alsubaie et al. [57] compares the Naïve Bayes (NB) and Cost-sensitive Finetuned Naïve Bayes (CSFTNB) algorithm techniques to predict stock price movements. In feature selection, they compared Relief-F as the proposed method with Correlation and Gain Ratio.

Rustam et al.'s Framework
Rustam et al. [58] perform attribute selection using PSO, as done by Guo et al. [10], after normalizing the technical indicator data, and use support vector regression (SVR) as the learning algorithm. Feature selection using Particle Swarm Optimization showed superior performance in the experimental results of the study. The average RMSE is very small, below 0.1.

Conclusion and Future Work
This literature review identifies and analyzes research topic trends, types of data sets, learning algorithms, method improvements, the best-performing methods, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction, published between January 2015 and June 2020 and meeting the inclusion and exclusion criteria, were investigated.
Estimation or regression, clustering, association, classification, and preprocessing analysis of data sets are the five main focuses revealed in the primary studies of stock prediction research. Estimation accounts for 56.79% of the related studies, classification for 35.80%, and data analytics for 4.94%; the remainder is clustering and association, at 1.23% each. Regarding the type of input data set, 74.07% of the studies use technical indicator data sets, 12.35% use a combination of technical indicators and news data, and 6.17% use a combination of technical indicators and fundamentals; the remaining types are each below 5%. Therefore, there is still ample opportunity for further research on topics of high interest such as estimation and classification, and the recently used data set types, such as combinations of technical indicators with news, macroeconomic, and fundamental data, need to be investigated more deeply.
A total of 48 different methods have been applied to develop stock prediction models, of which the 9 most widely applied were identified: ANN, LSTM, CNN, MLP, LR, RF, SVM, k-NN, and NB.
There is no strong consensus as to which method is best when each is considered individually. The SVM using technical indicator data sets in [40] and [41] is reported to work very well, producing an accuracy of 71.5% and a return of 13.9% per week, compared with RF with an accuracy of 66.2% and a return of 10.8% per week, and LR with an accuracy of 65.8% and a return of 11.1%. RF is the best-performing method in [3], which combines technical indicator and company fundamental data sets. Likewise, in Picasso et al. [4], which combines technical indicators and financial news (sentiment), the RF method provides an accuracy of 67.0% and an annual return of 85.2%; similarly, [5] reports an accuracy of 73.71% for the combination of technical indicators and news with the RF method. The Neural Network in [46], using technical indicator data sets, provides an accuracy of up to 74.1%, and the RNN in [47] shows an accuracy of 73.42%. In another study [11], the Stacking ensemble technique was applied to a combination of technical and macroeconomic indicators, using four Decision Tree-based base algorithms with LR as the Stacking meta-classifier. The results were 66.57% for RF, 70.07% for ERT, 65.96% for XGBoost, 66.87% for LightGBM, and 70.74% for Stacking-LR, an improvement of 0.67 percentage points from the best base classifier (ERT) to the meta-classifier.
Finally, it can be concluded that the best performance is obtained from the right method and the right pre-processing for the right data sets; no single method performs best on all data. The results of this study also identify ten frameworks that are highly systematic and therefore influential in the field of prediction. For future work, based on this systematic literature review, we propose combining input data set types for stock prediction, because combinations have provided significantly better performance than a single data set type. In addition, several techniques can be combined to improve machine learning performance for stock market prediction: (1) combining several machine learning methods through ensemble techniques, (2) using boosting methods, (3) using deep learning methods, given the several recent developments in neural networks, (4) adding dimension reduction or feature selection methods, and (5) using optimization or hyper-parameter tuning. We believe that many research gaps remain for further research on improving stock prediction performance in three areas: (1) method selection, (2) data set type selection, and (3) techniques for enhancing machine learning performance.
Finally, Table 7 presents the list of 81 primary studies from January 2015 to June 2020. The list consists of six columns: year, name of researcher, name of publication journal, type of data collection, method, and topic. To complement the basic mind map, a complete mind map resulting from the systematic review of the stock prediction literature is also presented. Figure 22 is the complete mind map, which has been used to comprehensively explore the relationship between the basic idea and the results of the exploration in order to answer the problem formulation. A complete mind map offers a new perspective for seeing the big picture of all relevant issues and analyzing options [60]. It makes it easier to manage knowledge comprehensively and to integrate new scientific work. In this study, mind maps are used to present the results of a systematic literature review related to stock prediction.