Improving Classification Performance on Imbalanced Medical Data using Generative Adversarial Network
Abstract
Data imbalance is a common challenge in real-world applications and significantly degrades the performance of machine learning algorithms. A dataset is imbalanced when the classes of the target variable are not equally represented. This problem often appears in medical data, where positive cases of a disease or condition are far fewer than negative cases. In this paper, we explore oversampling based on Generative Adversarial Networks (GANs) to improve the performance of classification algorithms on imbalanced medical datasets. We expect the GAN to learn the actual data distribution and to generate synthetic samples that are similar to the original ones. We evaluate the proposed method using several metrics: recall, precision, F1 score, AUC, and false positive (FP) rate. These metrics measure the ability of the classifier to correctly identify the minority class while reducing false positives and false negatives. Our experimental results show that GAN-based oversampling performs better than the other methods on several metrics across datasets, and that it can serve as an alternative method for improving classification performance on imbalanced medical data.
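The evaluation metrics named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the evaluation code used in the paper: the function names (`classification_metrics`, `auc_score`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and the positive label `1` is assumed to be the minority class.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and FP rate for binary labels (1 = minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    fp_rate = safe_div(fp, fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "fp_rate": fp_rate}


def auc_score(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined when one class is absent
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example (made-up values): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.2, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
print(metrics, auc)
```

On this toy data, accuracy would be 80% even though the classifier finds only one of the two positive cases, which is why recall, precision, and FP rate are reported separately for imbalanced problems.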