New approach with ensemble method to address class imbalance problem

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Addressing the class imbalance problem has attracted considerable research attention in recent years. A dataset is imbalanced when instances of one class (the majority) outnumber those of another (the minority). Classifying such data yields a skewed class distribution and poor predictive accuracy. This paper introduces a new ensemble-based method for imbalanced dataset classification that combines the Synthetic Minority Over-sampling Technique (SMOTE) with the Rotation Forest algorithm to address the class imbalance problem. Rotation Forest serves as the ensemble classifier and is combined with the well-known resampling method SMOTE; it builds its base classifiers on features obtained by rotating subspaces of the original dataset. The advantage of Rotation Forest over other ensemble methods (Boosting, Bagging, Random Subspace) is that the data used to construct each classifier carries the same information as the original dataset, so no information is lost. Experimental results show the effectiveness of SMOTE combined with Rotation Forest at the data level in terms of overall accuracy, Cohen's kappa coefficient, false negative rate, AUC, and RMSE, compared with related ensemble methods (SMOTE-Boost, SMOTE-Bagging, SMOTE-Random Subspace) on twenty binary (not multi-class) imbalanced datasets from the KEEL repository, selected randomly across different imbalance ratios, using the Java-based WEKA and STATISTICA software. SMOTE is applied to the training data with over-sampling amounts of N = 100, 200, 300, and 400. Kappa-error diagrams are plotted to analyze the behavior of the ensemble methods. The experimental results confirm the validity of the proposed ensemble classifier.
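
To make the pipeline described above concrete, the sketch below shows one way to combine SMOTE over-sampling with a Rotation Forest ensemble in WEKA, the Java-based toolkit the abstract mentions. It is a minimal illustration under stated assumptions, not the authors' exact experimental setup: it assumes the WEKA SMOTE and rotationForest add-on packages are installed, and the dataset file name (yeast3.arff), the over-sampling percentage, the number of ensemble iterations, and the "positive" minority-class label are placeholders.

```java
// Minimal sketch: SMOTE over-sampling + Rotation Forest in WEKA.
// Assumes the "SMOTE" and "rotationForest" WEKA packages are installed;
// file name, percentage, and class label below are illustrative only.
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.meta.RotationForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.instance.SMOTE;

public class SmoteRotationForestSketch {
    public static void main(String[] args) throws Exception {
        // Load an imbalanced binary dataset (e.g. a KEEL set converted to ARFF).
        Instances data = new DataSource("yeast3.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // SMOTE: N = 200 means 200% synthetic minority instances are added.
        SMOTE smote = new SMOTE();
        smote.setPercentage(200);

        // Rotation Forest: each base tree is trained on PCA-rotated feature subsets.
        RotationForest forest = new RotationForest();
        forest.setNumIterations(10);

        // FilteredClassifier applies SMOTE to the training folds only,
        // so synthetic examples never leak into the test folds.
        FilteredClassifier model = new FilteredClassifier();
        model.setFilter(smote);
        model.setClassifier(forest);

        // 10-fold cross-validation, reporting the metrics the abstract lists.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(model, data, 10, new Random(1));

        // KEEL binary sets usually label the minority class "positive" (assumption).
        int minority = data.classAttribute().indexOfValue("positive");
        if (minority < 0) minority = 1;

        System.out.printf("Accuracy : %.4f%n", eval.pctCorrect() / 100.0);
        System.out.printf("Kappa    : %.4f%n", eval.kappa());
        System.out.printf("FN rate  : %.4f%n", eval.falseNegativeRate(minority));
        System.out.printf("AUC      : %.4f%n", eval.areaUnderROC(minority));
        System.out.printf("RMSE     : %.4f%n", eval.rootMeanSquaredError());
    }
}
```

Repeating the run for N = 100, 200, 300, and 400 and swapping the Rotation Forest for boosting, bagging, or random-subspace ensembles would reproduce the kind of comparison the abstract reports.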

Original language: English
Pages (from-to): 23-33
Number of pages: 11
Journal: Journal of Theoretical and Applied Information Technology
Volume: 72
Issue number: 1
Publication status: Published - 2015

Keywords

  • Bagging
  • Boosting
  • Random Subspace
  • Rotation Forest
  • SMOTE

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

New approach with ensemble method to address class imbalance problem. / Fattahi, Seyyedali; Othman, Zalinda; Ali Othman, Zulaiha.

In: Journal of Theoretical and Applied Information Technology, Vol. 72, No. 1, 2015, p. 23-33.

Research output: Contribution to journal › Article

@article{13048584e48e447eb9f85f6d872e6bf6,
title = "New approach with ensemble method to address class imbalance problem",
abstract = "An attractive research in recent years is solving class imbalance problem in imbalanced dataset. The class is imbalanced when the number of one class (majority) is more than another one (minority). The classification of this imbalanced class causes imbalanced distribution and poor predictive classification accuracy. This paper introduces a new ensemble –based method for imbalanced data set classification using Synthetic Minority Over-sampling Technique (SMOTE) and Rotation Forest algorithm to address class imbalance problem. Rotation Forest applied as ensemble classifier combines with well-known re-sampling method (SMOTE). It constructs classifiers with obtaining features by rotating subspaces of the original dataset. The advantages of Rotation Forest rather than other ensemble methods (Boosting, Bagging, Random Subspace) is that same information held as original data sets and no information lost in data sets which used to construct classifiers. Experimental results reveal the effectiveness of SMOTE and Rotation Forest performance at data level in overall accuracy, Cohen’s kappa Coefficient, False Negative rate, AUC, and RMSE compared to other related classification ensemble methods (SMOTE-Boost, SMOTE-Bagging, SMOTE-random subspace) on twenty KEEL repository imbalanced datasets (binary dataset not multi-class) which selected randomly from different ratios by implementing Java-based WEKA and STATISTICA software. SMOTE implemented for training data by values of N=100, 200, 300, and 400. Kappa-Error diagram is plotted to analysis the behavior of ensemble methods. The experimental results clarify the validness of proposed ensemble classifier.",
keywords = "Bagging, Boosting, Random Subspace, Rotation Forest, SMOTE",
author = "Seyyedali Fattahi and Zalinda Othman and {Ali Othman}, Zulaiha",
year = "2015",
language = "English",
volume = "72",
pages = "23--33",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "1",

}

TY - JOUR

T1 - New approach with ensemble method to address class imbalance problem

AU - Fattahi, Seyyedali

AU - Othman, Zalinda

AU - Ali Othman, Zulaiha

PY - 2015

Y1 - 2015

KW - Bagging

KW - Boosting

KW - Random Subspace

KW - Rotation Forest

KW - SMOTE

UR - http://www.scopus.com/inward/record.url?scp=84922693857&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84922693857&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84922693857

VL - 72

SP - 23

EP - 33

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 1

ER -