Feature selection for multi-label document based on wrapper approach through class association rules

Roiss Alhutaish, Nazlia Omar

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In multi-label classification, each document is associated with a subset of labels. Such documents usually contain a large number of features, which can hamper the performance of learning algorithms. Feature selection therefore helps to isolate the redundant and irrelevant features that hold performance back. This study proposes a Naive Bayesian (NB) multi-label classification algorithm that incorporates a wrapper approach to feature selection, with the aim of determining the best minimum confidence threshold. The paper also suggests transforming the multi-label documents before applying a standard feature selection algorithm. In this transformation, each document is copied into every label it belongs to, with each copy adopting all the characteristics assigned for that label. The study then evaluates seven minimum confidence thresholds, with Class Association Rules (CARs) serving as the wrapper approach for this evaluation. Experiments on benchmark datasets show that the Naïve Bayes Multi-label (NBML) classifier scored an average precision of 87.9% on the business dataset with a minimum confidence threshold of 0.1%.
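The abstract gives no implementation details, so the following is only a minimal sketch of the ideas it names: copying each multi-label document once per label, scoring "feature → label" class association rules by confidence, and choosing a minimum confidence threshold in a wrapper loop. All function names, the toy documents, and the way confidence is counted over the copied documents are assumptions made for illustration, and the `score` callable is a hypothetical stand-in for training and evaluating the Naive Bayes classifier.

```python
from collections import Counter


def copy_transform(docs):
    """Turn each (features, labels) pair into one single-label copy per label."""
    return [(set(features), label) for features, labels in docs for label in labels]


def rule_confidences(single_label_docs):
    """Confidence of each rule feature -> label: count(feature, label) / count(feature).

    Assumption: counts are taken over the copied (single-label) documents.
    """
    feature_count = Counter()
    pair_count = Counter()
    for features, label in single_label_docs:
        for f in features:
            feature_count[f] += 1
            pair_count[(f, label)] += 1
    return {(f, label): n / feature_count[f] for (f, label), n in pair_count.items()}


def select_features(confidences, min_conf):
    """Keep features that appear in at least one rule with confidence >= min_conf."""
    return {f for (f, _label), c in confidences.items() if c >= min_conf}


def best_threshold(confidences, thresholds, score):
    """Wrapper step: score the feature subset for each threshold, keep the best.

    `score` is a hypothetical callable; in practice it would train and
    evaluate a classifier on the selected features.
    """
    return max(thresholds, key=lambda t: score(select_features(confidences, t)))


# Toy data (invented for illustration): (features, labels) per document.
docs = [
    ({"stock", "market"}, {"business"}),
    ({"stock", "game"}, {"business", "sports"}),
    ({"game", "team", "the"}, {"sports"}),
    ({"the", "market"}, {"business"}),
]

copies = copy_transform(docs)
confidences = rule_confidences(copies)
selected = select_features(confidences, min_conf=0.6)
```

In this toy run, the uninformative feature "the" reaches a confidence of only 0.5 for either label, so a threshold of 0.6 drops it while keeping the other four features. A real wrapper would pass a `score` function that reports a metric such as average precision for each candidate threshold.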

Original language: English
Pages (from-to): 642-649
Number of pages: 8
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 7
Issue number: 2
DOI: 10.18517/ijaseit.7.2.1040
Publication status: Published - 2017

Fingerprint

Association rules
Feature extraction
Labels
Benchmarking
Learning
Learning algorithms
Classifiers
Datasets
Industry
Experiments

Keywords

  • Class association rules
  • Multi-label classification
  • Naive Bayesian
  • Wrapper approach

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Computer Science(all)
  • Engineering(all)

Cite this

@article{52a1a3d03479411596587a7f80b9796f,
title = "Feature selection for multi-label document based on wrapper approach through class association rules",
abstract = "In multi-label classification, each document is associated with a subset of labels. Such documents usually contain a large number of features, which can hamper the performance of learning algorithms. Feature selection therefore helps to isolate the redundant and irrelevant features that hold performance back. This study proposes a Naive Bayesian (NB) multi-label classification algorithm that incorporates a wrapper approach to feature selection, with the aim of determining the best minimum confidence threshold. The paper also suggests transforming the multi-label documents before applying a standard feature selection algorithm. In this transformation, each document is copied into every label it belongs to, with each copy adopting all the characteristics assigned for that label. The study then evaluates seven minimum confidence thresholds, with Class Association Rules (CARs) serving as the wrapper approach for this evaluation. Experiments on benchmark datasets show that the Na{\"i}ve Bayes Multi-label (NBML) classifier scored an average precision of 87.9{\%} on the business dataset with a minimum confidence threshold of 0.1{\%}.",
keywords = "Class association rules, Multi-label classification, Naive Bayesian, Wrapper approach",
author = "Roiss Alhutaish and Nazlia Omar",
year = "2017",
doi = "10.18517/ijaseit.7.2.1040",
language = "English",
volume = "7",
pages = "642--649",
journal = "International Journal on Advanced Science, Engineering and Information Technology",
issn = "2088-5334",
publisher = "INSIGHT - Indonesian Society for Knowledge and Human Development",
number = "2",

}

TY - JOUR

T1 - Feature selection for multi-label document based on wrapper approach through class association rules

AU - Alhutaish, Roiss

AU - Omar, Nazlia

PY - 2017

Y1 - 2017

AB - In multi-label classification, each document is associated with a subset of labels. Such documents usually contain a large number of features, which can hamper the performance of learning algorithms. Feature selection therefore helps to isolate the redundant and irrelevant features that hold performance back. This study proposes a Naive Bayesian (NB) multi-label classification algorithm that incorporates a wrapper approach to feature selection, with the aim of determining the best minimum confidence threshold. The paper also suggests transforming the multi-label documents before applying a standard feature selection algorithm. In this transformation, each document is copied into every label it belongs to, with each copy adopting all the characteristics assigned for that label. The study then evaluates seven minimum confidence thresholds, with Class Association Rules (CARs) serving as the wrapper approach for this evaluation. Experiments on benchmark datasets show that the Naïve Bayes Multi-label (NBML) classifier scored an average precision of 87.9% on the business dataset with a minimum confidence threshold of 0.1%.

KW - Class association rules

KW - Multi-label classification

KW - Naive Bayesian

KW - Wrapper approach

UR - http://www.scopus.com/inward/record.url?scp=85018520115&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018520115&partnerID=8YFLogxK

U2 - 10.18517/ijaseit.7.2.1040

DO - 10.18517/ijaseit.7.2.1040

M3 - Article

AN - SCOPUS:85018520115

VL - 7

SP - 642

EP - 649

JO - International Journal on Advanced Science, Engineering and Information Technology

JF - International Journal on Advanced Science, Engineering and Information Technology

SN - 2088-5334

IS - 2

ER -