Feature selection based on supervised topic modeling for boosting-based multi-label text categorization

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Bag-of-Words is a simple and widely used text representation model that uses single words as the elements representing texts in the feature space. However, using single words as features produces a high-dimensional feature space, which increases the computational cost of learning, particularly for ensemble learning algorithms such as the boosting algorithm AdaBoost.MH. A straightforward solution is to apply a feature selection method capable of reducing the feature space effectively. This work describes how to use the supervised topic model Labeled Latent Dirichlet Allocation for feature selection, as well as for accelerating AdaBoost.MH learning, in multi-label text categorization. The experimental results on three benchmarks demonstrate that using Labeled Latent Dirichlet Allocation for feature selection improves and accelerates AdaBoost.MH and exceeds the performance of three existing methods.
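The idea of selecting features from label-specific word distributions can be illustrated with a minimal sketch. This is not the authors' implementation and it is not Labeled Latent Dirichlet Allocation itself: it assumes a crude stand-in in which every word of a document is counted toward every label of that document, rather than inferring per-token label assignments. The function name `select_features`, the parameters `top_k` and `beta`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def select_features(docs, labels, top_k=2, beta=0.01):
    """Keep the top_k highest-probability words for each label.

    docs:   list of token lists (bag-of-words documents)
    labels: list of label sets, one per document (multi-label)
    beta:   additive smoothing, standing in for a Dirichlet prior
    """
    vocab = sorted({w for doc in docs for w in doc})

    # Simplification: count each word once for every label attached to
    # its document (no per-token label inference as in Labeled LDA).
    counts = defaultdict(Counter)
    for doc, doc_labels in zip(docs, labels):
        for label in doc_labels:
            counts[label].update(doc)

    selected = set()
    for label, word_counts in counts.items():
        total = sum(word_counts.values()) + beta * len(vocab)
        # Smoothed estimate of p(word | label).
        probs = {w: (word_counts[w] + beta) / total for w in vocab}
        # Rank by probability, breaking ties alphabetically.
        ranked = sorted(vocab, key=lambda w: (-probs[w], w))
        selected.update(ranked[:top_k])
    return sorted(selected)


# Hypothetical toy corpus with two labels.
docs = [
    ["goal", "match", "team", "goal"],
    ["election", "vote", "party", "vote"],
    ["team", "vote", "stadium", "goal"],
]
labels = [{"sport"}, {"politics"}, {"sport", "politics"}]

features = select_features(docs, labels, top_k=2)
# Project each document onto the reduced feature space.
reduced = [[w for w in doc if w in features] for doc in docs]
```

A boosting learner such as AdaBoost.MH would then be trained on the reduced documents, so each boosting round searches far fewer candidate features than the full vocabulary.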

Original language: English
Title of host publication: Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics
Subtitle of host publication: Sustainable Society Through Digital Innovation, ICEEI 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
Volume: 2017-November
ISBN (Electronic): 9781538604755
DOIs: https://doi.org/10.1109/ICEEI.2017.8312411
Publication status: Published - 9 Mar 2018
Event: 6th International Conference on Electrical Engineering and Informatics, ICEEI 2017 - Langkawi, Malaysia
Duration: 25 Nov 2017 to 27 Nov 2017

Other

Other: 6th International Conference on Electrical Engineering and Informatics, ICEEI 2017
Country: Malaysia
City: Langkawi
Period: 25/11/17 to 27/11/17

Fingerprint

Text Categorization
Adaptive boosting
AdaBoost
Boosting
Feature Space
Feature Selection
Feature extraction
Labels
Learning
Dirichlet
Modeling
Ensemble Learning
Benchmarking
Learning algorithms
Accelerate
Computational Cost
Learning Algorithm
Exceed
High-dimensional
Model

Keywords

  • AdaBoostMH
  • feature selection
  • Latent Dirichlet Allocation
  • supervised topic modeling
  • text categorization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Optimization
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Software
  • Electrical and Electronic Engineering
  • Health Informatics

Cite this

Al-Salemi, B., Ayob, M., Mohd Noah, S. A., & Ab Aziz, M. J. (2018). Feature selection based on supervised topic modeling for boosting-based multi-label text categorization. In Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics: Sustainable Society Through Digital Innovation, ICEEI 2017 (Vol. 2017-November, pp. 1-6). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICEEI.2017.8312411

@inproceedings{7f19049314f449058eeceff2c1f6fa7a,
title = "Feature selection based on supervised topic modeling for boosting-based multi-label text categorization",
abstract = "Bag-of-Words is a simple and widely used text representation model that uses single words as the elements representing texts in the feature space. However, using single words as features produces a high-dimensional feature space, which increases the computational cost of learning, particularly for ensemble learning algorithms such as the boosting algorithm AdaBoost.MH. A straightforward solution is to apply a feature selection method capable of reducing the feature space effectively. This work describes how to use the supervised topic model Labeled Latent Dirichlet Allocation for feature selection, as well as for accelerating AdaBoost.MH learning, in multi-label text categorization. The experimental results on three benchmarks demonstrate that using Labeled Latent Dirichlet Allocation for feature selection improves and accelerates AdaBoost.MH and exceeds the performance of three existing methods.",
keywords = "AdaBoostMH, feature selection, Latent Dirichlet Allocation, supervised topic modeling, text categorization",
author = "Bassam Al-Salemi and Masri Ayob and {Mohd Noah}, {Shahrul Azman} and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2018",
month = "3",
day = "9",
doi = "10.1109/ICEEI.2017.8312411",
language = "English",
volume = "2017-November",
pages = "1--6",
booktitle = "Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Feature selection based on supervised topic modeling for boosting-based multi-label text categorization

AU - Al-Salemi, Bassam

AU - Ayob, Masri

AU - Mohd Noah, Shahrul Azman

AU - Ab Aziz, Mohd Juzaiddin

PY - 2018/3/9

Y1 - 2018/3/9

AB - Bag-of-Words is a simple and widely used text representation model that uses single words as the elements representing texts in the feature space. However, using single words as features produces a high-dimensional feature space, which increases the computational cost of learning, particularly for ensemble learning algorithms such as the boosting algorithm AdaBoost.MH. A straightforward solution is to apply a feature selection method capable of reducing the feature space effectively. This work describes how to use the supervised topic model Labeled Latent Dirichlet Allocation for feature selection, as well as for accelerating AdaBoost.MH learning, in multi-label text categorization. The experimental results on three benchmarks demonstrate that using Labeled Latent Dirichlet Allocation for feature selection improves and accelerates AdaBoost.MH and exceeds the performance of three existing methods.

KW - AdaBoostMH

KW - feature selection

KW - Latent Dirichlet Allocation

KW - supervised topic modeling

KW - text categorization

UR - http://www.scopus.com/inward/record.url?scp=85050775169&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050775169&partnerID=8YFLogxK

U2 - 10.1109/ICEEI.2017.8312411

DO - 10.1109/ICEEI.2017.8312411

M3 - Conference contribution

AN - SCOPUS:85050775169

VL - 2017-November

SP - 1

EP - 6

BT - Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics

PB - Institute of Electrical and Electronics Engineers Inc.

ER -