RFBoost: An improved multi-label boosting algorithm and its application to text categorisation

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

The AdaBoost.MH boosting algorithm is considered to be one of the most accurate algorithms for multi-label classification. AdaBoost.MH works by iteratively building a committee of weak hypotheses of decision stumps. In each round of AdaBoost.MH learning, all features are examined, but only one feature is used to build a new weak hypothesis. This learning mechanism may entail a high degree of computational time complexity, particularly in the case of a large-scale dataset. This paper describes a way to manage the learning complexity and improve the classification performance of AdaBoost.MH. We propose an improved version of AdaBoost.MH, called RFBoost. The weak learning in RFBoost is based on filtering a small fixed number of ranked features in each boosting round rather than using all features, as AdaBoost.MH does. We propose two methods for ranking the features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling. Additionally, we investigate the use of LLDA as a feature selection method for reducing the feature space based on the maximal conditional probabilities of words across labels. Our experimental results on eight well-known benchmarks for multi-label text categorisation show that RFBoost is significantly more efficient and effective than the baseline algorithms. Moreover, the LLDA-based feature ranking yields the best performance for RFBoost.
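To make the round structure described above concrete, the following is a minimal Python sketch, not the authors' reference implementation: it keeps the AdaBoost.MH weight distribution over (document, label) pairs, fits one real-valued decision stump per round, and applies the RFBoost idea of examining only a small set of ranked candidate features instead of the full vocabulary. The function names, the choice of k, and the per-round filtering scheme (simply reusing the top-k ranked features here) are illustrative assumptions; the paper's exact filtering and ranking procedures (One Boosting Round, LLDA) may differ in detail.

```python
# Illustrative sketch of an RFBoost-style round, assuming binary term-presence
# features and labels encoded as +1/-1. NOT the authors' implementation.
import numpy as np

def train_rfboost(X, Y, feature_ranking, k=50, rounds=100):
    """X: (n_docs, n_feats) term-presence matrix; Y: (n_docs, n_labels) in {-1, +1}.
    feature_ranking: feature indices sorted from most to least informative,
    e.g. by one boosting round or by LLDA word-label probabilities."""
    n, m = X.shape
    _, L = Y.shape
    D = np.full((n, L), 1.0 / (n * L))      # weights over (document, label) pairs
    eps = 1.0 / (n * L)                     # smoothing term for the stump predictions
    committee = []                          # list of (feature, c_present, c_absent)

    for t in range(rounds):
        # RFBoost idea: only a small ranked subset of features is examined this round
        candidates = feature_ranking[:k]
        best = None
        for j in candidates:
            present = X[:, j] > 0           # documents containing term j
            # weighted correct/incorrect mass per label for each stump branch
            W_p1 = (D * (Y > 0) * present[:, None]).sum(axis=0)
            W_m1 = (D * (Y < 0) * present[:, None]).sum(axis=0)
            W_p0 = (D * (Y > 0) * ~present[:, None]).sum(axis=0)
            W_m0 = (D * (Y < 0) * ~present[:, None]).sum(axis=0)
            # real-valued stump outputs and the normaliser Z to minimise
            c1 = 0.5 * np.log((W_p1 + eps) / (W_m1 + eps))
            c0 = 0.5 * np.log((W_p0 + eps) / (W_m0 + eps))
            Z = 2 * (np.sqrt(W_p1 * W_m1) + np.sqrt(W_p0 * W_m0)).sum()
            if best is None or Z < best[0]:
                best = (Z, j, c1, c0)
        _, j, c1, c0 = best
        committee.append((j, c1, c0))
        # shrink the weights of (document, label) pairs the new stump predicts well
        h = np.where((X[:, j] > 0)[:, None], c1, c0)   # (n, L) real-valued predictions
        D *= np.exp(-Y * h)
        D /= D.sum()
    return committee

def predict(committee, X):
    """Sum the committee's real-valued outputs; a positive score assigns the label."""
    n = X.shape[0]
    L = committee[0][1].shape[0]
    score = np.zeros((n, L))
    for j, c1, c0 in committee:
        score += np.where((X[:, j] > 0)[:, None], c1, c0)
    return score > 0
```

The design point the abstract makes is visible in the inner loop: plain AdaBoost.MH scans all m features per round, whereas restricting the scan to k ranked candidates cuts the per-round cost roughly from O(m·n·L) to O(k·n·L), which matters on large-scale text collections.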

Original language: English
Journal: Knowledge-Based Systems
DOI: 10.1016/j.knosys.2016.03.029
Publication status: Accepted/In press - 14 Jan 2015

Keywords

  • AdaBoost.MH
  • Boosting
  • Labeled Latent Dirichlet Allocation
  • Multi-label classification
  • RFBoost
  • Text categorisation

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Management Information Systems
  • Information Systems and Management

Cite this

@article{c783be14d0c24663ad33f5e645a9fbc5,
title = "RFBoost: An improved multi-label boosting algorithm and its application to text categorisation",
abstract = "The AdaBoost.MH boosting algorithm is considered to be one of the most accurate algorithms for multi-label classification. AdaBoost.MH works by iteratively building a committee of weak hypotheses of decision stumps. In each round of AdaBoost.MH learning, all features are examined, but only one feature is used to build a new weak hypothesis. This learning mechanism may entail a high degree of computational time complexity, particularly in the case of a large-scale dataset. This paper describes a way to manage the learning complexity and improve the classification performance of AdaBoost.MH. We propose an improved version of AdaBoost.MH, called RFBoost. The weak learning in RFBoost is based on filtering a small fixed number of ranked features in each boosting round rather than using all features, as AdaBoost.MH does. We propose two methods for ranking the features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling. Additionally, we investigate the use of LLDA as a feature selection method for reducing the feature space based on the maximal conditional probabilities of words across labels. Our experimental results on eight well-known benchmarks for multi-label text categorisation show that RFBoost is significantly more efficient and effective than the baseline algorithms. Moreover, the LLDA-based feature ranking yields the best performance for RFBoost.",
keywords = "AdaBoost.MH, Boosting, Labeled Latent Dirichlet Allocation, Multi-label classification, RFBoost, Text categorisation",
author = "Bassam Al-Salemi and {Mohd Noah}, {Shahrul Azman} and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2015",
month = "1",
day = "14",
doi = "10.1016/j.knosys.2016.03.029",
language = "English",
journal = "Knowledge-Based Systems",
issn = "0950-7051",
publisher = "Elsevier",

}
