Feature selection in web NER using genetic algorithm approach

Mohammed Moath Abdulghani, Sabrina Tiun

Research output: Contribution to journal › Article

Abstract

Named Entity Recognition (NER) is the task of recognizing named entities (NEs) such as the names of people, corporations, and places, as well as dates. The extraction of NEs relies mainly on supervised machine learning techniques. Hence, the choice of features has a significant impact on recognition performance. Several approaches have been proposed for reducing the feature dimensionality of NER. However, these approaches have concentrated on traditional, so-called textual, features. Recently, extracting information from web pages has attracted researchers’ attention because of the valuable information such pages contain. Extracting NEs from web pages has introduced many kinds of features that are inspired by the nature of the web. Combining the traditional features with the web features expands the feature space. Therefore, feature selection is needed when extracting NEs from web pages. This paper proposes a feature selection approach based on a Genetic Algorithm for extracting NEs from web pages. The dataset was collected from business web pages. The feature set consists of text features such as n-grams and web features such as block position and font type. Finally, an SVM classifier was used to classify the NEs. The results show that the Genetic Algorithm is able to identify the most accurate features.
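
The abstract does not spell out the genetic algorithm's operators, so the following is only a minimal sketch of how GA-based feature selection is commonly set up, not the authors' implementation. Each chromosome is a bitmask over the feature set, and a fitness function scores each mask. The feature names echo the abstract (n-grams, block position, font type); everything else is an assumption made for illustration: the function `run_ga`, its parameter values, the choice of tournament selection, one-point crossover, bit-flip mutation and elitism, and the `toy_fitness` function. In a setup like the one the abstract describes, the fitness would presumably be the SVM classifier's score on the selected features rather than this toy stand-in.

```python
import random

# Hypothetical feature names, loosely following the abstract.
FEATURES = ["unigram", "bigram", "block_position", "font_type", "noise_a", "noise_b"]
# Assumption for the demo only: which features are actually useful.
INFORMATIVE = {0, 2, 3}


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward useful features, penalize size."""
    hits = sum(1 for i, gene in enumerate(mask) if gene and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def run_ga(n_features, fitness, pop_size=30, generations=40,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a high-fitness feature mask (list of 0/1) with a simple GA."""
    rng = random.Random(seed)
    # Each chromosome is a bitmask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


best_mask = run_ga(len(FEATURES), toy_fitness)
print([name for name, gene in zip(FEATURES, best_mask) if gene])
```

Because the fitness function is passed in as a callable, the toy scorer can be replaced by a wrapper that trains and evaluates a classifier on the columns selected by the mask, at the cost of one training run per fitness evaluation.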

Original language: English
Pages (from-to): 552-560
Number of pages: 9
Journal: Journal of Theoretical and Applied Information Technology
Volume: 93
Issue number: 2
Publication status: Published - 30 Nov 2016
Externally published: Yes

Keywords

  • Feature selection
  • Genetic algorithm
  • Named entity recognition
  • Support vector machine
  • Web pages

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Feature selection in web NER using genetic algorithm approach. / Abdulghani, Mohammed Moath; Tiun, Sabrina.

In: Journal of Theoretical and Applied Information Technology, Vol. 93, No. 2, 30.11.2016, p. 552-560.

@article{5702e548377a40c0861fa6cef59a9a24,
title = "Feature selection in web {NER} using genetic algorithm approach",
abstract = "Named Entity Recognition (NER) is the task of recognizing named entities (NEs) such as the names of people, corporations, and places, as well as dates. The extraction of NEs relies mainly on supervised machine learning techniques. Hence, the choice of features has a significant impact on recognition performance. Several approaches have been proposed for reducing the feature dimensionality of NER. However, these approaches have concentrated on traditional, so-called textual, features. Recently, extracting information from web pages has attracted researchers’ attention because of the valuable information such pages contain. Extracting NEs from web pages has introduced many kinds of features that are inspired by the nature of the web. Combining the traditional features with the web features expands the feature space. Therefore, feature selection is needed when extracting NEs from web pages. This paper proposes a feature selection approach based on a Genetic Algorithm for extracting NEs from web pages. The dataset was collected from business web pages. The feature set consists of text features such as n-grams and web features such as block position and font type. Finally, an SVM classifier was used to classify the NEs. The results show that the Genetic Algorithm is able to identify the most accurate features.",
keywords = "Feature selection, Genetic algorithm, Named entity recognition, Support vector machine, Web pages",
author = "Abdulghani, {Mohammed Moath} and Tiun, Sabrina",
year = "2016",
month = "11",
day = "30",
language = "English",
volume = "93",
pages = "552--560",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "2",

}

TY  - JOUR
T1  - Feature selection in web NER using genetic algorithm approach
AU  - Abdulghani, Mohammed Moath
AU  - Tiun, Sabrina
PY  - 2016/11/30
Y1  - 2016/11/30
AB  - Named Entity Recognition (NER) is the task of recognizing named entities (NEs) such as the names of people, corporations, and places, as well as dates. The extraction of NEs relies mainly on supervised machine learning techniques. Hence, the choice of features has a significant impact on recognition performance. Several approaches have been proposed for reducing the feature dimensionality of NER. However, these approaches have concentrated on traditional, so-called textual, features. Recently, extracting information from web pages has attracted researchers’ attention because of the valuable information such pages contain. Extracting NEs from web pages has introduced many kinds of features that are inspired by the nature of the web. Combining the traditional features with the web features expands the feature space. Therefore, feature selection is needed when extracting NEs from web pages. This paper proposes a feature selection approach based on a Genetic Algorithm for extracting NEs from web pages. The dataset was collected from business web pages. The feature set consists of text features such as n-grams and web features such as block position and font type. Finally, an SVM classifier was used to classify the NEs. The results show that the Genetic Algorithm is able to identify the most accurate features.
KW  - Feature selection
KW  - Genetic algorithm
KW  - Named entity recognition
KW  - Support vector machine
KW  - Web pages
UR  - http://www.scopus.com/inward/record.url?scp=85002080458&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85002080458&partnerID=8YFLogxK
M3  - Article
VL  - 93
SP  - 552
EP  - 560
JO  - Journal of Theoretical and Applied Information Technology
JF  - Journal of Theoretical and Applied Information Technology
SN  - 1992-8645
IS  - 2
ER  -