Features discovery for Web classification using Support Vector Machine

M. S. Othman, L. M. Yusuf, J. Salim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

The rapidly expanding body of web information resources poses a major challenge to internet users seeking the most relevant, up-to-date and high-quality information. The sheer volume of web information has led to the restructuring of these resources. An appropriate web classification method is therefore needed so that quality web information can be accessed. This paper discusses the web document features used to classify web information resources. Six web document feature sets are identified: text, meta tag and title (A); title and text (B); title (C); meta tag and title (D); meta tag (E); and text (F). A Support Vector Machine (SVM) is used to classify the web documents, and four kernels, namely Radial Basis Function (RBF), linear, polynomial and sigmoid, were applied to test the accuracy of the classification. The results show that the text, meta tag and title (A) feature set is the best for classifying web documents across all four kernels, followed by the title and text (B) features and the meta tag and title (C) features. The study also found that the linear kernel classifies web documents better than the RBF, polynomial and sigmoid kernels.
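The abstract does not describe the implementation. As a rough illustration only, the Python sketch below shows how the six feature sets A to F might be assembled from a page's title, meta tags and body text, and how the four kernels compared in the study are commonly defined. The field names, the whitespace tokenizer, the raw term-count weighting and the kernel parameters (`gamma`, `coef0`, `degree`) are all assumptions rather than the authors' settings, and SVM training itself is not shown.

```python
import math
from collections import Counter

# Feature sets as listed in the abstract. The field keys "title", "meta" and
# "text" are hypothetical names for the extracted parts of a web page.
FEATURE_SETS = {
    "A": ("text", "meta", "title"),
    "B": ("title", "text"),
    "C": ("title",),
    "D": ("meta", "title"),
    "E": ("meta",),
    "F": ("text",),
}


def tokens(s):
    """Lowercase whitespace tokenizer (an assumption; real pipelines differ)."""
    return s.lower().split()


def feature_vector(doc, feature_set):
    """Bag-of-words term counts drawn only from the fields in the feature set."""
    counts = Counter()
    for field in FEATURE_SETS[feature_set]:
        counts.update(tokens(doc.get(field, "")))
    return counts


def dot(u, v):
    """Sparse dot product over the terms the two vectors share."""
    return sum(c * v[t] for t, c in u.items() if t in v)


def sq_dist(u, v):
    """Squared Euclidean distance, using ||u - v||^2 = u.u + v.v - 2 u.v."""
    return dot(u, u) + dot(v, v) - 2 * dot(u, v)


# The four kernels named in the abstract, in their common LIBSVM-style forms.
# gamma, coef0 and degree are illustrative defaults, not the paper's values.
def linear(u, v):
    return dot(u, v)


def polynomial(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma=0.5):
    return math.exp(-gamma * sq_dist(u, v))


def sigmoid(u, v, gamma=0.01, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


KERNELS = {"linear": linear, "polynomial": polynomial, "rbf": rbf, "sigmoid": sigmoid}


def gram_matrix(docs, feature_set, kernel):
    """Pairwise kernel values: the matrix an SVM solver works with."""
    vecs = [feature_vector(d, feature_set) for d in docs]
    return [[kernel(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Two toy pages (hypothetical data, not from the study).
    docs = [
        {"title": "SVM tutorial", "meta": "svm kernel", "text": "support vector machine kernel"},
        {"title": "Kernel methods", "meta": "kernel", "text": "kernel machine"},
    ]
    # Off-diagonal similarity for every (feature set, kernel) pair.
    for fs in FEATURE_SETS:
        for name, k in KERNELS.items():
            print(fs, name, gram_matrix(docs, fs, k)[0][1])
```

In practice each Gram matrix would be passed to an SVM solver; scikit-learn's `SVC`, for example, accepts `kernel="precomputed"` as well as its built-in `"linear"`, `"poly"`, `"rbf"` and `"sigmoid"` kernels. Measuring classification accuracy for each (feature set, kernel) pair would then mirror the comparison the abstract reports.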

Original language: English
Title of host publication: Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010
Pages: 36-40
Number of pages: 5
DOI: 10.1109/ICICCI.2010.16
Publication status: Published - 2010
Event: 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010 - Kuala Lumpur
Duration: 22 Jun 2010 to 23 Jun 2010

Keywords

  • Support Vector Machine (SVM)
  • Web classification
  • Web document

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Information Systems

Cite this

Othman, M. S., Yusuf, L. M., & Salim, J. (2010). Features discovery for Web classification using Support Vector Machine. In Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010 (pp. 36-40). [5566043] https://doi.org/10.1109/ICICCI.2010.16

Features discovery for Web classification using Support Vector Machine. / Othman, M. S.; Yusuf, L. M.; Salim, J.

Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010. 2010. p. 36-40 5566043.

Othman, MS, Yusuf, LM & Salim, J 2010, Features discovery for Web classification using Support Vector Machine. in Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010, 5566043, pp. 36-40, 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010, Kuala Lumpur, 22/6/10. https://doi.org/10.1109/ICICCI.2010.16
Othman MS, Yusuf LM, Salim J. Features discovery for Web classification using Support Vector Machine. In Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010. 2010. p. 36-40. 5566043 https://doi.org/10.1109/ICICCI.2010.16
Othman, M. S. ; Yusuf, L. M. ; Salim, J. / Features discovery for Web classification using Support Vector Machine. Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010. 2010. pp. 36-40
@inproceedings{93334653764a4b8d924c96d850b3abf6,
title = "Features discovery for Web classification using Support Vector Machine",
keywords = "Support Vector Machine (SVM), Web classification, Web document",
author = "Othman, {M. S.} and Yusuf, {L. M.} and J. Salim",
year = "2010",
doi = "10.1109/ICICCI.2010.16",
language = "English",
isbn = "9780769540146",
pages = "36--40",
booktitle = "Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010",

}

TY - GEN

T1 - Features discovery for Web classification using Support Vector Machine

AU - Othman, M. S.

AU - Yusuf, L. M.

AU - Salim, J.

PY - 2010

Y1 - 2010

KW - Support Vector Machine (SVM)

KW - Web classification

KW - Web document

UR - http://www.scopus.com/inward/record.url?scp=77958473955&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958473955&partnerID=8YFLogxK

U2 - 10.1109/ICICCI.2010.16

DO - 10.1109/ICICCI.2010.16

M3 - Conference contribution

AN - SCOPUS:77958473955

SN - 9780769540146

SP - 36

EP - 40

BT - Proceedings - 2010 International Conference on Intelligent Computing and Cognitive Informatics, ICICCI 2010

ER -