Web classification using extraction and machine learning techniques

L. M. Yusuf, M. S. Othman, J. Salim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Easier access to internet services has contributed to a drastic increase in the number of web pages. This growth has made it harder for internet users to retrieve the latest, most relevant, and high-quality web information, because the enormous volume of web content complicates the restructuring of that information. To ensure that the latest, relevant, and high-quality web information is optimally retrievable, web documents need to be classified. This paper discusses the results of classifying web documents using extraction and machine learning techniques. Four types of kernels, namely the Radial Basis Function (RBF), linear, polynomial, and sigmoid kernels, are applied to test the accuracy of the classification. The results show that classification accuracy increases as more web documents are used. The results also show that the linear kernel performs best in web document classification compared with the RBF, polynomial, and sigmoid kernels.
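
The four kernels named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, the feature extraction step, or any kernel parameters, so everything below is an assumption for illustration: documents are reduced to simple term-count vectors over a hypothetical vocabulary, and the kernels follow the common textbook forms with hypothetical defaults for `gamma`, `coef0`, and `degree`.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a document to a term-count vector over a fixed vocabulary.

    Illustrative stand-in for the paper's extraction step, which the
    abstract does not describe.
    """
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical vocabulary and documents, not taken from the paper.
vocabulary = ["web", "page", "sports", "news"]
doc_a = bag_of_words("web page news", vocabulary)
doc_b = bag_of_words("sports news page", vocabulary)

for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, kernel(doc_a, doc_b))
```

Each function returns a similarity score for a pair of document vectors. A kernel-based classifier such as a support vector machine would use these scores in place of plain inner products, which is one plausible reading of how the kernels were compared here.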

Original language: English
Title of host publication: Proceedings 2010 International Symposium on Information Technology - Engineering Technology, ITSim'10
Pages: 765-770
Number of pages: 6
Volume: 2
DOIs: https://doi.org/10.1109/ITSIM.2010.5561603
Publication status: Published - 2010
Event: 2010 International Symposium on Information Technology, ITSim'10 - Kuala Lumpur
Duration: 15 Jun 2010 to 17 Jun 2010

Other

Other: 2010 International Symposium on Information Technology, ITSim'10
City: Kuala Lumpur
Period: 15/6/10 to 17/6/10

Fingerprint

Learning systems
Polynomials
Internet
Websites

Keywords

  • Extraction
  • Machine learning
  • Web classification
  • Web document

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Yusuf, L. M., Othman, M. S., & Salim, J. (2010). Web classification using extraction and machine learning techniques. In Proceedings 2010 International Symposium on Information Technology - Engineering Technology, ITSim'10 (Vol. 2, pp. 765-770). [5561603] https://doi.org/10.1109/ITSIM.2010.5561603

@inproceedings{6ba5b3d917004fceb4d233b3f4d63af9,
title = "Web classification using extraction and machine learning techniques",
keywords = "Extraction, Machine learning, Web classification, Web document",
author = "Yusuf, {L. M.} and Othman, {M. S.} and J. Salim",
year = "2010",
doi = "10.1109/ITSIM.2010.5561603",
language = "English",
isbn = "9781424467181",
volume = "2",
pages = "765--770",
booktitle = "Proceedings 2010 International Symposium on Information Technology - Engineering Technology, ITSim'10",

}
