Ontological based webpage classification

Wui Kheun Ong, Jer Lang Hong, Wan Fariza Paizi@Fauzi, Ee Xion Tan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Current classification techniques use word matching and clustering to classify webpages. These techniques take an ad hoc approach, checking and matching every keyword in a webpage. These methods are efficient but not without problems. In general, they suffer from the following problems: (1) because they use brute-force matching over the entire document, they tend to be slow; (2) words in a document may have similar meanings but different spellings; (3) current techniques fail to match and identify phrases efficiently; and (4) they do not account for word sense disambiguation. In this paper, we propose a novel, fast, ontology-based webpage classification technique that classifies webpages with high accuracy. To speed up our system, we use a segmentation technique that uses the visual boundaries of regions, so that keywords are matched within a region rather than across the entire webpage. We also use a fast clustering technique to match keywords and label the page based on the nearest match. Experimental results show that our system classifies webpages accurately.
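The abstract describes the pipeline only at a high level. The Python sketch that follows is a hypothetical illustration under stated assumptions, not the authors' implementation: the toy `ONTOLOGY`, the choice of the largest visual region as the content block, greedy longest-phrase matching, and cosine similarity to per-category centroids (standing in for the unspecified "fast clustering technique") are all assumptions. It shows how mapping synonyms and phrases to shared ontology concepts addresses problems (2) and (3); word sense disambiguation is not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology (assumption): surface term -> (category, concept).
# Synonyms such as "soccer"/"football" map to one concept, and multi-word
# phrases such as "world cup" are matched as single units.
ONTOLOGY = {
    "football": ("sports", "football"),
    "soccer": ("sports", "football"),
    "world cup": ("sports", "tournament"),
    "tournament": ("sports", "tournament"),
    "goal": ("sports", "score"),
    "stock": ("finance", "equity"),
    "share price": ("finance", "equity"),
    "interest rate": ("finance", "rate"),
    "loan": ("finance", "credit"),
}
MAX_PHRASE_LEN = max(len(term.split()) for term in ONTOLOGY)


def tokenize(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def match_concepts(tokens):
    """Greedy longest-phrase-first matching of tokens against the ontology."""
    found = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ONTOLOGY:
                found[ONTOLOGY[phrase]] += 1
                i += n
                break
        else:  # no phrase starting at position i matched
            i += 1
    return found


def centroids():
    """One prototype vector per category: each of its concepts with weight 1."""
    protos = {}
    for category, concept in sorted(set(ONTOLOGY.values())):
        protos.setdefault(category, Counter())[(category, concept)] = 1
    return protos


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_page(regions):
    """regions: list of (text, width, height) from a prior visual segmentation.

    Only the largest region (assumed to be the main content block) is matched,
    rather than the whole page; the label is the nearest category centroid.
    Returns None when no ontology term is found.
    """
    text, _, _ = max(regions, key=lambda r: r[1] * r[2])
    page_vec = match_concepts(tokenize(text))
    if not page_vec:
        return None
    scores = {cat: cosine(page_vec, proto) for cat, proto in centroids().items()}
    return max(scores, key=scores.get)


regions = [
    ("Home | Login | Contact", 800, 40),
    ("The soccer team reached the World Cup final after a late goal.", 800, 600),
]
print(classify_page(regions))  # sports
```

Restricting matching to one region keeps the work proportional to that region's text rather than the whole page, which is the speed-up the abstract attributes to segmentation.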

Original language: English
Title of host publication: Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12
Pages: 224-228
Number of pages: 5
DOI: https://doi.org/10.1109/InfRKM.2012.6205006
Publication status: Published - 4 Jul 2012
Externally published: Yes
Event: 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12 - Kuala Lumpur
Duration: 13 Mar 2012 - 15 Mar 2012

Other

Other: 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12
City: Kuala Lumpur
Period: 13/3/12 - 15/3/12


Keywords

  • Classification
  • Ontology
  • Webpage

ASJC Scopus subject areas

  • Information Systems

Cite this

Ong, W. K., Hong, J. L., Paizi@Fauzi, W. F., & Tan, E. X. (2012). Ontological based webpage classification. In Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12 (pp. 224-228). [6205006] https://doi.org/10.1109/InfRKM.2012.6205006

Ontological based webpage classification. / Ong, Wui Kheun; Hong, Jer Lang; Paizi@Fauzi, Wan Fariza; Tan, Ee Xion.

Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12. 2012. p. 224-228, 6205006.

Ong, WK, Hong, JL, Paizi@Fauzi, WF & Tan, EX 2012, Ontological based webpage classification. in Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12, 6205006, pp. 224-228, 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12, Kuala Lumpur, 13/3/12. https://doi.org/10.1109/InfRKM.2012.6205006
Ong WK, Hong JL, Paizi@Fauzi WF, Tan EX. Ontological based webpage classification. In Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12. 2012. p. 224-228. 6205006 https://doi.org/10.1109/InfRKM.2012.6205006
Ong, Wui Kheun ; Hong, Jer Lang ; Paizi@Fauzi, Wan Fariza ; Tan, Ee Xion. / Ontological based webpage classification. Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12. 2012. pp. 224-228
@inproceedings{7c4352abc1ec4c2daf27ef491d872182,
title = "Ontological based webpage classification",
keywords = "Classification, Ontology, Webpage",
author = "Ong, {Wui Kheun} and Hong, {Jer Lang} and Paizi@Fauzi, {Wan Fariza} and Tan, {Ee Xion}",
year = "2012",
month = "7",
day = "4",
doi = "10.1109/InfRKM.2012.6205006",
language = "English",
isbn = "9781467310901",
pages = "224--228",
booktitle = "Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12",

}

TY - GEN

T1 - Ontological based webpage classification

AU - Ong, Wui Kheun

AU - Hong, Jer Lang

AU - Paizi@Fauzi, Wan Fariza

AU - Tan, Ee Xion

PY - 2012/7/4

Y1 - 2012/7/4

KW - Classification

KW - Ontology

KW - Webpage

UR - http://www.scopus.com/inward/record.url?scp=84863084351&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863084351&partnerID=8YFLogxK

U2 - 10.1109/InfRKM.2012.6205006

DO - 10.1109/InfRKM.2012.6205006

M3 - Conference contribution

AN - SCOPUS:84863084351

SN - 9781467310901

SP - 224

EP - 228

BT - Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12

ER -