Teknik pengukuhan perangkak tumpuan melalui modul pengesan bahasa bagi capaian web bahasa melayu

Translated title of the contribution: Focused crawler enhancement technique with language detection module for Malay web retrieval

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

A crawler is one of the major components in the architecture of information retrieval systems and search engines. Its function is to gather relevant web pages so that their links and content can be indexed and managed. A focused crawler is designed to select and collect only those web pages on the Internet that are relevant to a particular domain or topic. A good crawler can provide users with accurate, extensive and relevant information when they search for information with a search engine. One of the main problems is the inability to detect Malay-language links and content; as a result, some of the content on Malay websites cannot be indexed and processed for information retrieval. The lack of research on focused crawlers, especially for Malay websites, motivated this study. Its main objective is to identify effective crawling strategies that allow a focused crawler to detect relevant, high-quality links on Malay websites. The focused crawler used in this research was modified by combining several crawler enhancement techniques. The findings indicate that the focused crawler enhancement module gives good results because it detects Malay-language web pages accurately. This research also marks a turning point for the development of information retrieval for Malay websites and helps raise the prominence of the Malay language in cyberspace.
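The abstract does not describe the internals of the language detection module. Purely as an illustration, the sketch below shows one way a focused crawler could combine a best-first frontier with a language filter. Everything in it is an assumption rather than the authors' method: the stopword-ratio detector, the 0.15 threshold, the topic-term relevance score, the toy pages and all names (`is_malay`, `topic_relevance`, `focused_crawl`, `TOY_WEB`). A stopword list of this kind also cannot tell Malay apart from closely related Indonesian.

```python
import heapq
import re
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical list of frequent Malay function words (illustrative only).
# Many also occur in Indonesian, so this cannot separate the two languages.
MALAY_STOPWORDS = frozenset({
    "yang", "dan", "di", "ini", "itu", "dengan", "untuk", "dalam", "pada",
    "adalah", "tidak", "ke", "dari", "akan", "oleh", "atau", "juga", "telah",
    "kepada", "bagi", "boleh", "ada", "mereka", "kami",
})

Page = Tuple[str, List[str]]  # (page text, outgoing links)


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def malay_score(text: str) -> float:
    """Share of tokens that are Malay stopwords (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in MALAY_STOPWORDS for t in tokens) / len(tokens)


def is_malay(text: str, threshold: float = 0.15) -> bool:
    """Assumed detector: a page counts as Malay if the stopword share is high enough."""
    return malay_score(text) >= threshold


def topic_relevance(text: str, topic_terms: Iterable[str]) -> float:
    """Share of the topic terms that appear in the page (a crude relevance proxy)."""
    terms = {t.lower() for t in topic_terms}
    if not terms:
        return 0.0
    return len(terms & set(tokenize(text))) / len(terms)


def focused_crawl(
    seeds: List[str],
    fetch: Callable[[str], Optional[Page]],
    topic_terms: List[str],
    max_pages: int = 100,
    min_relevance: float = 0.1,
) -> List[str]:
    """Best-first crawl that keeps only relevant pages detected as Malay."""
    # heapq is a min-heap, so priorities are negated; seeds get the top priority.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    accepted: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        fetched += 1
        if page is None:  # unreachable URL
            continue
        text, links = page
        # Language filter: non-Malay pages are neither kept nor expanded.
        if not is_malay(text):
            continue
        relevance = topic_relevance(text, topic_terms)
        if relevance < min_relevance:
            continue  # Malay but off-topic: its links are not followed either
        accepted.append(url)
        # Unseen out-links inherit the parent page's relevance as their priority.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance, link))
    return accepted


# Toy in-memory "web" (hypothetical pages) so the sketch runs without a network.
TOY_WEB: Dict[str, Page] = {
    "a": ("Laman ini adalah tentang enjin carian dan capaian maklumat untuk bahasa Melayu", ["b", "c"]),
    "b": ("This page is written in English about search engines", ["d"]),
    "c": ("Perangkak web yang baik boleh mengumpul laman dengan tepat", ["d"]),
    "d": ("Resipi nasi lemak yang sedap dan mudah untuk keluarga", []),
}
TOPIC = ["carian", "capaian", "perangkak", "maklumat", "enjin"]

print(focused_crawl(["a"], TOY_WEB.get, TOPIC, max_pages=10))  # ['a', 'c']
```

Here page "b" is fetched but rejected by the language filter, and page "d" passes the filter but is off-topic, so only "a" and "c" are kept. Refusing to expand non-Malay pages keeps the crawl inside the Malay web, at the cost of missing Malay pages reachable only through non-Malay ones; a real system might demote such links instead of dropping them.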

Original language: Malay
Pages (from-to): 170-185
Number of pages: 16
Journal: GEMA Online Journal of Language Studies
Volume: 18
Issue number: 3
DOIs: 10.17576/gema-2018-1803-10
Publication status: Published - 1 Aug 2018

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Literature and Literary Theory

Cite this

@article{0b31824906d649d985103bcb8e5f6fb8,
title = "Teknik pengukuhan perangkak tumpuan melalui modul pengesan bahasa bagi capaian web bahasa melayu",
abstract = "Crawler is one of the major components in the architecture of information retrieval systems or search engines. The function is to gather relevant websites aimed to be managed through indexing of links and content. A focused crawler application is designed to select and collect web pages that are relevant to domains or specific topics in the Internet. A good crawler can provide accurate, extensive and relevant information to the user during the process of information seeking using search engines. The inability to detect links and content of Malay language is one of the main issues. Therefore, some of the content of the Malay website cannot be indexed and processed for information retrieval. The lack of research in focused crawler especially for Malay website has motivated this research. The main objective of this study is to identify good crawling strategies for focused crawler in detecting relevant and quality links for Malay website. The focused crawler employed in this research has undergone some modifications resulting from a combination of some crawling strengthening techniques. Findings indicate that the presence of a focused crawler enhancement module provides good results because it can detect Malay language webs accurately. This research is also a turning point for the development of information retrieval for Malay websites as well as enhancing the prominence of Malay language in cyberspace.",
keywords = "Crawler, Information retrieval, Malay language, Search engine, Web",
author = "Masnizah Mohd and Paizi@Fauzi, {Wan Fariza} and Amri Jasin",
year = "2018",
month = aug,
day = "1",
doi = "10.17576/gema-2018-1803-10",
language = "Malay",
volume = "18",
pages = "170--185",
journal = "GEMA Online Journal of Language Studies",
issn = "1675-8021",
publisher = "Universiti Kebangsaan Malaysia",
number = "3",

}

TY  - JOUR
T1  - Teknik pengukuhan perangkak tumpuan melalui modul pengesan bahasa bagi capaian web bahasa melayu
AU  - Mohd, Masnizah
AU  - Paizi@Fauzi, Wan Fariza
AU  - Jasin, Amri
PY  - 2018/8/1
Y1  - 2018/8/1
N2  - Crawler is one of the major components in the architecture of information retrieval systems or search engines. The function is to gather relevant websites aimed to be managed through indexing of links and content. A focused crawler application is designed to select and collect web pages that are relevant to domains or specific topics in the Internet. A good crawler can provide accurate, extensive and relevant information to the user during the process of information seeking using search engines. The inability to detect links and content of Malay language is one of the main issues. Therefore, some of the content of the Malay website cannot be indexed and processed for information retrieval. The lack of research in focused crawler especially for Malay website has motivated this research. The main objective of this study is to identify good crawling strategies for focused crawler in detecting relevant and quality links for Malay website. The focused crawler employed in this research has undergone some modifications resulting from a combination of some crawling strengthening techniques. Findings indicate that the presence of a focused crawler enhancement module provides good results because it can detect Malay language webs accurately. This research is also a turning point for the development of information retrieval for Malay websites as well as enhancing the prominence of Malay language in cyberspace.
AB  - Crawler is one of the major components in the architecture of information retrieval systems or search engines. The function is to gather relevant websites aimed to be managed through indexing of links and content. A focused crawler application is designed to select and collect web pages that are relevant to domains or specific topics in the Internet. A good crawler can provide accurate, extensive and relevant information to the user during the process of information seeking using search engines. The inability to detect links and content of Malay language is one of the main issues. Therefore, some of the content of the Malay website cannot be indexed and processed for information retrieval. The lack of research in focused crawler especially for Malay website has motivated this research. The main objective of this study is to identify good crawling strategies for focused crawler in detecting relevant and quality links for Malay website. The focused crawler employed in this research has undergone some modifications resulting from a combination of some crawling strengthening techniques. Findings indicate that the presence of a focused crawler enhancement module provides good results because it can detect Malay language webs accurately. This research is also a turning point for the development of information retrieval for Malay websites as well as enhancing the prominence of Malay language in cyberspace.
KW  - Crawler
KW  - Information retrieval
KW  - Malay language
KW  - Search engine
KW  - Web
UR  - http://www.scopus.com/inward/record.url?scp=85052751056&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85052751056&partnerID=8YFLogxK
U2  - 10.17576/gema-2018-1803-10
DO  - 10.17576/gema-2018-1803-10
M3  - Article
AN  - SCOPUS:85052751056
VL  - 18
SP  - 170
EP  - 185
JO  - GEMA Online Journal of Language Studies
JF  - GEMA Online Journal of Language Studies
SN  - 1675-8021
IS  - 3
ER  - 