Penentuan fitur bagi pengekstrakan tajuk berita akhbar bahasa Melayu

Translated title of the contribution: Determining features of news headline in Malay news document

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Headline summarization is an automated text summarization technique that can ease information overload in retrieval systems and reduce users' cognitive burden when they search for and select relevant documents from large collections. This study describes how the features of a Malay-language headline summarization system were determined for news-genre documents. The methodology begins with an analysis of a corpus of Malay news documents. The corpus contains 140 core news items selected from the databases of two mainstream Malaysian newspapers, Berita Harian and Utusan Malaysia. News items were selected according to the following criteria: core news category, a length of 50 to 250 words, publication between 2007 and 2012, and one of four genres (economics, crime, education and sports). Three Malay-language linguistics experts manually produced a headline summary for each news document, subject to three conditions: extractive summarization, the select-words-in-order word selection technique, and word morphological changes. The experimental results identify three characteristics: first, the first two sentences are the important sentences; second, the sentence containing a potential acronym definition is chosen as the most important sentence; and third, the ideal headline summary length is six words. Taking these features into account allows headline summaries to be generated automatically in a way that resembles the process performed by humans.
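The three features reported in the abstract can be combined into a simple extractive headline generator. The Python sketch below is illustrative only and is not the authors' implementation. Several parts are assumptions: the naive sentence splitter, the regular expression used to detect a "potential acronym definition", the rule of looking for that sentence only within the first two sentences, and the hypothetical function names `split_sentences`, `pick_source_sentence` and `headline`. Keeping the first six words of the chosen sentence, in their original order, is a crude stand-in for the select-words-in-order technique.

```python
import re

# Assumed pattern for a "potential acronym definition": one or more
# capitalised words followed by an all-caps acronym in parentheses,
# e.g. "Majlis Amanah Rakyat (MARA)". Initials are NOT checked against
# the acronym; this is a hypothetical heuristic, not the authors' rule.
ACRONYM_DEF = re.compile(r"(?:[A-Z][\w-]*\s+)+\(([A-Z]{2,})\)")

HEADLINE_WORDS = 6  # ideal headline length reported in the abstract


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_source_sentence(sentences: list[str]) -> str:
    # Feature 1: only the first two sentences are treated as candidates.
    lead = sentences[:2]
    # Feature 2: prefer a lead sentence that contains an acronym definition.
    for sentence in lead:
        if ACRONYM_DEF.search(sentence):
            return sentence
    # Fallback (assumption): use the very first sentence.
    return lead[0]


def headline(text: str, n_words: int = HEADLINE_WORDS) -> str:
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Keep words in their original order, strip trailing punctuation,
    # and truncate to the ideal length (feature 3).
    words = [w.strip(".,;:!?\"'") for w in pick_source_sentence(sentences).split()]
    words = [w for w in words if w]
    return " ".join(words[:n_words])
```

For example, `headline("Kerajaan mengumumkan peruntukan baharu semalam. Majlis Amanah Rakyat (MARA) akan menambah biasiswa pelajar.")` picks the second sentence because it defines an acronym, and keeps its first six words.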

Original language: Malay
Pages (from-to): 154-167
Number of pages: 14
Journal: GEMA Online Journal of Language Studies
Volume: 18
Issue number: 2
DOI: 10.17576/gema-2018-1802-11
Publication status: Published - 1 May 2018

Fingerprint

news
Malaysia
genre
news selection
expert
Headlines
News
Sports
offense
linguistics
Summary
methodology
language
economics
education
Summarization

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Literature and Literary Theory

Cite this

Penentuan fitur bagi pengekstrakan tajuk berita akhbar bahasa Melayu. / Mohd Noah, Shahrul Azman; Mohamad Ali, Nazlena; Hasan, Mohd Sabri.

In: GEMA Online Journal of Language Studies, Vol. 18, No. 2, 01.05.2018, p. 154-167.

@article{b78e92aa01df435c813931f05bee2f0b,
title = "Penentuan fitur bagi pengekstrakan tajuk berita akhbar bahasa Melayu",
abstract = "Headline summarization is an automated text summarization technique that can ease information overload in retrieval systems and reduce users' cognitive burden when they search for and select relevant documents from large collections. This study describes how the features of a Malay-language headline summarization system were determined for news-genre documents. The methodology begins with an analysis of a corpus of Malay news documents. The corpus contains 140 core news items selected from the databases of two mainstream Malaysian newspapers, Berita Harian and Utusan Malaysia. News items were selected according to the following criteria: core news category, a length of 50 to 250 words, publication between 2007 and 2012, and one of four genres (economics, crime, education and sports). Three Malay-language linguistics experts manually produced a headline summary for each news document, subject to three conditions: extractive summarization, the select-words-in-order word selection technique, and word morphological changes. The experimental results identify three characteristics: first, the first two sentences are the important sentences; second, the sentence containing a potential acronym definition is chosen as the most important sentence; and third, the ideal headline summary length is six words. Taking these features into account allows headline summaries to be generated automatically in a way that resembles the process performed by humans.",
keywords = "Headline, Malay corpus, Malay news, Natural Language Processing, Text summarization",
author = "{Mohd Noah}, {Shahrul Azman} and {Mohamad Ali}, Nazlena and Hasan, {Mohd Sabri}",
year = "2018",
month = "5",
day = "1",
doi = "10.17576/gema-2018-1802-11",
language = "Malay",
volume = "18",
pages = "154--167",
journal = "GEMA Online Journal of Language Studies",
issn = "1675-8021",
publisher = "Universiti Kebangsaan Malaysia",
number = "2",

}

TY - JOUR

T1 - Penentuan fitur bagi pengekstrakan tajuk berita akhbar bahasa Melayu

AU - Mohd Noah, Shahrul Azman

AU - Mohamad Ali, Nazlena

AU - Hasan, Mohd Sabri

PY - 2018/5/1

Y1 - 2018/5/1

AB - Headline summarization is an automated text summarization technique that can ease information overload in retrieval systems and reduce users' cognitive burden when they search for and select relevant documents from large collections. This study describes how the features of a Malay-language headline summarization system were determined for news-genre documents. The methodology begins with an analysis of a corpus of Malay news documents. The corpus contains 140 core news items selected from the databases of two mainstream Malaysian newspapers, Berita Harian and Utusan Malaysia. News items were selected according to the following criteria: core news category, a length of 50 to 250 words, publication between 2007 and 2012, and one of four genres (economics, crime, education and sports). Three Malay-language linguistics experts manually produced a headline summary for each news document, subject to three conditions: extractive summarization, the select-words-in-order word selection technique, and word morphological changes. The experimental results identify three characteristics: first, the first two sentences are the important sentences; second, the sentence containing a potential acronym definition is chosen as the most important sentence; and third, the ideal headline summary length is six words. Taking these features into account allows headline summaries to be generated automatically in a way that resembles the process performed by humans.

KW - Headline

KW - Malay corpus

KW - Malay news

KW - Natural Language Processing

KW - Text summarization

UR - http://www.scopus.com/inward/record.url?scp=85047894972&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047894972&partnerID=8YFLogxK

U2 - 10.17576/gema-2018-1802-11

DO - 10.17576/gema-2018-1802-11

M3 - Article

VL - 18

SP - 154

EP - 167

JO - GEMA Online Journal of Language Studies

JF - GEMA Online Journal of Language Studies

SN - 1675-8021

IS - 2

ER -