A framework for English and Malay cross-lingual document alignment method

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir

Research output: Contribution to journal › Article

Abstract

The information divide in multilingual information retrieval is usually addressed by translating users’ queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automatically align English and Malay news documents into a comparable corpus that can serve as a translation resource for improving query translation in cross-lingual information retrieval. The study proposes a direct alignment framework that uses the similarity of the textual features of the documents themselves, together with a novel approach that uses the similarity of the documents’ sentiment to improve the effectiveness of the alignment method. The proposed sentiment-based approach outperformed existing alignment methods and was more effective at differentiating related from unrelated documents. The aligned comparable documents can be further utilised in translation research for English and Malay cross-lingual information retrieval tasks.

Original language: English
Article number: 38
Pages (from-to): 190-195
Number of pages: 6
Journal: International Journal of Advanced Trends in Computer Science and Engineering
Volume: 8
Issue number: 1.3 S1
DOIs: 10.30534/ijatcse/2019/3881.32019
Publication status: Published - 1 Jan 2019

Fingerprint

Information retrieval
Glossaries

Keywords

  • Cross-lingual information retrieval
  • Document alignment
  • Malay language
  • Sentiment-based approach

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Electrical and Electronic Engineering

Cite this

A framework for English and Malay cross-lingual document alignment method. / Nasharuddin, Nurul Amelina; Abdullah, Muhamad Taufik; Azman, Azreen; Kadir, Rabiah Abdul.

In: International Journal of Advanced Trends in Computer Science and Engineering, Vol. 8, No. 1.3 S1, 38, 01.01.2019, p. 190-195.

@article{45a9a1114fd343eda3620462f013327e,
title = "A framework for English and Malay cross-lingual document alignment method",
keywords = "Cross-lingual information retrieval, Document alignment, Malay language, Sentiment-based approach",
author = "Nasharuddin, {Nurul Amelina} and Abdullah, {Muhamad Taufik} and Azman, Azreen and Kadir, {Rabiah Abdul}",
year = "2019",
month = "1",
day = "1",
doi = "10.30534/ijatcse/2019/3881.32019",
language = "English",
volume = "8",
pages = "190--195",
journal = "International Journal of Advanced Trends in Computer Science and Engineering",
issn = "2278-3091",
publisher = "World Academy of Research in Science and Engineering",
number = "1.3 S1",
}

TY  - JOUR
T1  - A framework for English and Malay cross-lingual document alignment method
AU  - Nasharuddin, Nurul Amelina
AU  - Abdullah, Muhamad Taufik
AU  - Azman, Azreen
AU  - Kadir, Rabiah Abdul
PY  - 2019/1/1
Y1  - 2019/1/1
KW  - Cross-lingual information retrieval
KW  - Document alignment
KW  - Malay language
KW  - Sentiment-based approach
UR  - http://www.scopus.com/inward/record.url?scp=85074165483&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074165483&partnerID=8YFLogxK
U2  - 10.30534/ijatcse/2019/3881.32019
DO  - 10.30534/ijatcse/2019/3881.32019
M3  - Article
AN  - SCOPUS:85074165483
VL  - 8
SP  - 190
EP  - 195
JO  - International Journal of Advanced Trends in Computer Science and Engineering
JF  - International Journal of Advanced Trends in Computer Science and Engineering
SN  - 2278-3091
IS  - 1.3 S1
M1  - 38
ER  - 