Multiword phrases indexing for Malay-English cross-language information retrieval

N. H. Rais, M. T. Abdullah, Abdul Kadir Rabiah

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Cross-Language Information Retrieval (CLIR) is the process of submitting a query in one language and retrieving relevant documents written in a different language. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the simplest and most effective methods for query translation is look-up in a bilingual dictionary. However, limited dictionary coverage raises two problems: the handling of proper names and of compound words. Relevant concept words, consisting of proper names and compound words, were applied in the document indexing, query indexing, and query translation processes. We believed that concept-based indexing and translation would make the translation of proper names and compound words possible. A series of experiments was conducted to test the compound-word and proper-name translation methods in a CLIR system. The best retrieval performance was obtained by combining the query translation approach that selects all translations listed in the dictionary with an alternative weighting scheme and proper-name identification and translation. For the Malay and English document collections, this combination outperformed the select-all-translations query translation approach by 1.0 and 9%. The results show that translating proper names and compound words is important in query translation for Malay-English CLIR.
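The dictionary look-up idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the function name `translate_query`, the `max_len` parameter, the greedy longest-match strategy for compound words, and the pass-through of unknown tokens as a stand-in for proper-name handling are all assumptions made for illustration. Every translation listed for an entry is kept, loosely mirroring the "select all translations" approach.

```python
from typing import Dict, List, Tuple

# Hypothetical toy Malay-to-English dictionary; entries are illustrative only.
# Keys are tuples of lowercase tokens so that compound words can be stored
# as multiword entries alongside single words.
TOY_DICTIONARY: Dict[Tuple[str, ...], List[str]] = {
    ("kereta", "api"): ["train"],   # compound word
    ("kereta",): ["car", "cart"],
    ("api",): ["fire"],
    ("stesen",): ["station"],
    ("di",): ["at", "in"],
}


def translate_query(
    query: str,
    dictionary: Dict[Tuple[str, ...], List[str]] = TOY_DICTIONARY,
    max_len: int = 3,
) -> List[List[str]]:
    """Translate a query by greedy longest-match dictionary look-up.

    Each matched unit (a single word or a multiword phrase) is mapped to
    ALL of its dictionary translations. Tokens with no dictionary entry
    are passed through unchanged, a crude stand-in for proper-name handling.
    """
    tokens = query.split()
    result: List[List[str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in dictionary:
                result.append(list(dictionary[key]))
                i += n
                matched = True
                break
        if not matched:
            # Unknown token: keep it as-is (e.g. a proper name).
            result.append([tokens[i]])
            i += 1
    return result


if __name__ == "__main__":
    print(translate_query("stesen kereta api di Kuala Lumpur"))
```

Matching the longest phrase first keeps "kereta api" from being translated word by word as "car"/"cart" plus "fire", which is the kind of compound-word error the abstract attributes to limited dictionary coverage.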

Original language: English
Pages (from-to): 1554-1562
Number of pages: 9
Journal: Information Technology Journal
Volume: 10
Issue number: 8
DOI: 10.3923/itj.2011.1554.1562
Publication status: Published - 2011
Externally published: Yes

Fingerprint

Query languages
Glossaries
Information retrieval systems
Experiments

Keywords

  • Bilingual dictionary
  • Concept-based IR
  • Cross-language information retrieval
  • Proper names identification and translation
  • Query translation

ASJC Scopus subject areas

  • Computer Science (miscellaneous)

Cite this

Multiword phrases indexing for Malay-English cross-language information retrieval. / Rais, N. H.; Abdullah, M. T.; Rabiah, Abdul Kadir.

In: Information Technology Journal, Vol. 10, No. 8, 2011, p. 1554-1562.

@article{25f327832b03457ab9a675008c800f19,
title = "Multiword phrases indexing for {Malay-English} cross-language information retrieval",
keywords = "Bilingual dictionary, Concept-based IR, Cross-language information retrieval, Proper names identification and translation, Query translation",
author = "Rais, {N. H.} and Abdullah, {M. T.} and Rabiah, {Abdul Kadir}",
year = "2011",
doi = "10.3923/itj.2011.1554.1562",
language = "English",
volume = "10",
pages = "1554--1562",
journal = "Information Technology Journal",
issn = "1812-5638",
publisher = "Asian Network for Scientific Information",
number = "8",

}

TY - JOUR

T1 - Multiword phrases indexing for Malay-English cross-language information retrieval

AU - Rais, N. H.

AU - Abdullah, M. T.

AU - Rabiah, Abdul Kadir

PY - 2011

Y1 - 2011

KW - Bilingual dictionary

KW - Concept-based IR

KW - Cross-language information retrieval

KW - Proper names identification and translation

KW - Query translation

UR - http://www.scopus.com/inward/record.url?scp=79959557386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959557386&partnerID=8YFLogxK

U2 - 10.3923/itj.2011.1554.1562

DO - 10.3923/itj.2011.1554.1562

M3 - Article

VL - 10

SP - 1554

EP - 1562

JO - Information Technology Journal

JF - Information Technology Journal

SN - 1812-5638

IS - 8

ER -