Abstract
Cross-Language Information Retrieval (CLIR) is the process of submitting queries in one language and retrieving relevant documents written in a different language. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the simplest and most effective methods for query translation is dictionary look-up based on a bilingual dictionary. However, limited dictionary coverage gives rise to two problems: the handling of proper names and of compound words. Relevance concept words, which consist of proper names and compound words, were applied in the document and query indexing and query translation processes. We believed that concept-based indexing and translation would make the translation of proper names and compound words possible. A series of experiments was conducted to test the compound word and proper name translation methods in a CLIR system. The best retrieval performance was obtained from the combination of the query translation approach that selects all translations listed in the dictionary, an alternative weighting scheme, and proper names identification and translation. For both the Malay and English document collections, this combination outperformed the query translation approach that selects all translations listed in the dictionary alone, by 1.0 and 9%. The results show that proper name and compound word translation is important in query translation for Malay-English CLIR.
Original language | English |
---|---|
Pages (from-to) | 1554-1562 |
Number of pages | 9 |
Journal | Information Technology Journal |
Volume | 10 |
Issue number | 8 |
DOIs | 10.3923/itj.2011.1554.1562 |
Publication status | Published - 2011 |
Externally published | Yes |
Keywords
- Bilingual dictionary
- Concept-based IR
- Cross-language information retrieval
- Proper names identification and translation
- Query translation
ASJC Scopus subject areas
- Computer Science (miscellaneous)
Cite this
Multiword phrases indexing for Malay-English cross-language information retrieval. / Rais, N. H.; Abdullah, M. T.; Rabiah, Abdul Kadir.
In: Information Technology Journal, Vol. 10, No. 8, 2011, p. 1554-1562.
Research output: Contribution to journal › Article
TY - JOUR
T1 - Multiword phrases indexing for Malay-English cross-language information retrieval
AU - Rais, N. H.
AU - Abdullah, M. T.
AU - Rabiah, Abdul Kadir
PY - 2011
Y1 - 2011
KW - Bilingual dictionary
KW - Concept-based IR
KW - Cross-language information retrieval
KW - Proper names identification and translation
KW - Query translation
UR - http://www.scopus.com/inward/record.url?scp=79959557386&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959557386&partnerID=8YFLogxK
U2 - 10.3923/itj.2011.1554.1562
DO - 10.3923/itj.2011.1554.1562
M3 - Article
AN - SCOPUS:79959557386
VL - 10
SP - 1554
EP - 1562
JO - Information Technology Journal
JF - Information Technology Journal
SN - 1812-5638
IS - 8
ER -