An automatic collocation extraction from Arabic corpus

Abdulgabbar Mohammad Saif, Mohd Juzaiddin Ab Aziz

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

Problem statement: The identification of collocations is a very important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo several kinds of variation, such as morphological, graphical and syntactic variation, which make them difficult to identify. Approach: We used a hybrid method for extracting collocations from an Arabic corpus, based on linguistic information and association measures. Results: This method extracted bigram candidates of Arabic collocations from the corpus, and the association measures were evaluated using the n-best evaluation method. We reported the precision values for each association measure in each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.
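The abstract names the log-likelihood ratio and n-best evaluation but gives no formulas. As a rough illustration only, the sketch below assumes the usual Dunning-style G² statistic computed from a 2×2 contingency table of bigram counts, and assumes n-best precision means the fraction of the top n ranked candidates found in a manually judged reference set. The function names and toy tokens are hypothetical, and the linguistic filtering step of the hybrid method (handling morphological, graphical and syntactic variants) is deliberately omitted; this is not the authors' implementation.

```python
import math
from collections import Counter


def log_likelihood_ratio(c12: int, c1: int, c2: int, n: int) -> float:
    """Dunning-style G^2 score for a bigram (w1, w2) (assumed formulation).

    c12: count of the bigram w1 w2
    c1:  count of bigrams whose first word is w1
    c2:  count of bigrams whose second word is w2
    n:   total number of bigrams
    """
    # Observed 2x2 contingency table: (w1 / not w1) x (w2 / not w2)
    observed = [
        [c12, c1 - c12],
        [c2 - c12, n - c1 - c2 + c12],
    ]
    row = [c1, n - c1]
    col = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with o == 0 contribute 0 (0 * log 0 := 0)
                e = row[i] * col[j] / n  # expected count under independence
                g2 += o * math.log(o / e)
    return 2.0 * g2


def rank_bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Rank every adjacent bigram by G^2, highest first (no linguistic filter)."""
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first = Counter(w1 for w1, _ in bigrams)
    second = Counter(w2 for _, w2 in bigrams)
    scores = {
        bg: log_likelihood_ratio(c, first[bg[0]], second[bg[1]], n)
        for bg, c in pair_counts.items()
    }
    return sorted(scores, key=lambda bg: scores[bg], reverse=True)


def n_best_precision(ranked: list, gold: set, n: int) -> float:
    """Share of the top-n ranked candidates that are true collocations."""
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for bg in top if bg in gold) / len(top)


if __name__ == "__main__":
    toks = "a b a b a b c d".split()  # hypothetical toy corpus
    ranked = rank_bigrams(toks)
    gold = {("a", "b"), ("c", "d")}  # hypothetical reference judgments
    for k in (1, 2, 4):
        print(k, n_best_precision(ranked, gold, k))
```

Applied to a real Arabic corpus, the token stream would first need the tokenization, normalization and candidate filtering the paper describes; the toy tokens here only exercise the arithmetic. Comparing several association measures would mean swapping in another scoring function and computing the same n-best precisions for each ranking.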

Original language: English
Pages (from-to): 6-11
Number of pages: 6
Journal: Journal of Computer Science
Volume: 7
Issue number: 1
DOI: 10.3844/jcssp.2011.6.11
Publication status: Published - 2011


Keywords

  • Association measures
  • Collocation extraction
  • Collocation variations
  • Graphical variants
  • Hybrid methods
  • Morphosyntactic
  • n-best evaluation

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

An automatic collocation extraction from Arabic corpus. / Saif, Abdulgabbar Mohammad; Ab Aziz, Mohd Juzaiddin.

In: Journal of Computer Science, Vol. 7, No. 1, 2011, p. 6-11.


@article{f4d0e6d94fea4129b88b1f7c9ae3954c,
title = "An automatic collocation extraction from Arabic corpus",
abstract = "Problem statement: The identification of collocations is a very important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo several kinds of variation, such as morphological, graphical and syntactic variation, which make them difficult to identify. Approach: We used a hybrid method for extracting collocations from an Arabic corpus, based on linguistic information and association measures. Results: This method extracted bigram candidates of Arabic collocations from the corpus, and the association measures were evaluated using the n-best evaluation method. We reported the precision values for each association measure in each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.",
keywords = "Association measures, Collocation extraction, Collocation variations, Graphical variants, Hybrid methods, Morphosyntactic, n-best evaluation",
author = "Saif, {Abdulgabbar Mohammad} and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2011",
doi = "10.3844/jcssp.2011.6.11",
language = "English",
volume = "7",
pages = "6--11",
journal = "Journal of Computer Science",
issn = "1549-3636",
publisher = "Science Publications",
number = "1",

}

TY  - JOUR
T1  - An automatic collocation extraction from Arabic corpus
AU  - Saif, Abdulgabbar Mohammad
AU  - Ab Aziz, Mohd Juzaiddin
PY  - 2011
Y1  - 2011
N2  - Problem statement: The identification of collocations is a very important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo several kinds of variation, such as morphological, graphical and syntactic variation, which make them difficult to identify. Approach: We used a hybrid method for extracting collocations from an Arabic corpus, based on linguistic information and association measures. Results: This method extracted bigram candidates of Arabic collocations from the corpus, and the association measures were evaluated using the n-best evaluation method. We reported the precision values for each association measure in each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.
AB  - Problem statement: The identification of collocations is a very important part of natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. Because of the complexities of Arabic, collocations undergo several kinds of variation, such as morphological, graphical and syntactic variation, which make them difficult to identify. Approach: We used a hybrid method for extracting collocations from an Arabic corpus, based on linguistic information and association measures. Results: This method extracted bigram candidates of Arabic collocations from the corpus, and the association measures were evaluated using the n-best evaluation method. We reported the precision values for each association measure in each n-best list. Conclusion: The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.
KW  - Association measures
KW  - Collocation extraction
KW  - Collocation variations
KW  - Graphical variants
KW  - Hybrid methods
KW  - Morphosyntactic
KW  - n-best evaluation
UR  - http://www.scopus.com/inward/record.url?scp=79251571496&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79251571496&partnerID=8YFLogxK
U2  - 10.3844/jcssp.2011.6.11
DO  - 10.3844/jcssp.2011.6.11
M3  - Article
AN  - SCOPUS:79251571496
VL  - 7
SP  - 6
EP  - 11
JO  - Journal of Computer Science
JF  - Journal of Computer Science
SN  - 1549-3636
IS  - 1
ER  - 