An automatic noun compound extraction from Arabic corpus

Abdulgabbar Mohammed Saif, Mohd Juzaiddin Ab Aziz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

The identification of noun compounds as multi-word lexical units is a very important task in natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. In this paper, we used a hybrid method, based on linguistic knowledge and statistical measures, to extract noun compounds from an Arabic corpus. For candidate identification, we used linguistic analysis tools such as lemmatization and part-of-speech (POS) tagging to filter the candidates and determine their variations. Association measures were then computed for each candidate to rank the candidates. After that, we evaluated the association measures using the n-best evaluation method and reported the precision of each association measure in each n-best list. The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.
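The abstract does not give formulas, so the following is only an illustrative sketch of the ranking and evaluation steps it names, not the authors' implementation. It scores two-word candidates with the log-likelihood ratio (Dunning's G² over a 2×2 contingency table) and computes precision over an n-best list. Assumptions: candidates are already lemmatized and POS-filtered lemma pairs, marginal counts are taken from the candidate list itself, and the function names and toy transliterated tokens are hypothetical.

```python
import math
from collections import Counter


def _term(observed: float, expected: float) -> float:
    """One G^2 summand: observed * ln(observed / expected), with 0 * ln 0 = 0."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(c12: int, c1: int, c2: int, n: int) -> float:
    """G^2 statistic for a pair (w1, w2) from a 2x2 contingency table.

    c12: pair frequency, c1: frequency of w1 as first word,
    c2: frequency of w2 as second word, n: total number of pairs.
    """
    # Observed cells: (w1, w2), (w1, not w2), (not w1, w2), (not w1, not w2).
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    # Expected cells under independence: row total * column total / n.
    rows = [c1, c1, n - c1, n - c1]
    cols = [c2, n - c2, c2, n - c2]
    expected = [r * c / n for r, c in zip(rows, cols)]
    return 2.0 * sum(_term(o, e) for o, e in zip(observed, expected))


def rank_candidates(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Rank distinct candidate pairs by descending log-likelihood ratio."""
    pair_freq = Counter(pairs)
    first_freq = Counter(w1 for w1, _ in pairs)
    second_freq = Counter(w2 for _, w2 in pairs)
    n = len(pairs)
    scores = {
        p: log_likelihood_ratio(c, first_freq[p[0]], second_freq[p[1]], n)
        for p, c in pair_freq.items()
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


def n_best_precision(ranked: list, gold: set, n: int) -> float:
    """Fraction of the top-n ranked candidates that appear in the gold list."""
    top = ranked[:n]
    return sum(p in gold for p in top) / len(top) if top else 0.0


if __name__ == "__main__":
    # Toy, hypothetical transliterated lemma pairs (not data from the paper).
    pairs = [("wazir", "kharijiya")] * 3 + [
        ("kitab", "jadid"), ("kitab", "kabir"),
        ("bayt", "jadid"), ("bayt", "kabir"),
    ]
    gold = {("wazir", "kharijiya")}
    ranked = rank_candidates(pairs)
    for n in (1, 5):
        print(n, n_best_precision(ranked, gold, n))
```

Run as a script, it prints `1 1.0` and `5 0.2`: the strongly associated toy pair ranks first, so it is the only correct item in both the 1-best and the 5-best list. Comparing several association measures, as the abstract describes, would mean swapping in other scoring functions and reporting n-best precision for each.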

Original language: English
Title of host publication: 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011
Pages: 224-230
Number of pages: 7
DOI: https://doi.org/10.1109/STAIR.2011.5995793
Publication status: Published - 2011
Event: 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011 - Putrajaya
Duration: 28 Jun 2011 to 29 Jun 2011

Other

Other: 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011
City: Putrajaya
Period: 28/6/11 to 29/6/11

Fingerprint

Linguistics
Information retrieval
Semantics
Processing

Keywords

  • Arabic noun compound
  • Association measures
  • hybrid method
  • lemmatization
  • morphological variations
  • n-best evaluation method

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Information Systems

Cite this

Saif, A. M., & Ab Aziz, M. J. (2011). An automatic noun compound extraction from Arabic corpus. In 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011 (pp. 224-230). [5995793] https://doi.org/10.1109/STAIR.2011.5995793

An automatic noun compound extraction from Arabic corpus. / Saif, Abdulgabbar Mohammed; Ab Aziz, Mohd Juzaiddin.

2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011. 2011. p. 224-230 5995793.


Saif, AM & Ab Aziz, MJ 2011, An automatic noun compound extraction from Arabic corpus. in 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011, 5995793, pp. 224-230, 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011, Putrajaya, 28/6/11. https://doi.org/10.1109/STAIR.2011.5995793
Saif AM, Ab Aziz MJ. An automatic noun compound extraction from Arabic corpus. In 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011. 2011. p. 224-230. 5995793 https://doi.org/10.1109/STAIR.2011.5995793
Saif, Abdulgabbar Mohammed ; Ab Aziz, Mohd Juzaiddin. / An automatic noun compound extraction from Arabic corpus. 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011. 2011. pp. 224-230
@inproceedings{fd8d222b59ce4f98aa993a30d5f3a094,
title = "An automatic noun compound extraction from Arabic corpus",
abstract = "The identification of noun compounds as multi-word lexical units is a very important task in natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. In this paper, we used a hybrid method, based on linguistic knowledge and statistical measures, to extract noun compounds from an Arabic corpus. For candidate identification, we used linguistic analysis tools such as lemmatization and part-of-speech (POS) tagging to filter the candidates and determine their variations. Association measures were then computed for each candidate to rank the candidates. After that, we evaluated the association measures using the n-best evaluation method and reported the precision of each association measure in each n-best list. The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.",
keywords = "Arabic noun compound, Association measures, hybrid method, lemmatization, morphological variations, n-best evaluation method",
author = "Saif, {Abdulgabbar Mohammed} and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2011",
doi = "10.1109/STAIR.2011.5995793",
language = "English",
isbn = "9781612843537",
pages = "224--230",
booktitle = "2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011",

}

TY - GEN

T1 - An automatic noun compound extraction from Arabic corpus

AU - Saif, Abdulgabbar Mohammed

AU - Ab Aziz, Mohd Juzaiddin

PY - 2011

Y1 - 2011

N2 - The identification of noun compounds as multi-word lexical units is a very important task in natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. In this paper, we used a hybrid method, based on linguistic knowledge and statistical measures, to extract noun compounds from an Arabic corpus. For candidate identification, we used linguistic analysis tools such as lemmatization and part-of-speech (POS) tagging to filter the candidates and determine their variations. Association measures were then computed for each candidate to rank the candidates. After that, we evaluated the association measures using the n-best evaluation method and reported the precision of each association measure in each n-best list. The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.

AB - The identification of noun compounds as multi-word lexical units is a very important task in natural language processing applications that require some degree of semantic interpretation, such as machine translation, information retrieval and text summarization. In this paper, we used a hybrid method, based on linguistic knowledge and statistical measures, to extract noun compounds from an Arabic corpus. For candidate identification, we used linguistic analysis tools such as lemmatization and part-of-speech (POS) tagging to filter the candidates and determine their variations. Association measures were then computed for each candidate to rank the candidates. After that, we evaluated the association measures using the n-best evaluation method and reported the precision of each association measure in each n-best list. The experimental results showed that the log-likelihood ratio is the best association measure, achieving the highest precision.

KW - Arabic noun compound

KW - Association measures

KW - hybrid method

KW - lemmatization

KW - morphological variations

KW - n-best evaluation method

UR - http://www.scopus.com/inward/record.url?scp=80052558202&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80052558202&partnerID=8YFLogxK

U2 - 10.1109/STAIR.2011.5995793

DO - 10.1109/STAIR.2011.5995793

M3 - Conference contribution

SN - 9781612843537

SP - 224

EP - 230

BT - 2011 International Conference on Semantic Technology and Information Retrieval, STAIR 2011

ER -