Rule based shallow parser for Arabic language

Mona Ali Mohammed, Nazlia Omar

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Problem statement: Shallow syntactic parsing is a language processing approach that computes a basic analysis of sentence structure rather than attempting a full syntactic analysis. It identifies the constituents of a sentence (noun groups, verb groups, prepositional groups) but specifies neither their internal structure nor their role in the main sentence. The only technique previously used for Arabic shallow parsing is the Support Vector Machine (SVM) based approach. The main problem faced by shallow parser developers is boundary identification, on which the accuracy of the system depends.

Approach: The specific objective of the research was to identify the boundaries of all Noun Phrases (NPs), Verb Phrases (VPs) and Prepositional Phrases (PPs) in Arabic. The study was based on a critical analysis of the structure of Arabic sentences and discussed various idiosyncrasies of those sentences in order to derive more accurate rules for detecting the start and end boundaries of each clause. New rules were proposed so that the shallow parser generates up to two levels of the full parse tree. We describe an implementation of the rule-based shallow parser, which handles chunking of Arabic sentences, and evaluate it.

Results: The system was tested manually on 70 Arabic sentences comprising 1776 words, with sentence lengths between 4 and 50 words. The result obtained, an F-score of 97%, was significantly better than published state-of-the-art results for Arabic.

Conclusion: The main achievement is the development of an Arabic shallow parser based on a rule-based approach. Chunking, which constitutes the main contribution, is achieved in two successive stages that group sequences of adjacent words on the basis of their linguistic properties.
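The two-stage grouping of adjacent words and the F-score evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set: the tag names (`DET`, `NOUN`, `ADJ`, `PRON`, `VERB`, `PREP`), the `CATEGORY` mapping, and the functions `group_runs`, `attach_prepositions` and `f_score` are all hypothetical, and the input is assumed to be already POS-tagged. Stage 1 groups adjacent words of the same broad category; stage 2 attaches a following noun group to a preposition to form a PP.

```python
from typing import Dict, List, Tuple

# A chunk is (label, start index, end index exclusive) over the token list.
Chunk = Tuple[str, int, int]

# Hypothetical mapping from POS tags to broad categories (an assumption,
# not the tag set or rules used in the paper).
CATEGORY: Dict[str, str] = {
    "DET": "NP", "NOUN": "NP", "ADJ": "NP", "PRON": "NP",
    "VERB": "VP",
    "PREP": "P",
}


def group_runs(tags: List[str]) -> List[Chunk]:
    """Stage 1: group runs of adjacent tokens that share a category."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(tags):
        cat = CATEGORY.get(tags[i])
        j = i + 1
        while j < len(tags) and CATEGORY.get(tags[j]) == cat:
            j += 1
        if cat is not None:  # tokens with unknown tags stay outside any chunk
            chunks.append((cat, i, j))
        i = j
    return chunks


def attach_prepositions(chunks: List[Chunk]) -> List[Chunk]:
    """Stage 2: merge a preposition with the NP that directly follows it."""
    merged: List[Chunk] = []
    k = 0
    while k < len(chunks):
        label, start, end = chunks[k]
        if label == "P":
            nxt = chunks[k + 1] if k + 1 < len(chunks) else None
            if nxt is not None and nxt[0] == "NP" and nxt[1] == end:
                end = nxt[2]  # extend the PP over the adjacent NP
                k += 1
            label = "PP"
        merged.append((label, start, end))
        k += 1
    return merged


def f_score(predicted: List[Chunk], gold: List[Chunk]) -> float:
    """Harmonic mean of precision and recall over exactly matching chunks."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# "The boy went to the school", verb-first word order, hand-tagged here.
words = ["ذهب", "الولد", "إلى", "المدرسة"]
tags = ["VERB", "NOUN", "PREP", "NOUN"]
chunks = attach_prepositions(group_runs(tags))
for label, start, end in chunks:
    print(label, " ".join(words[start:end]))
```

A chunk counts as correct in `f_score` only when its label and both boundaries match the gold chunk, which is why boundary identification dominates the score.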

Original language: English
Pages (from-to): 1505-1514
Number of pages: 10
Journal: Journal of Computer Science
Volume: 7
Issue number: 10
DOIs: 10.3844/jcssp.2011.1505.1514
Publication status: Published - 2011

Fingerprint

Syntactics
Linguistics
Support vector machines
Processing

Keywords

  • Arabic language phrases
  • Arabic language processing
  • Arabic shallow parsing
  • Hand shallow
  • Natural Language Processing (NLP)
  • Part Of Speech (POS)
  • Rule based approaches
  • Text chunking

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Rule based shallow parser for Arabic language. / Mohammed, Mona Ali; Omar, Nazlia.

In: Journal of Computer Science, Vol. 7, No. 10, 2011, p. 1505-1514.

Mohammed, Mona Ali ; Omar, Nazlia. / Rule based shallow parser for Arabic language. In: Journal of Computer Science. 2011 ; Vol. 7, No. 10. pp. 1505-1514.
@article{852dd90d682246c1adbdc59b39631c5b,
title = "Rule based shallow parser for Arabic language",
keywords = "Arabic language phrases, Arabic language processing, Arabic shallow parsing, Hand shallow, Natural Language Processing (NLP), Part Of Speech (POS), Rule based approaches, Text chunking",
author = "Mohammed, {Mona Ali} and Nazlia Omar",
year = "2011",
doi = "10.3844/jcssp.2011.1505.1514",
language = "English",
volume = "7",
pages = "1505--1514",
journal = "Journal of Computer Science",
issn = "1549-3636",
publisher = "Science Publications",
number = "10",

}

TY - JOUR

T1 - Rule based shallow parser for Arabic language

AU - Mohammed, Mona Ali

AU - Omar, Nazlia

PY - 2011

Y1 - 2011

KW - Arabic language phrases

KW - Arabic language processing

KW - Arabic shallow parsing

KW - Hand shallow

KW - Natural Language Processing (NLP)

KW - Part Of Speech (POS)

KW - Rule based approaches

KW - Text chunking

UR - http://www.scopus.com/inward/record.url?scp=80053188453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053188453&partnerID=8YFLogxK

U2 - 10.3844/jcssp.2011.1505.1514

DO - 10.3844/jcssp.2011.1505.1514

M3 - Article

VL - 7

SP - 1505

EP - 1514

JO - Journal of Computer Science

JF - Journal of Computer Science

SN - 1549-3636

IS - 10

ER -