Improving Arabic part-of-speech tagging through morphological analysis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper describes our newly developed second-order hidden Markov model (HMM) part-of-speech (POS) tagging system, designed to tag Arabic text using a small amount of training data. The tagger achieves encouraging results. The paper also presents a hybrid tagging architecture for Arabic in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the tagger's results both standalone and combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy on unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.
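The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' system: a second-order (trigram) HMM decoded with Viterbi, in which a weighted analyzer proposes candidate tags for words unseen in training. The add-one smoothing, the use of normalized analyzer weights as stand-in emission scores, the transliterated toy corpus, and all names (`train`, `viterbi`, `toy_analyzer`) are assumptions made for illustration.

```python
import math
from collections import defaultdict

START = "<s>"  # padding tag for the two positions before the sentence

def train(tagged_sentences):
    """Count tag trigrams and (tag, word) emissions from a small tagged corpus."""
    trans = defaultdict(lambda: defaultdict(int))  # (t_prev2, t_prev1) -> tag -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    for sentence in tagged_sentences:
        prev2, prev1 = START, START
        for word, tag in sentence:
            trans[(prev2, prev1)][tag] += 1
            emit[tag][word] += 1
            prev2, prev1 = prev1, tag
    return trans, emit

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (a simplification chosen for this sketch)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def emission_scores(word, emit, analyzer=None):
    """Return {tag: log score} for the tags considered for this word."""
    vocab_size = len({w for words in emit.values() for w in words}) + 1
    seen = {t: log_prob(emit[t], word, vocab_size) for t in emit if word in emit[t]}
    if seen:  # known word: only tags observed with it in training
        return seen
    weights = {t: w for t, w in (analyzer(word) if analyzer else {}).items() if w > 0}
    if weights:  # unknown word: the analyzer proposes weighted candidate tags
        total = sum(weights.values())
        return {t: math.log(w / total) for t, w in weights.items()}
    # unknown word without an analysis: consider every tag with a smoothed score
    return {t: log_prob(emit[t], word, vocab_size) for t in emit}

def viterbi(words, trans, emit, analyzer=None):
    """Find the best tag sequence; each state is the pair of the last two tags."""
    n_tags = len(emit)
    best = {(START, START): (0.0, [])}
    for word in words:
        scores = emission_scores(word, emit, analyzer)
        new_best = {}
        for (t2, t1), (score, path) in best.items():
            for t, e in scores.items():
                s = score + log_prob(trans.get((t2, t1), {}), t, n_tags) + e
                if (t1, t) not in new_best or s > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (s, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy transliterated corpus (illustrative only, not the paper's data).
corpus = [
    [("kataba", "VERB"), ("al-waladu", "NOUN"), ("al-darsa", "NOUN")],
    [("qaraa", "VERB"), ("al-bintu", "NOUN"), ("kitaban", "NOUN")],
]
trans, emit = train(corpus)

def toy_analyzer(word):
    """Hypothetical stand-in for a weighted morphological analyzer."""
    return {"NOUN": 0.9, "VERB": 0.1} if word.endswith("an") else {}

# "maktabatan" is unseen in training, so its tags come from the analyzer.
print(viterbi(["kataba", "al-bintu", "maktabatan"], trans, emit, toy_analyzer))
# ['VERB', 'NOUN', 'NOUN']
```

In this sketch the analyzer is consulted only for words absent from the training vocabulary, which is where the abstract reports the hybrid architecture helps most. A real system would likely use a better smoothing scheme and a principled way of combining analyzer weights with HMM probabilities.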

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 317-326
Number of pages: 10
Volume: 6591 LNAI
Edition: PART 1
DOIs: https://doi.org/10.1007/978-3-642-20039-7-32
Publication status: Published - 2011
Event: 3rd International Conference on Intelligent Information and Database Systems, ACIIDS 2011 - Daegu
Duration: 20 Apr 2011 to 22 Apr 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6591 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 3rd International Conference on Intelligent Information and Database Systems, ACIIDS 2011
City: Daegu
Period: 20/4/11 to 22/4/11

Keywords

  • Arabic languages
  • Hidden Markov model
  • Morphological analysis
  • Unknown words

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Albared, M., Omar, N., & Ab Aziz, M. J. (2011). Improving Arabic part-of-speech tagging through morphological analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 6591 LNAI, pp. 317-326). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6591 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-20039-7-32

@inproceedings{9737ef912ec94ab2bddd95e67cb4fa6c,
title = "Improving {A}rabic part-of-speech tagging through morphological analysis",
abstract = "This paper describes our newly developed second-order hidden Markov model (HMM) part-of-speech (POS) tagging system, designed to tag Arabic text using a small amount of training data. The tagger achieves encouraging results. The paper also presents a hybrid tagging architecture for Arabic in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the tagger's results both standalone and combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy on unknown words. A precision of 96.6{\%} is obtained when unknown words occur in the test set.",
keywords = "Arabic languages, Hidden Markov model, Morphological analysis, Unknown words",
author = "Mohammed Albared and Nazlia Omar and {Ab Aziz}, {Mohd Juzaiddin}",
year = "2011",
doi = "10.1007/978-3-642-20039-7-32",
language = "English",
isbn = "9783642200380",
volume = "6591 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "317--326",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",

}

TY - GEN

T1 - Improving Arabic part-of-speech tagging through morphological analysis

AU - Albared, Mohammed

AU - Omar, Nazlia

AU - Ab Aziz, Mohd Juzaiddin

PY - 2011

Y1 - 2011

AB - This paper describes our newly developed second-order hidden Markov model (HMM) part-of-speech (POS) tagging system, designed to tag Arabic text using a small amount of training data. The tagger achieves encouraging results. The paper also presents a hybrid tagging architecture for Arabic in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the tagger's results both standalone and combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy on unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.

KW - Arabic languages

KW - Hidden Markov model

KW - Morphological analysis

KW - Unknown words

UR - http://www.scopus.com/inward/record.url?scp=84872155876&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872155876&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-20039-7-32

DO - 10.1007/978-3-642-20039-7-32

M3 - Conference contribution

AN - SCOPUS:84872155876

SN - 9783642200380

VL - 6591 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 317

EP - 326

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -