A comparative study between methods of Arabic baseline detection

Atallah Al-Shatnawi, Khairuddin Omar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Preprocessing is the most important stage in an Arabic OCR system; it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Arabic is written cursively, and each of its characters takes between two and four shapes. An Arabic word typically consists of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main priorities in preprocessing for an Arabic OCR system, because the baseline can be used for both skew normalization and character segmentation. This paper lists and clarifies the challenges facing Arabic baseline detection methods and provides a brief comparison of those methods. The comparison is based on the nature of written Arabic, the diacritics (such as dots and zigzags), the word slope, and the presence of subwords.
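The abstract does not spell out any single algorithm, but horizontal projection appears among the paper's keywords. As a minimal sketch only, assuming a binarized word image stored as rows of 0/1 values (1 = ink), the baseline can be approximated as the row holding the most ink pixels. The function names and the toy image below are hypothetical illustrations, not the authors' implementation.

```python
from typing import List


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (ink) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def estimate_baseline_row(image: List[List[int]]) -> int:
    """Return the index of the row with the most ink pixels.

    Ties resolve to the topmost such row.
    """
    profile = horizontal_projection(image)
    if not profile:
        raise ValueError("image has no rows")
    return max(range(len(profile)), key=profile.__getitem__)


# Hypothetical 6x8 toy "word": row 3 carries the connecting stroke,
# while the sparse pixels above and below stand in for dots and tails.
word = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]

print(horizontal_projection(word))
print(estimate_baseline_row(word))
```

A single projection peak like this is likely to be sensitive to slanted words, heavy diacritics, and short subwords, which are among the comparison criteria the abstract names.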

Original language: English
Title of host publication: Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009
Pages: 73-77
Number of pages: 5
Volume: 1
DOI: https://doi.org/10.1109/ICEEI.2009.5254814
Publication status: Published - 2009
Event: 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009 - Selangor
Duration: 5 Aug 2009 to 7 Aug 2009

Fingerprint

Optical character recognition
Feature extraction

Keywords

  • Arabic
  • Baseline
  • Contour
  • Handwritten
  • Horizontal projection
  • OCR
  • Offline
  • Preprocessing
  • Principal component analysis
  • Skeleton

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Energy Engineering and Power Technology
  • Electrical and Electronic Engineering

Cite this

Al-Shatnawi, A., & Omar, K. (2009). A comparative study between methods of Arabic baseline detection. In Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009 (Vol. 1, pp. 73-77). [5254814] https://doi.org/10.1109/ICEEI.2009.5254814

@inproceedings{bb59a5bd96884dfda3a899f0a53ee078,
title = "A comparative study between methods of {A}rabic baseline detection",
abstract = "Preprocessing is the most important stage in an Arabic OCR system; it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Arabic is written cursively, and each of its characters takes between two and four shapes. An Arabic word typically consists of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main priorities in preprocessing for an Arabic OCR system, because the baseline can be used for both skew normalization and character segmentation. This paper lists and clarifies the challenges facing Arabic baseline detection methods and provides a brief comparison of those methods. The comparison is based on the nature of written Arabic, the diacritics (such as dots and zigzags), the word slope, and the presence of subwords.",
keywords = "Arabic, Baseline, Contour, Handwritten, Horizontal projection, OCR, Offline, Preprocessing, Principal component analysis, Skeleton",
author = "Atallah Al-Shatnawi and Khairuddin Omar",
year = "2009",
doi = "10.1109/ICEEI.2009.5254814",
language = "English",
isbn = "9781424449132",
volume = "1",
pages = "73--77",
booktitle = "Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009",

}

TY - GEN
T1 - A comparative study between methods of Arabic baseline detection
AU - Al-Shatnawi, Atallah
AU - Omar, Khairuddin
PY - 2009
Y1 - 2009
AB - Preprocessing is the most important stage in an Arabic OCR system; it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Arabic is written cursively, and each of its characters takes between two and four shapes. An Arabic word typically consists of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main priorities in preprocessing for an Arabic OCR system, because the baseline can be used for both skew normalization and character segmentation. This paper lists and clarifies the challenges facing Arabic baseline detection methods and provides a brief comparison of those methods. The comparison is based on the nature of written Arabic, the diacritics (such as dots and zigzags), the word slope, and the presence of subwords.
KW - Arabic
KW - Baseline
KW - Contour
KW - Handwritten
KW - Horizontal projection
KW - OCR
KW - Offline
KW - Preprocessing
KW - Principal component analysis
KW - Skeleton
UR - http://www.scopus.com/inward/record.url?scp=70449625467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449625467&partnerID=8YFLogxK
U2 - 10.1109/ICEEI.2009.5254814
DO - 10.1109/ICEEI.2009.5254814
M3 - Conference contribution
AN - SCOPUS:70449625467
SN - 9781424449132
VL - 1
SP - 73
EP - 77
BT - Proceedings of the 2009 International Conference on Electrical Engineering and Informatics, ICEEI 2009
ER -