Multi-Classifier Jawi Handwritten sub-word recognition

Research output: Contribution to journal › Article

Abstract

The problems and challenges in Jawi handwriting recognition are inherited from Arabic script: the script's cursive nature, a large variety of writing styles due to its morphological richness, ligatures, overlapping characters, dialects, and the low quality of manuscript images. Word segmentation is difficult because of the existence of sub-words, which arise from the spaces within words that contain disconnected characters. The performance of previous Jawi handwriting recognition is still considered sub-par. There are three main problems with previous approaches. First, the recognizer consists of multiple independent components, so a performance improvement in one component is not shared across the system. Second, feature extraction based on feature engineering works only on specific subsets of the training data and is less capable of handling broader variants in the testing data. Finally, the classifier uses implicit segmentation in which the target class is a sub-word from a limited lexicon. This paper proposes a Deep Learning approach to address the first problem: training is conducted end-to-end from input to class output, which enables an improvement in any component to improve overall performance. Second, a Convolutional Network is used for feature learning, which optimizes the data representation through end-to-end training of the parameters from raw input data to target class. Finally, a multi-classifier that implicitly segments the sub-word into a sequence of characters is proposed. The classifiers consist of one sub-word length classifier and seven character classifiers. This approach is lexicon-free, to address the absence of lexicon data. Experiments conducted on a standard Jawi handwritten dataset showed an accuracy of up to 92.20% and suggest that the approach is superior to state-of-the-art methods of Jawi handwriting recognition.
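
The abstract describes one sub-word length classifier plus seven character classifiers but does not say how their outputs are combined. The sketch below shows one plausible lexicon-free decoding step under stated assumptions: the length head's class k is taken to mean "k + 1 characters", each character head scores the same alphabet, and the names `decode_subword`, `argmax`, and the three-symbol alphabet are hypothetical placeholders, not taken from the paper.

```python
from typing import List, Sequence

MAX_LEN = 7  # the abstract mentions seven character classifiers


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (the first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_subword(
    length_scores: Sequence[float],
    char_scores: Sequence[Sequence[float]],
    alphabet: Sequence[str],
) -> List[str]:
    """Combine one length head and MAX_LEN character heads into a label sequence.

    Assumption (not from the paper): length class k means "k + 1 characters",
    and every character head scores the same alphabet.
    """
    if len(char_scores) != MAX_LEN:
        raise ValueError(f"expected {MAX_LEN} character heads, got {len(char_scores)}")
    # The length head decides how many character heads are read.
    n_chars = argmax(length_scores) + 1
    # Heads beyond the predicted length are ignored, so no lexicon is consulted.
    return [alphabet[argmax(char_scores[pos])] for pos in range(n_chars)]


# Toy scores standing in for the outputs of a trained network.
alphabet = ["a", "b", "c"]  # placeholder labels, not real Jawi characters
length_scores = [0.1, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0]  # predicts 2 characters
char_scores = [
    [0.2, 0.7, 0.1],  # head 1 favours "b"
    [0.9, 0.05, 0.05],  # head 2 favours "a"
] + [[1 / 3, 1 / 3, 1 / 3]] * 5  # unused heads
print(decode_subword(length_scores, char_scores, alphabet))  # ['b', 'a']
```

Because the output is assembled character by character, any sequence of up to seven characters can be produced, which is what makes a scheme of this kind independent of a fixed sub-word lexicon.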

Original language: English
Pages (from-to): 1528-1533
Number of pages: 6
Journal: International Journal on Advanced Science, Engineering and Information Technology
Volume: 8
Issue number: 4-2
Publication status: Published - 1 Jan 2018

Keywords

  • Convolutional network
  • End-to-end learning, learning features
  • Handwritten recognition
  • Jawi
  • Sub-word

ASJC Scopus subject areas

  • Computer Science (all)
  • Agricultural and Biological Sciences (all)
  • Engineering (all)

Cite this

Multi-Classifier Jawi Handwritten sub-word recognition. / Hasan, Anton Heryanto; Omar, Khairuddin; Nasrudin, Mohammad Faidzul.

In: International Journal on Advanced Science, Engineering and Information Technology, Vol. 8, No. 4-2, 01.01.2018, p. 1528-1533.

@article{8b51adba791f4b3d867bdc8f44933329,
title = "Multi-Classifier Jawi Handwritten sub-word recognition",
abstract = "The problems and challenges in Jawi handwriting recognition are inherited from Arabic script: the script's cursive nature, a large variety of writing styles due to its morphological richness, ligatures, overlapping characters, dialects, and the low quality of manuscript images. Word segmentation is difficult because of the existence of sub-words, which arise from the spaces within words that contain disconnected characters. The performance of previous Jawi handwriting recognition is still considered sub-par. There are three main problems with previous approaches. First, the recognizer consists of multiple independent components, so a performance improvement in one component is not shared across the system. Second, feature extraction based on feature engineering works only on specific subsets of the training data and is less capable of handling broader variants in the testing data. Finally, the classifier uses implicit segmentation in which the target class is a sub-word from a limited lexicon. This paper proposes a Deep Learning approach to address the first problem: training is conducted end-to-end from input to class output, which enables an improvement in any component to improve overall performance. Second, a Convolutional Network is used for feature learning, which optimizes the data representation through end-to-end training of the parameters from raw input data to target class. Finally, a multi-classifier that implicitly segments the sub-word into a sequence of characters is proposed. The classifiers consist of one sub-word length classifier and seven character classifiers. This approach is lexicon-free, to address the absence of lexicon data. Experiments conducted on a standard Jawi handwritten dataset showed an accuracy of up to 92.20{\%} and suggest that the approach is superior to state-of-the-art methods of Jawi handwriting recognition.",
keywords = "Convolutional network, End-to-end learning, learning features, Handwritten recognition, Jawi, Sub-word",
author = "Hasan, {Anton Heryanto} and Omar, Khairuddin and Nasrudin, {Mohammad Faidzul}",
year = "2018",
month = "1",
day = "1",
language = "English",
volume = "8",
pages = "1528--1533",
journal = "International Journal on Advanced Science, Engineering and Information Technology",
issn = "2088-5334",
publisher = "INSIGHT - Indonesian Society for Knowledge and Human Development",
number = "4-2",

}

TY - JOUR

T1 - Multi-Classifier Jawi Handwritten sub-word recognition

AU - Hasan, Anton Heryanto

AU - Omar, Khairuddin

AU - Nasrudin, Mohammad Faidzul

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The problems and challenges in Jawi handwriting recognition are inherited from Arabic script: the script's cursive nature, a large variety of writing styles due to its morphological richness, ligatures, overlapping characters, dialects, and the low quality of manuscript images. Word segmentation is difficult because of the existence of sub-words, which arise from the spaces within words that contain disconnected characters. The performance of previous Jawi handwriting recognition is still considered sub-par. There are three main problems with previous approaches. First, the recognizer consists of multiple independent components, so a performance improvement in one component is not shared across the system. Second, feature extraction based on feature engineering works only on specific subsets of the training data and is less capable of handling broader variants in the testing data. Finally, the classifier uses implicit segmentation in which the target class is a sub-word from a limited lexicon. This paper proposes a Deep Learning approach to address the first problem: training is conducted end-to-end from input to class output, which enables an improvement in any component to improve overall performance. Second, a Convolutional Network is used for feature learning, which optimizes the data representation through end-to-end training of the parameters from raw input data to target class. Finally, a multi-classifier that implicitly segments the sub-word into a sequence of characters is proposed. The classifiers consist of one sub-word length classifier and seven character classifiers. This approach is lexicon-free, to address the absence of lexicon data. Experiments conducted on a standard Jawi handwritten dataset showed an accuracy of up to 92.20% and suggest that the approach is superior to state-of-the-art methods of Jawi handwriting recognition.

KW - Convolutional network

KW - End-to-end learning, learning features

KW - Handwritten recognition

KW - Jawi

KW - Sub-word

UR - http://www.scopus.com/inward/record.url?scp=85055353731&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055353731&partnerID=8YFLogxK

M3 - Article

VL - 8

SP - 1528

EP - 1533

JO - International Journal on Advanced Science, Engineering and Information Technology

JF - International Journal on Advanced Science, Engineering and Information Technology

SN - 2088-5334

IS - 4-2

ER -