Digit recognition for Arabic/Jawi and Roman using features from triangle geometry

Mohd Sanusi Azmi, Khairuddin Omar, Mohammad Faidzul Nasrudin, Bahari Idrus, Khadijah Wan Mohd Ghazali

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

A novel method is proposed to recognize Arabic/Jawi and Roman digits. The method is based on features from triangle geometry, normalized into nine features. The features are used for zoning, which results in five and 25 zones. The algorithm is validated using three standard, publicly available datasets that are used by researchers in this field. The first dataset is HODA, which contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB, which has 52,380 isolated characters and 17,740 digits; only the 17,740 digit images are used in this research. For Roman digits, MNIST is chosen; it has 60,000 images for training and 10,000 images for testing. Supervised Machine Learning (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML methods used are Neural Network (NN) and Support Vector Machine (SVM), whereas the UML uses the Euclidean Distance Method with data mining algorithms, namely Mean Average Precision (eMAP) and Frequency Based (eFB). For SML testing on the HODA dataset, accuracy is 98.07% for SVM and 96.73% for NN. For IFHCDB and MNIST, the accuracies are 91.75% and 93.095%, respectively. For the UML tests, accuracy is 93.91% on HODA, 85.94% on IFHCDB, and 86.61% on MNIST. The training and test images are selected using both random selection and the original dataset's distribution. The results show that the accuracy of the proposed algorithm is over 90% for each SML-trained dataset, with the highest result obtained using the 25-zone features.
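The abstract names two ingredients without spelling them out: features derived from triangle geometry, and matching by Euclidean distance. The following is a minimal Python sketch of both, under stated assumptions. The abstract does not list the paper's nine features, so the six values computed here (three side ratios and three interior angles) are illustrative stand-ins, and the function names `triangle_features` and `nearest_label` are hypothetical, not taken from the paper.

```python
import math


def triangle_features(p1, p2, p3):
    """Return side-length ratios and interior angles (radians) of a triangle.

    Illustrative only: the abstract does not specify the paper's nine
    features, so this feature set is an assumption.
    """
    a = math.dist(p2, p3)  # side opposite p1
    b = math.dist(p1, p3)  # side opposite p2
    c = math.dist(p1, p2)  # side opposite p3
    if min(a, b, c) == 0:
        raise ValueError("degenerate triangle: two points coincide")

    def angle(opposite, s1, s2):
        # Law of cosines; clamp to [-1, 1] to guard against rounding error.
        cos_val = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        return math.acos(max(-1.0, min(1.0, cos_val)))

    return [
        c / a, a / b, b / c,   # scale-invariant side ratios
        angle(a, b, c),        # angle at p1
        angle(b, a, c),        # angle at p2
        angle(c, a, b),        # angle at p3
    ]


def nearest_label(query, labelled):
    """Return the label of the feature vector closest to `query`.

    `labelled` is a list of (feature_vector, label) pairs; closeness is
    plain Euclidean distance.
    """
    return min(labelled, key=lambda item: math.dist(query, item[0]))[1]
```

In a zoned setup, one could compute such a vector per zone and concatenate the results before matching; how the paper forms its triangles and zones is not stated in the abstract.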

Original language: English
Title of host publication: AIP Conference Proceedings
Pages: 526-537
Number of pages: 12
Volume: 1522
DOIs: https://doi.org/10.1063/1.4801171
Publication status: Published - 2013
Event: 20th National Symposium on Mathematical Sciences - Research in Mathematical Sciences: A Catalyst for Creativity and Innovation, SKSM 2012 - Putrajaya
Duration: 18 Dec 2012 - 20 Dec 2012

Other

Other: 20th National Symposium on Mathematical Sciences - Research in Mathematical Sciences: A Catalyst for Creativity and Innovation, SKSM 2012
City: Putrajaya
Period: 18/12/12 - 20/12/12

Fingerprint

digits
triangles
machine learning
geometry
education
data mining

Keywords

  • Digit Recognition
  • Features Extraction
  • Triangle Geometry Based Features

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Digit recognition for Arabic/Jawi and Roman using features from triangle geometry. / Azmi, Mohd Sanusi; Omar, Khairuddin; Nasrudin, Mohammad Faidzul; Idrus, Bahari; Wan Mohd Ghazali, Khadijah.

AIP Conference Proceedings. Vol. 1522 2013. p. 526-537.

Azmi, MS, Omar, K, Nasrudin, MF, Idrus, B & Wan Mohd Ghazali, K 2013, Digit recognition for Arabic/Jawi and Roman using features from triangle geometry. in AIP Conference Proceedings. vol. 1522, pp. 526-537, 20th National Symposium on Mathematical Sciences - Research in Mathematical Sciences: A Catalyst for Creativity and Innovation, SKSM 2012, Putrajaya, 18/12/12. https://doi.org/10.1063/1.4801171
Azmi, Mohd Sanusi ; Omar, Khairuddin ; Nasrudin, Mohammad Faidzul ; Idrus, Bahari ; Wan Mohd Ghazali, Khadijah. / Digit recognition for Arabic/Jawi and Roman using features from triangle geometry. AIP Conference Proceedings. Vol. 1522 2013. pp. 526-537
@inproceedings{50d6862022804f3c8108386a7ad6e4db,
title = "Digit recognition for Arabic/Jawi and Roman using features from triangle geometry",
abstract = "A novel method is proposed to recognize Arabic/Jawi and Roman digits. The method is based on features from triangle geometry, normalized into nine features. The features are used for zoning, which results in five and 25 zones. The algorithm is validated using three standard, publicly available datasets that are used by researchers in this field. The first dataset is HODA, which contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB, which has 52,380 isolated characters and 17,740 digits; only the 17,740 digit images are used in this research. For Roman digits, MNIST is chosen; it has 60,000 images for training and 10,000 images for testing. Supervised Machine Learning (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML methods used are Neural Network (NN) and Support Vector Machine (SVM), whereas the UML uses the Euclidean Distance Method with data mining algorithms, namely Mean Average Precision (eMAP) and Frequency Based (eFB). For SML testing on the HODA dataset, accuracy is 98.07{\%} for SVM and 96.73{\%} for NN. For IFHCDB and MNIST, the accuracies are 91.75{\%} and 93.095{\%}, respectively. For the UML tests, accuracy is 93.91{\%} on HODA, 85.94{\%} on IFHCDB, and 86.61{\%} on MNIST. The training and test images are selected using both random selection and the original dataset's distribution. The results show that the accuracy of the proposed algorithm is over 90{\%} for each SML-trained dataset, with the highest result obtained using the 25-zone features.",
keywords = "Digit Recognition, Features Extraction, Triangle Geometry Based Features",
author = "Azmi, {Mohd Sanusi} and Khairuddin Omar and Nasrudin, {Mohammad Faidzul} and Bahari Idrus and {Wan Mohd Ghazali}, Khadijah",
year = "2013",
doi = "10.1063/1.4801171",
language = "English",
isbn = "9780735411500",
volume = "1522",
pages = "526--537",
booktitle = "AIP Conference Proceedings",

}

TY - GEN

T1 - Digit recognition for Arabic/Jawi and Roman using features from triangle geometry

AU - Azmi, Mohd Sanusi

AU - Omar, Khairuddin

AU - Nasrudin, Mohammad Faidzul

AU - Idrus, Bahari

AU - Wan Mohd Ghazali, Khadijah

PY - 2013

Y1 - 2013

N2 - A novel method is proposed to recognize Arabic/Jawi and Roman digits. The method is based on features from triangle geometry, normalized into nine features. The features are used for zoning, which results in five and 25 zones. The algorithm is validated using three standard, publicly available datasets that are used by researchers in this field. The first dataset is HODA, which contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB, which has 52,380 isolated characters and 17,740 digits; only the 17,740 digit images are used in this research. For Roman digits, MNIST is chosen; it has 60,000 images for training and 10,000 images for testing. Supervised Machine Learning (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML methods used are Neural Network (NN) and Support Vector Machine (SVM), whereas the UML uses the Euclidean Distance Method with data mining algorithms, namely Mean Average Precision (eMAP) and Frequency Based (eFB). For SML testing on the HODA dataset, accuracy is 98.07% for SVM and 96.73% for NN. For IFHCDB and MNIST, the accuracies are 91.75% and 93.095%, respectively. For the UML tests, accuracy is 93.91% on HODA, 85.94% on IFHCDB, and 86.61% on MNIST. The training and test images are selected using both random selection and the original dataset's distribution. The results show that the accuracy of the proposed algorithm is over 90% for each SML-trained dataset, with the highest result obtained using the 25-zone features.

AB - A novel method is proposed to recognize Arabic/Jawi and Roman digits. The method is based on features from triangle geometry, normalized into nine features. The features are used for zoning, which results in five and 25 zones. The algorithm is validated using three standard, publicly available datasets that are used by researchers in this field. The first dataset is HODA, which contains 60,000 images for training and 20,000 images for testing. The second dataset is IFHCDB, which has 52,380 isolated characters and 17,740 digits; only the 17,740 digit images are used in this research. For Roman digits, MNIST is chosen; it has 60,000 images for training and 10,000 images for testing. Supervised Machine Learning (SML) and Unsupervised Machine Learning (UML) are used to test the nine features. The SML methods used are Neural Network (NN) and Support Vector Machine (SVM), whereas the UML uses the Euclidean Distance Method with data mining algorithms, namely Mean Average Precision (eMAP) and Frequency Based (eFB). For SML testing on the HODA dataset, accuracy is 98.07% for SVM and 96.73% for NN. For IFHCDB and MNIST, the accuracies are 91.75% and 93.095%, respectively. For the UML tests, accuracy is 93.91% on HODA, 85.94% on IFHCDB, and 86.61% on MNIST. The training and test images are selected using both random selection and the original dataset's distribution. The results show that the accuracy of the proposed algorithm is over 90% for each SML-trained dataset, with the highest result obtained using the 25-zone features.

KW - Digit Recognition

KW - Features Extraction

KW - Triangle Geometry Based Features

UR - http://www.scopus.com/inward/record.url?scp=84876923228&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876923228&partnerID=8YFLogxK

U2 - 10.1063/1.4801171

DO - 10.1063/1.4801171

M3 - Conference contribution

SN - 9780735411500

VL - 1522

SP - 526

EP - 537

BT - AIP Conference Proceedings

ER -