Degraded historical document binarization: A review on issues, challenges, techniques, and future directions

Research output: Contribution to journal; Review article

1 Citation (Scopus)

Abstract

In this era of digitization, most hardcopy documents are being transformed into digital formats. In the process, large quantities of documents are stored and preserved through electronic scanning. These documents come from various sources such as ancient documentation, old legal records, medical reports, music scores, palm leaf manuscripts, and reports on security-related issues. In particular, ancient and historical documents are hard to read because of degradation in the form of low contrast and corrupting artefacts. In recent times, degraded document binarization has been studied widely, and several approaches have been developed to deal with its issues and challenges. In this paper, a comprehensive review is conducted on the issues and challenges faced during the image binarization process, followed by insights on various methods used for image binarization. This paper also discusses the advanced methods used for the enhancement of degraded documents that improve the quality of documents during the binarization process. Further discussions are made on the effectiveness and robustness of existing methods; there is still scope to develop a hybrid approach that can deal with degraded document binarization more effectively.

Original language: English
Article number: 48
Journal: Journal of Imaging
Volume: 5
Issue number: 4
DOI: 10.3390/jimaging5040048
Publication status: Published - 1 Apr 2019

Keywords

  • Accuracy
  • Binarization
  • Computing
  • Document degradation
  • Image artefacts
  • Image enhancement
  • Image manipulation
  • Image quality
  • OCR

ASJC Scopus subject areas

  • Radiology, Nuclear Medicine and Imaging
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering

Cite this

@article{37202ed5c3a7437fb061ff40eac23717,
title = "Degraded historical document binarization: A review on issues, challenges, techniques, and future directions",
abstract = "In this era of digitization, most hardcopy documents are being transformed into digital formats. In the process of transformation, large quantities of documents are stored and preserved through electronic scanning. These documents are available from various sources such as ancient documentation, old legal records, medical reports, music scores, palm leaf, and reports on security-related issues. In particular, ancient and historical documents are hard to read due to their degradation in terms of low contrast and existence of corrupted artefacts. In recent times, degraded document binarization has been studied widely and several approaches were developed to deal with issues and challenges in document binarization. In this paper, a comprehensive review is conducted on the issues and challenges faced during the image binarization process, followed by insights on various methods used for image binarization. This paper also discusses the advanced methods used for the enhancement of degraded documents that improves the quality of documents during the binarization process. Further discussions are made on the effectiveness and robustness of existing methods, and there is still a scope to develop a hybrid approach that can deal with degraded document binarization more effectively.",
keywords = "Accuracy, Binarization, Computing, Document degradation, Image artefacts, Image enhancement, Image manipulation, Image quality, OCR",
author = "Alaa Sulaiman and Khairuddin Omar and Nasrudin, {Mohammad Faidzul}",
year = "2019",
month = "4",
day = "1",
doi = "10.3390/jimaging5040048",
language = "English",
volume = "5",
journal = "Journal of Imaging",
issn = "2313-433X",
publisher = "Multidisciplinary Digital Publishing Institute",
number = "4",

}

TY - JOUR

T1 - Degraded historical document binarization

T2 - A review on issues, challenges, techniques, and future directions

AU - Sulaiman, Alaa

AU - Omar, Khairuddin

AU - Nasrudin, Mohammad Faidzul

PY - 2019/4/1

Y1 - 2019/4/1

AB - In this era of digitization, most hardcopy documents are being transformed into digital formats. In the process of transformation, large quantities of documents are stored and preserved through electronic scanning. These documents are available from various sources such as ancient documentation, old legal records, medical reports, music scores, palm leaf, and reports on security-related issues. In particular, ancient and historical documents are hard to read due to their degradation in terms of low contrast and existence of corrupted artefacts. In recent times, degraded document binarization has been studied widely and several approaches were developed to deal with issues and challenges in document binarization. In this paper, a comprehensive review is conducted on the issues and challenges faced during the image binarization process, followed by insights on various methods used for image binarization. This paper also discusses the advanced methods used for the enhancement of degraded documents that improves the quality of documents during the binarization process. Further discussions are made on the effectiveness and robustness of existing methods, and there is still a scope to develop a hybrid approach that can deal with degraded document binarization more effectively.

KW - Accuracy

KW - Binarization

KW - Computing

KW - Document degradation

KW - Image artefacts

KW - Image enhancement

KW - Image manipulation

KW - Image quality

KW - OCR

UR - http://www.scopus.com/inward/record.url?scp=85067654743&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067654743&partnerID=8YFLogxK

U2 - 10.3390/jimaging5040048

DO - 10.3390/jimaging5040048

M3 - Review article

AN - SCOPUS:85067654743

VL - 5

JO - Journal of Imaging

JF - Journal of Imaging

SN - 2313-433X

IS - 4

M1 - 48

ER -