Volume 25 / Issue 6

DOI:   10.3217/jucs-025-06-0627

 

Fast Binarization of Unevenly Illuminated Document Images Based on Background Estimation for Optical Character Recognition Purposes

Hubert Michalak (West Pomeranian University of Technology, Poland)

Krzysztof Okarma (West Pomeranian University of Technology, Poland)

Abstract: One of the key operations during the image preprocessing step in Optical Character Recognition (OCR) algorithms is image binarization. For uniformly illuminated images, typically obtained with flatbed scanners, a single global threshold may be sufficient for further recognition of individual characters, but it cannot be applied directly to non-uniformly illuminated document images. Such a problem may occur when photographing documents in unknown lighting conditions, making proper text recognition impossible in some parts of the image.

Since popular adaptive thresholding methods, e.g. Niblack, Sauvola and their modifications, analyse the neighbourhood of each pixel and are therefore time-consuming, a faster solution might be the division of images into blocks or the elimination of the non-uniform background. Such an approach can be considered a balanced solution filling the gap between global and local adaptive thresholding. The solution proposed in the paper, which is also useful for various mobile devices owing to its limited computational requirements, is based on approximating the lighting distribution of the background using reduced-resolution images. The proposed method yields very good OCR results, superior to typical adaptive binarization algorithms in terms of both the resulting OCR accuracy and computational efficiency.
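
The general idea of estimating the background from a reduced-resolution image can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes dark text on a lighter background, takes the block-wise maximum as the low-resolution background estimate, and uses hypothetical names and parameters (`estimate_background`, `binarize`, `block`, `ratio`) chosen only for illustration.

```python
import numpy as np


def estimate_background(gray, block=16):
    """Approximate the background brightness from a reduced-resolution image.

    Assumption: text is darker than paper, so the brightest pixel in each
    block x block tile is taken as that tile's background level.
    """
    h, w = gray.shape
    # Pad by edge replication so both dimensions are multiples of `block`.
    pad_h = (-h) % block
    pad_w = (-w) % block
    padded = np.pad(gray, ((0, pad_h), (0, pad_w)), mode="edge")
    ph, pw = padded.shape
    # Reduced-resolution image: one value (the maximum) per tile.
    small = padded.reshape(ph // block, block, pw // block, block).max(axis=(1, 3))
    # Upsample back to full size by repeating each tile value.
    background = np.repeat(np.repeat(small, block, axis=0), block, axis=1)
    return background[:h, :w]


def binarize(gray, block=16, ratio=0.75):
    """Mark a pixel as text (0) when it is clearly darker than the local
    background estimate; everything else becomes white (255).

    `ratio` is an illustrative parameter, not a value from the paper.
    """
    gray = np.asarray(gray, dtype=np.float64)
    background = estimate_background(gray, block)
    return np.where(gray < ratio * background, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic page: brightness rises from left to right, thin dark strokes.
    h, w = 64, 256
    illum = np.tile(np.linspace(60.0, 240.0, w), (h, 1))
    strokes = np.zeros((h, w), dtype=bool)
    strokes[20:28, 10:250:12] = True
    page = np.where(strokes, 0.4 * illum, illum)

    adaptive = binarize(page)
    global_fixed = np.where(page < 128, 0, 255).astype(np.uint8)
    print("background-based result matches strokes:",
          np.array_equal(adaptive == 0, strokes))
    print("fixed global threshold matches strokes:",
          np.array_equal(global_fixed == 0, strokes))
```

Because the background is computed once per tile rather than once per pixel neighbourhood, the cost stays close to that of a global threshold, which is the trade-off the abstract describes. A smoother upsampling step (for example bilinear interpolation) would reduce blocky transitions between tiles, at some extra cost.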

Keywords: OCR, binarization, document image analysis

Categories: I.4.10, I.4.6, I.7.5