Document Image Enhancement for Better OCR

Ram Krishna Pandey and A G Ramakrishnan

Introduction

In this project, we enhance the quality of low-resolution, degraded historical Tamil document images in order to improve their OCR accuracy.

Recognition of low-resolution historical document images by optical character recognizers (OCR) is very challenging. The problem can be addressed at several stages, alone or in combination: preprocessing, classification, and post-processing with a language model.

Model

Document Image Enhancement increases OCR accuracy

Our task is to enhance the quality of such document images at the preprocessing stage, so that their recognition improves without any change to the design of the existing OCR. The enhancement itself should also take little time.
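The idea of enhancement as a drop-in preprocessing step can be sketched as follows. This is only an illustration under stated assumptions: the function names (recognize, upscale_2x) are hypothetical, the OCR engine is represented by any callable that maps an image to text, and the nearest-neighbour upscaler is a trivial stand-in, not the deep-learning model developed in this project.

```python
from typing import Callable, List, Optional

# Assumption: a grayscale image is a list of rows of integer pixel values.
Image = List[List[int]]


def upscale_2x(image: Image) -> Image:
    """Placeholder enhancer: nearest-neighbour 2x upscaling.

    Stand-in only; the project's actual enhancer is a learned model.
    """
    out: Image = []
    for row in image:
        wide = [p for p in row for _ in range(2)]  # repeat each pixel twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def recognize(
    image: Image,
    ocr: Callable[[Image], str],
    enhance: Optional[Callable[[Image], Image]] = None,
) -> str:
    """Run an unmodified OCR callable, optionally on an enhanced image."""
    if enhance is not None:
        image = enhance(image)
    return ocr(image)
```

Because the OCR callable is passed in unchanged, the enhancer can be swapped or removed without touching the recognizer, which is the property the preprocessing approach relies on.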

Enhanced quality means that the peak signal-to-noise ratio (PSNR) and the OCR accuracy at the character and word levels should improve. For such degraded, low-resolution images, we have so far developed a deep-learning-based architecture that yields a relative improvement of 140% in character-level accuracy.
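The measures named above can be sketched in a few lines. PSNR follows its standard definition, 10 log10(MAX^2 / MSE). The character-accuracy formula (one minus edit distance divided by ground-truth length) and the relative-improvement formula are common conventions assumed here, since the text does not state which definitions were used. All function names are hypothetical, and the numbers in the usage note are invented for illustration, not results from this project.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """PSNR in dB between two same-sized grayscale images (lists of rows)."""
    ref = [p for row in reference for p in row]
    enh = [p for row in enhanced for p in row]
    if not ref or len(ref) != len(enh):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, enh)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def char_accuracy(ground_truth, ocr_output):
    """Assumed convention: 1 - edit_distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    return max(0.0, 1.0 - edit_distance(ground_truth, ocr_output) / len(ground_truth))


def relative_improvement(before, after):
    """Relative improvement in percent: 100 * (after - before) / before."""
    return 100.0 * (after - before) / before
```

Under this convention, a hypothetical character accuracy rising from 25% to 60% gives relative_improvement(25.0, 60.0) = 140.0, which shows how a relative gain above 100% can arise when the baseline accuracy is low.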