Document Image Enhancement for Better Readability and OCR Recognition

Ram Krishna Pandey and A G Ramakrishnan

Introduction

In this project, we work on enhancing the quality of low-resolution, degraded document images in order to improve their readability and OCR accuracy.

Recognition of low-resolution historical document images by optical character recognizers (OCR) is very challenging, almost comparable in difficulty to an NP-hard problem. The problem can be addressed at several stages, alone or in combination: at classification, at post-processing using a language model, at preprocessing, and so on.

Model

Document Image Enhancement increases OCR accuracy

In this project, our task is to enhance the quality of such document images at the preprocessing stage, so that their readability and recognition improve and we do not have to change the design of the existing OCR.

Here, enhancement of quality means that perceptual quality, PSNR, and OCR character-level and word-level accuracy should all improve.
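To make these criteria concrete, the following is a minimal sketch of how PSNR and character-level and word-level accuracy could be computed. It is an illustration and not the evaluation code used in this project: the function names are hypothetical, images are assumed to be 8-bit grayscale and given as lists of pixel rows, and accuracy is assumed to be defined as one minus the normalized edit distance against the ground-truth text. Perceptual quality is judged by human readers and is not covered here.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Each image is a list of rows of pixel intensities (assumed range 0..max_value).
    """
    ref_pixels = [p for row in reference for p in row]
    enh_pixels = [p for row in enhanced for p in row]
    if not ref_pixels or len(ref_pixels) != len(enh_pixels):
        raise ValueError("images must be non-empty and of the same size")
    mse = sum((r - e) ** 2 for r, e in zip(ref_pixels, enh_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: 1 - edit_distance / len(ref), clipped at 0."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(ref_text, ocr_text):
    """Character-level accuracy of OCR output against ground-truth text."""
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    """Word-level accuracy, comparing whitespace-separated tokens."""
    return accuracy(ref_text.split(), ocr_text.split())
```

Under these assumptions, an enhancement method counts as successful when psnr rises against a high-resolution reference image and both char_accuracy and word_accuracy of the unchanged OCR rise on the enhanced image compared with the degraded input.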