Large vocabulary continuous speech recognition system for Tamil

Madhavaraj A and A G Ramakrishnan

Introduction

My project focuses on building an automatic speech recognition (ASR) system for the Tamil language. ASR is at the heart of many interactive applications such as Apple’s Siri, Microsoft’s Cortana, and Amazon’s Echo. Such systems can recognize about 1 lakh (100,000) words at 90% accuracy, but they have been developed only for English and other European languages. Researchers have so far devoted little effort to Indian languages.

Architecture

Speech Recognition System

In my work, I use Kaldi (an open-source toolkit) to develop a domain-independent, large-vocabulary speech recognition system for Tamil. The morphological richness of the language makes it difficult to build a Tamil ASR system, because a single stem can appear in many inflected word forms. My work also addresses the problem of insufficient training data: I use speech corpora from other languages and adapt the resulting model to improve the accuracy of Tamil ASR.
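To illustrate why morphological richness strains a fixed recognition vocabulary, the following is a minimal sketch that counts how many distinct word forms a few stems can produce when suffixes combine. The stems, suffix labels, and slot structure are hypothetical placeholders chosen for illustration. They are not real Tamil morphemes and are not taken from the system described here.

```python
from itertools import product


def surface_forms(stems, suffix_slots):
    """Enumerate every stem followed by one choice from each suffix slot.

    An empty string in a slot means that slot is left unfilled.
    """
    forms = set()
    for stem in stems:
        for combo in product(*suffix_slots):
            forms.add(stem + "".join(combo))
    return forms


# Hypothetical placeholder morphemes (assumption, not real Tamil).
stems = ["stemA", "stemB", "stemC"]
suffix_slots = [
    ["", "-pl"],                   # number marker
    ["", "-acc", "-dat", "-loc"],  # case marker
    ["", "-emph"],                 # clitic
]

forms = surface_forms(stems, suffix_slots)
print(len(stems), "stems ->", len(forms), "distinct word forms")
# prints: 3 stems -> 48 distinct word forms
```

Even in this toy setting, each stem yields 2 x 4 x 2 = 16 forms, so the word list grows multiplicatively with every additional suffix slot. A word-level lexicon therefore needs far more entries to reach the same coverage than it would for a language with little inflection.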