#StackBounty: #python #python-tesseract Using pytesseract OCR on PythonAnywhere for non-English languages

Bounty: 50

I am creating a website on PythonAnywhere for OCR. Users can upload images of text and download the result in an editable format. It works perfectly for English, but when I try to add other languages (South Indian languages), I get error messages.

I put my additional traineddata in the folder "/home/wiltomalayalamocr/mysite/langfiles", which contains the file "mal.traineddata".

My code is:

        import pytesseract
        from PIL import Image

        pytesseract.pytesseract.tesseract_cmd = r"/usr/bin/tesseract"
        custom_oem_psm_config = '-l {} --psm {} --tessdata-dir "/home/wiltomalayalamocr/mysite/langfiles"'.format(lang, 6)
        text = pytesseract.image_to_string(Image.open(filename), config=custom_oem_psm_config)

Here lang="mal", but I am getting this error:

pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.04.01 with Leptonica Error opening data file /usr/share/tesseract-ocr/tessdata/mal.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language 'mal' Tesseract couldn't load any languages! Could not initialize tesseract.')
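
The error message points at the `TESSDATA_PREFIX` environment variable, which it says should be the parent directory of a "tessdata" directory. Below is a minimal sketch of a check along those lines. It assumes the language file has been moved into a subdirectory named `tessdata`, which is not the layout described above. The helper name `tessdata_prefix_for` is hypothetical, and whether this resolves the error on Tesseract 3.04.01 has not been verified.

```python
import os


def tessdata_prefix_for(lang_dir, lang):
    """Return the parent of a "tessdata" directory holding <lang>.traineddata.

    Hypothetical helper, not part of pytesseract. It only inspects the
    filesystem and does not call Tesseract.
    """
    lang_dir = os.path.abspath(lang_dir)
    traineddata = os.path.join(lang_dir, lang + ".traineddata")

    # Fail early if the language file is not where we think it is.
    if not os.path.isfile(traineddata):
        raise FileNotFoundError("missing language file: " + traineddata)

    # The error message asks for the parent of a directory named "tessdata".
    if os.path.basename(lang_dir) != "tessdata":
        raise ValueError('expected a directory named "tessdata", got: ' + lang_dir)

    return os.path.dirname(lang_dir)
```

The returned value could then be assigned to `os.environ["TESSDATA_PREFIX"]` before calling `pytesseract.image_to_string`, for example with an assumed directory such as `/home/wiltomalayalamocr/mysite/langfiles/tessdata`.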

I am using the Flask framework for Python.

Can anybody help me?
