#StackBounty: #python #python-tesseract Using pytesseract OCR on PythonAnywhere for non-English languages

Bounty: 50

I am building an OCR website on PythonAnywhere. Users can upload images of text and download the result in an editable format. It works perfectly for English, but when I try to add other languages (South Indian languages) I get an error.

I put the additional traineddata in the folder "/home/wiltomalayalamocr/mysite/langfiles", which contains the file "mal.traineddata".

In my code I have:

        pytesseract.pytesseract.tesseract_cmd = r"/usr/bin/tesseract"
        custom_oem_psm_config = '-l {} --psm {} --tessdata-dir "/home/wiltomalayalamocr/mysite/langfiles"'.format(lang,6)
        text = pytesseract.image_to_string(Image.open(filename) , config=custom_oem_psm_config)

where lang = "mal", but I get this error:

pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v3.04.01 with Leptonica Error opening data file /usr/share/tesseract-ocr/tessdata/mal.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language 'mal' Tesseract couldn't load any languages! Could not initialize tesseract.')

I am using the Flask framework.

Can anybody help me?
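The error message itself points at TESSDATA_PREFIX, so one thing worth trying is setting that variable from Python before calling pytesseract. The sketch below assumes the Tesseract 3.x behaviour described in the message (the variable names the parent of a directory called "tessdata"), and the helper name use_tessdata_dir is made up for illustration. It also assumes the traineddata file has been moved into a subfolder named "tessdata", which is not how the question currently has it laid out.

```python
import os
from pathlib import Path


def use_tessdata_dir(tessdata_dir, lang):
    """Point Tesseract at a custom tessdata directory via TESSDATA_PREFIX.

    Assumption: Tesseract 3.x semantics, as stated in the error message,
    where TESSDATA_PREFIX is the PARENT of a directory named "tessdata".
    """
    tessdata = Path(tessdata_dir)
    if tessdata.name != "tessdata":
        raise ValueError("expected a directory named 'tessdata', got %r" % tessdata.name)

    # Fail early if the language file is not where we think it is.
    trained = tessdata / (lang + ".traineddata")
    if not trained.is_file():
        raise FileNotFoundError(str(trained))

    # A trailing separator is harmless and avoids relying on Tesseract to add one.
    prefix = str(tessdata.parent) + os.sep
    # Child processes started later (such as the tesseract binary) inherit this.
    os.environ["TESSDATA_PREFIX"] = prefix
    return prefix
```

With the file at a hypothetical location such as /home/wiltomalayalamocr/mysite/langfiles/tessdata/mal.traineddata, you would call use_tessdata_dir("/home/wiltomalayalamocr/mysite/langfiles/tessdata", "mal") once at startup and then pytesseract.image_to_string(Image.open(filename), lang="mal"). Whether this is enough on PythonAnywhere depends on the installed Tesseract version, so treat it as a starting point rather than a confirmed fix.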



#StackBounty: #python #python-3.x #ocr #python-tesseract Why can't I get a string with PIL and pytesseract?

Bounty: 150

This is a simple Optical Character Recognition (OCR) program in Python 3 that should extract a string from an image. I have uploaded the target GIF file here; please download it and save it as /tmp/target.gif.


try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
print(pytesseract.image_to_string(Image.open('/tmp/target.gif')))

Here is the full error output. How can I fix it so that I get the characters from the image?

/usr/lib/python3/dist-packages/PIL/Image.py:925: UserWarning: Couldn't allocate palette entry for transparency
  "for transparency")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 208, in run_and_get_output
    temp_name, input_filename = save_image(image)
  File "/usr/local/lib/python3.5/dist-packages/pytesseract/pytesseract.py", line 136, in save_image
    image.save(input_file_name, format=img_extension, **image.info)
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1728, in save
    save_handler(self, fp, filename)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 407, in _save
    _get_local_header(fp, im, (0, 0), flags)
  File "/usr/lib/python3/dist-packages/PIL/GifImagePlugin.py", line 441, in _get_local_header
    transparency = int(transparency)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
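The traceback shows pytesseract re-saving the image with `**image.info`, and the GIF writer then failing because the "transparency" entry is a tuple. A possible workaround, sketched below, is to hand pytesseract an RGB copy with that entry removed. The function names drop_transparency and ocr_gif are hypothetical, and this is only a guess at a workaround for the versions in the traceback, not a confirmed fix.

```python
def drop_transparency(image):
    """Return an RGB copy of a PIL image without the 'transparency' info key."""
    rgb = image.convert("RGB")
    # Some Pillow versions copy image.info to the converted image, so remove
    # the key explicitly; pop() with a default is safe if it is already gone.
    rgb.info.pop("transparency", None)
    return rgb


def ocr_gif(path):
    """Run OCR on a GIF after stripping its palette transparency."""
    # Imported here so the helper above can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(drop_transparency(img))
```

Calling ocr_gif("/tmp/target.gif") should at least get past the TypeError, since the copy passed to pytesseract no longer carries the tuple-valued entry. It does not by itself guarantee that any text is recognized.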

I converted it to JPEG with the ImageMagick convert command in bash.

convert  "/tmp/target.gif"   "/tmp/target.jpg"

Both /tmp/target.gif and /tmp/target.jpg are shown here.
[image]

Then I ran the same Python code again, this time on the JPEG file.

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
print(pytesseract.image_to_string(Image.open('/tmp/target.jpg')))

pytesseract.image_to_string(Image.open('/tmp/target.jpg')) returns nothing; I only get an empty string.
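An empty result often means Tesseract could not find text in the image as given, and a common first step is to preprocess it: convert to grayscale, enlarge it, and binarize it. The sketch below shows that idea. The function names and the scale and cutoff values are illustrative assumptions, not values tuned for this particular image.

```python
def threshold_table(cutoff=128):
    """Build a 256-entry lookup table mapping gray levels to pure black or white."""
    return [0 if value < cutoff else 255 for value in range(256)]


def preprocess(image, scale=3, cutoff=128):
    """Return an enlarged, black-and-white version of a PIL image."""
    gray = image.convert("L")                      # 8-bit grayscale
    width, height = gray.size
    enlarged = gray.resize((width * scale, height * scale))
    # point() applies the lookup table to every pixel of the grayscale image.
    return enlarged.point(threshold_table(cutoff))


def ocr_preprocessed(path):
    """Run OCR on a preprocessed copy of the image at `path`."""
    # Imported here so the helpers above can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(preprocess(img))
```

Try ocr_preprocessed("/tmp/target.jpg") and adjust scale and cutoff if the output is still empty; light text on a dark background may also need inverting before thresholding.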


