Tesseract is an open source OCR engine for recognizing text in images. The project provides both the libtesseract library and the tesseract command-line program. Tesseract supports Unicode through UTF ...
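As a rough illustration of the command-line program mentioned above, the following Python sketch assembles a `tesseract` invocation and runs it through `subprocess`. The helper names `build_tesseract_command` and `recognize_text` and the file name `page.png` are hypothetical, introduced only for this example. The sketch assumes that a `tesseract` executable is installed and on `PATH`, and that the installed version accepts the `--psm` spelling of the page segmentation option (older 3.x releases used `-psm`).

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(
    image_path: str,
    output_base: str = "stdout",
    lang: Optional[str] = None,
    psm: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for the tesseract command-line program.

    Hypothetical helper: it only builds the argument list and does not
    check that the image exists or that the language data is installed.
    """
    # Basic form: tesseract <image> <outputbase>; "stdout" prints the text.
    cmd = ["tesseract", image_path, output_base]
    if lang is not None:
        cmd += ["-l", lang]           # language(s), e.g. "eng" or "eng+deu"
    if psm is not None:
        cmd += ["--psm", str(psm)]    # page segmentation mode (assumes 4.x+)
    return cmd


def recognize_text(
    image_path: str,
    lang: Optional[str] = None,
    psm: Optional[int] = None,
) -> str:
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, "stdout", lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Show the command that would be run; call recognize_text("page.png")
    # instead to perform the actual recognition.
    print(" ".join(build_tesseract_command("page.png", lang="eng", psm=6)))
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before any image is processed. Programs that need tighter integration would use the libtesseract library directly rather than spawning a process.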