Quantxt - Quantxt is a software platform for processing, search, and discovery in unstructured content using NLP techniques.
Tesseract - Tesseract is an optical character recognition engine for various operating systems.
FuzzyWuzzy - FuzzyWuzzy is a Python library for fuzzy string matching that uses Levenshtein distance to calculate the differences between sequences.
ABBYY FineReader - FineReader 16 is ABBYY's latest PDF editor software. It converts PDF files to formats such as Excel and Word, and supports editing, sharing, and collaborating on documents.
spaCy - spaCy is a library for advanced natural language processing in Python and Cython.
Onlineocr.net - Onlineocr.net is a free online OCR service that converts PDF documents to MS Word files, converts scanned images to editable text formats, and extracts text from JPEG, TIFF, and BMP files.
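
The FuzzyWuzzy entry above mentions Levenshtein distance. As a rough illustration of that idea only, here is a minimal pure-Python sketch of the edit-distance calculation. The function names `levenshtein` and `similarity` are hypothetical, and the percentage score shown is a simple normalization chosen for this example, not necessarily the formula FuzzyWuzzy itself uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score: 100 means identical strings.
    This normalization is an assumption made for illustration."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(a, b) / longest))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("Tesseract", "Tesserac"))
```

A lower distance means the two strings are closer, so a matcher built on this idea can rank candidate strings by distance or by a normalized score like the one sketched here.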