Extract Highlighted Text from a Book using Python

Books & Reference, Data Science And Machine Learning, OCR

1

OpenCV

OpenCV is the world's biggest computer vision library
Pricing:
- Open Source
I'm going to use the OpenCV library and its Python interface. OpenCV provides many functions for object detection in images and video, machine learning, image processing and more.

#Data Science And Machine Learning #Data Science Tools #Computer Vision 50 social mentions

2

Gutenberg Books

Gutenberg Books is a free-to-use Android and iOS app with more than 50,000 titles, from classics to top hits, featuring all the important books ever published in history.

Optical Character Recognition (OCR) is the process of extracting written or printed text from a document, such as an image, and converting it into digital text that can be used for further processing, e.g. to index the text in a database and access it via a search engine. Some may remember Google's effort to digitize every book on the planet and make it available via Google Books search, or Project Gutenberg, which digitizes and provides public domain books.
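To make the "index this text and search it" step more concrete, here is a small sketch of an in-memory inverted index over OCR output, using only the standard library. It is a toy stand-in for a real database or search engine, and the function names `tokenize`, `build_index` and `search` are hypothetical, chosen for this illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids whose text contains it.

    `pages` is a dict of page id -> extracted text.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

For example, after `index = build_index({1: "Call me Ishmael.", 2: "Call of the wild"})`, the call `search(index, "call wild")` would return only the pages containing both words.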

#Books & Reference #eBook Reader #Ebooks 130 social mentions

3

Tesseract

Tesseract is an optical character recognition engine for various operating systems

I'm going to use the Tesseract OCR engine and library, and its Python wrapper PyTesseract, for text extraction, although there are numerous other libraries for extracting text from an image. In a real-world application I would probably use cloud services from AWS, Google or Microsoft to handle this task.
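Since the OCR backend may change (Tesseract locally, a cloud service in production), one possible way to structure the extraction step is to accept the OCR function as a parameter. The sketch below is an assumption about how this could be organized, not the author's code: `extract_text` is a hypothetical helper, and the default path relies on `pytesseract.image_to_string`, which requires both the pytesseract package and the Tesseract binary to be installed.

```python
def extract_text(image, ocr=None):
    """Run OCR on an image and return whitespace-normalized text.

    `ocr` is any callable that maps an image to a string. When omitted,
    pytesseract.image_to_string is used as the default backend.
    """
    if ocr is None:
        # Imported lazily so that other backends work without pytesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    raw = ocr(image)
    # Collapse line breaks, repeated spaces and trailing form feeds.
    return " ".join(raw.split())
```

With Tesseract installed, `extract_text(image)` would use the local engine; a wrapper around a cloud OCR API could be passed as `ocr=` without changing the calling code.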

#OCR #Image Recognition #PDF Editor 72 social mentions