Scan Tailor is an interactive post-processing tool for scanned pages.

  • Is there a way to convert "photographed text" into just "text" in a .pdf file?
    Scantailor is the tool for self-scanned books that exist as images (png, jpg, etc.). However, I usually use IrfanView with the PDF plugin (download both IrfanView and the plugins from the IrfanView home page). I have shown elsewhere in r/PDF how you can do batch splitting of two-page scans and clean up muddy (yellowed or browned) pages. In the Reddit search box, search for... - Source: Reddit / about 1 month ago
  • Is anyone backing up/reploading's scanned books to Libgen, etc.?
    Scantailor might be useful. - Source: Reddit / about 1 month ago
  • Where do you BUY your ebooks from?
    Scantailor is a good open source option with a lot of features centered on this process. - Source: Reddit / 2 months ago
  • OCRmyPDF: Add an OCR text layer to scanned PDF file
    I use OCRmyPDF on a regular basis to OCR journal articles my library sends me. I've found it works great on English but (with appropriate language packs installed) works poorly on Greek and Hebrew. It also makes no effort to understand the layout of pages (e.g., tables). The project is fantastic, though. I've often considered building a web frontend that cleans up PDFs and then OCRs them using OCRmyPDF. For... - Source: Hacker News / 9 months ago
  • Holding books open while scanning
    - Load the images into a program called scantailor; it's an old program but very solid, free and open source. It loads all the TIFs for post-processing, and it is able to detect and separate pages, rotate and deskew them, detect the content of the page, and clean up the scan very nicely. It even detects what part of the content is text and what are images, meaning your images will still be shown in RGB, whereas text... - Source: Reddit / about 1 year ago
  • Ok, I have this book in PDF format, and it comes like this, how do I separate each page to two and put one beneath the other
    You can try using a program like ScanTailor. You will have to import all the images into the program and let it run on default settings. The only gripe with this option is that the output images are huge. I have not used it in a while. Maybe there are improvements to the program. - Source: Reddit / about 1 year ago
  • Tesseract OCR
    You typically need to pre-process the images. I'd recommend for this (OSS, but. - Source: Hacker News / over 1 year ago
  • Is there an efficient way to Ankify 陕西历史博物馆 Museum exhibit plaques?
    After this, make a new project in Scan Tailor. This lets you fix up images and allows OCR apps to read the text much better. - Source: Reddit / over 1 year ago
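
Several of the mentions above describe splitting a two-page scan into single pages and placing one beneath the other. The following is a minimal Python sketch of that arithmetic only, not Scan Tailor's own method (Scan Tailor detects the split automatically). The function names and the `gutter` parameter are hypothetical, chosen for illustration; the boxes follow the (left, upper, right, lower) convention that an image library such as Pillow expects for cropping.

```python
from typing import List, Tuple

# (left, upper, right, lower), in pixels
Box = Tuple[int, int, int, int]


def split_spread(width: int, height: int, gutter: int = 0) -> List[Box]:
    """Return crop boxes for the left and right page of a two-page scan.

    `gutter` is a hypothetical parameter: the number of pixels to discard
    around the vertical centre line (the book's spine).
    """
    if width < 2 or height < 1:
        raise ValueError("image is too small to split")
    if gutter < 0 or gutter >= width:
        raise ValueError("gutter must be between 0 and the image width")
    middle = width // 2
    half_gutter = gutter // 2
    left = (0, 0, middle - half_gutter, height)
    right = (middle + half_gutter, 0, width, height)
    return [left, right]


def stack_offsets(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Return the (x, y) paste position of each box when stacked vertically."""
    offsets = []
    y = 0
    for _left, upper, _right, lower in boxes:
        offsets.append((0, y))
        y += lower - upper  # the next page starts below the current one
    return offsets


# Example: a 3000x2000 spread with 100 pixels dropped around the spine.
boxes = split_spread(width=3000, height=2000, gutter=100)
print(boxes)
print(stack_offsets(boxes))
```

Each box can be passed to a crop call, and each offset to a paste call, in whichever image library is at hand. A fixed centre split assumes the spine sits in the middle of every scan, which is exactly the assumption that automatic page detection removes.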

