Software Alternatives & Reviews

Help downloading a book in PDF from archive.org

mitmproxy Tesseract Archive.org
  1. mitmproxy is an SSL-capable man-in-the-middle proxy for HTTP.
    Pricing:
    • Open Source
    If you can't find a copy of the book that can be borrowed for 14 days, your best bet is to find a way to capture the pages as you flip through the book. It's a pain, but I've done it for a few books using mitmproxy. It's not for the faint of heart and I won't write a full tutorial, but if you configure mitmproxy as an HTTPS man-in-the-middle proxy, you can write a Python script that matches the URL pattern of the page images and saves each one to disk. I wrote a script to do that; I'm using Linux.
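    A minimal sketch of such a script, written as a mitmproxy addon (the kind loaded with `mitmdump -s save_pages.py`), assuming mitmproxy's CA certificate is already trusted by the browser so HTTPS traffic can be decrypted. mitmproxy calls an addon's `response` hook once per completed response. The class name `PageSaver`, the `pages` output directory, and especially the URL regex `[?&]page=n?(\d+)` are hypothetical placeholders, not archive.org's documented URL scheme; inspect the real page-image requests in mitmproxy and adjust the pattern before relying on it.

```python
"""Hypothetical mitmproxy addon that saves book page images as you flip pages.

Load it with:  mitmdump -s save_pages.py
"""
from __future__ import annotations

import os
import re
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only for type checkers; mitmproxy passes the flow in at runtime
    from mitmproxy import http

# HYPOTHETICAL pattern: inspect the real page-image URLs in mitmproxy and adjust it.
PAGE_URL_PATTERN = re.compile(r"[?&]page=n?(\d+)")


class PageSaver:
    def __init__(self, pattern: re.Pattern[str] = PAGE_URL_PATTERN, out_dir: str = "pages") -> None:
        self.pattern = pattern
        self.out_dir = out_dir  # hypothetical output directory

    def response(self, flow: http.HTTPFlow) -> None:
        """Event hook mitmproxy calls once per completed HTTP response."""
        resp = flow.response
        match = self.pattern.search(flow.request.pretty_url)
        if match is None or resp is None or resp.status_code != 200:
            return
        content_type = resp.headers.get("content-type", "")
        if not content_type.startswith("image/"):
            return  # skip HTML, JSON, etc. that happen to match the pattern
        ext = content_type.split(";")[0].split("/", 1)[1].strip() or "img"
        os.makedirs(self.out_dir, exist_ok=True)
        # Zero-pad the page number so an alphabetical sort gives reading order.
        path = os.path.join(self.out_dir, f"page_{int(match.group(1)):04d}.{ext}")
        with open(path, "wb") as fh:
            fh.write(resp.content)


addons = [PageSaver()]
```

    With the browser proxied through mitmproxy, each matching image lands in `pages/` as `page_0001.jpeg`, `page_0002.jpeg`, and so on; the zero padding means a plain alphabetical sort restores page order.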

    #Developer Tools #Security #Software Development 81 social mentions

  2. Tesseract is an optical character recognition engine for various operating systems.
    I use Tesseract OCR software to generate individual PDF pages with embedded text, then put them together into a book with PDFtk.
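    That OCR-then-merge workflow can be scripted. A minimal sketch, assuming the `tesseract` and `pdftk` command-line tools are on `PATH` and the page images sit in a hypothetical `pages/` directory with zero-padded names such as `page_0001.jpeg`. It relies on Tesseract's `pdf` output config (`tesseract <image> <outputbase> pdf` writes `<outputbase>.pdf` with a searchable text layer) and on PDFtk's `cat` operation to concatenate the per-page PDFs in order. The helper names `tesseract_cmd`, `pdftk_cmd`, and `build_book` are illustrative, not part of either tool.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def tesseract_cmd(image: Path) -> list[str]:
    # "tesseract <image> <outputbase> pdf" writes <outputbase>.pdf: the page image
    # plus an invisible, searchable text layer.
    return ["tesseract", str(image), str(image.with_suffix("")), "pdf"]


def pdftk_cmd(pdfs: list[Path], output: Path) -> list[str]:
    # "pdftk a.pdf b.pdf cat output book.pdf" concatenates the inputs in the order given.
    return ["pdftk", *(str(p) for p in pdfs), "cat", "output", str(output)]


def build_book(page_dir: Path, output: Path, pattern: str = "page_*.jpeg") -> None:
    # Zero-padded file names make lexicographic order match page order.
    images = sorted(page_dir.glob(pattern))
    if not images:
        raise FileNotFoundError(f"no images matching {pattern!r} in {page_dir}")
    pdfs: list[Path] = []
    for image in images:
        subprocess.run(tesseract_cmd(image), check=True)  # OCR one page at a time
        pdfs.append(image.with_suffix(".pdf"))
    subprocess.run(pdftk_cmd(pdfs, output), check=True)


if __name__ == "__main__":
    build_book(Path("pages"), Path("book.pdf"))
```

    Running one Tesseract process per page keeps each step simple and makes it easy to re-OCR a single bad page; PDFtk then only has to stitch the finished pages together.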

    #OCR #Image Recognition #PDF Editor 72 social mentions

  3. Internet Archive is a non-profit digital library offering free universal access to books, movies...
    That worked for a little while after archive.org started their "1 hour" loans, but it hasn't worked for some time.

    #Ebooks #Productivity #Bookmark Manager 8506 social mentions
