
ArchiveBox

The open-source, self-hosted internet archiving solution.

Pricing:
  • Open Source
  • Free
Platforms:
  • Linux
  • macOS
  • Docker
ArchiveBox Reviews and Details

This page is designed to help you find out whether ArchiveBox is good and if it is the right choice for you.

Screenshots and images

  • ArchiveBox landing page (2023-06-13)

Features & Specs

  1. Offline website saving

  2. Tagging

  3. Scheduled archiving

  4. Recursive crawling

  5. Media extraction

  6. Article text extraction

  7. Static HTML exports

  8. Full-text search

Videos

Archiving the Internet Before it All Rots Away (talk by ArchiveBox founder)

Installing ArchiveBox On Ubuntu 20.04 Using A Hyper-V VM To Preserve OSINT Investigation Findings

Questions & Answers

As answered by people managing ArchiveBox.
  1. Which are the primary technologies used for building your product?

    • Django
    • SQLite
    • Wget
    • Chromium
    • Youtube-dl / yt-dlp
    • singlefile
    • readability
    • mercury
    • git
    • ripgrep
    • sonic
  2. Who are some of the biggest customers of ArchiveBox?

  3. What's the story behind ArchiveBox?

    ArchiveBox aims to enable more of the internet to be saved from deterioration by empowering people to self-host their own archives. The intent is for all the web content you care about to be viewable with common software in 50 to 100 years without needing to run ArchiveBox or other specialized software to replay it.

    Vast treasure troves of knowledge are lost every day on the internet to link rot. As a society, we have an imperative to preserve some important parts of that treasure, just like we preserve our books, paintings, and music in physical libraries long after the originals go out of print or fade into obscurity.

    Whether it's to resist censorship by saving articles before they get taken down or edited, or just to save a collection of early-2010s Flash games you love to play, having the tools to archive internet content enables you to save the stuff you care most about before it disappears.

    The balance between the permanence and the ephemeral nature of content on the internet is part of what makes it beautiful. I don't think everything should be preserved automatically, making all content permanent and never removable, but I do think people should be able to decide for themselves and effectively archive the specific content they care about.

    Because modern websites are complicated and often rely on dynamic content, ArchiveBox archives the sites in several different formats beyond what public archiving services like Archive.org/Archive.is save. Using multiple methods and the market-dominant browser to execute JS ensures we can save even the most complex, finicky websites in at least a few high-quality, long-term data formats.

  4. Why should a person choose ArchiveBox over its competitors?

    ArchiveBox differentiates itself from similar self-hosted projects by providing a comprehensive CLI for managing your archive, a Web UI that can be used either independently or together with the CLI, and a simple on-disk data format that can be used without either.
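
    To illustrate what "can be used without either" might look like in practice, here is a minimal Python sketch that reads an archive straight from disk. It assumes a layout of one folder per snapshot under archive/, each holding an index.json with "url" and "title" keys. That layout, the key names, and the helper name list_snapshots are assumptions made for illustration, not something stated on this page, so check them against your own data folder first.

```python
import json
from pathlib import Path
from typing import Dict, List


def list_snapshots(data_dir: str) -> List[Dict[str, str]]:
    """Collect folder name, URL and title from each snapshot's index.json.

    Assumed layout (hypothetical, verify locally):
        <data_dir>/archive/<snapshot folder>/index.json
    """
    snapshots = []
    for index_file in sorted(Path(data_dir).glob("archive/*/index.json")):
        try:
            record = json.loads(index_file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written snapshot folders
        if not isinstance(record, dict):
            continue  # skip index files that are not a JSON object
        snapshots.append({
            "folder": index_file.parent.name,
            "url": record.get("url", ""),
            "title": record.get("title") or "",
        })
    return snapshots
```

    Calling list_snapshots("/path/to/data") returns plain dictionaries, so the result can be printed, filtered, or exported without ArchiveBox installed.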

    ArchiveBox is neither the highest-fidelity nor the simplest tool available for self-hosted archiving; rather, it's a jack-of-all-trades that tries to do most things well by default. It can be as simple or as advanced as you want, and it is designed to do everything out of the box while remaining tunable to suit your needs.

    If you want better fidelity for very complex interactive pages with heavy JS/streams/API requests, check out ArchiveWeb.page and ReplayWeb.page.

    If you want more bookmark categorization and note-taking features, check out Archivy, Memex, Polar, or LinkAce.

    If you need more advanced recursive spidering or crawling beyond --depth=1, check out Browsertrix, Photon, or Scrapy and pipe the resulting URLs into ArchiveBox.
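
    As a rough sketch of that piping step, the Python below cleans a crawler's output and feeds it to the archivebox add command on standard input. The helper names (clean_urls, build_add_command, add_to_archive) are hypothetical, the crawler output is assumed to be one URL per line, and the depth check simply mirrors the 0/1 limit implied above. It also assumes the command is run from inside an existing ArchiveBox data folder.

```python
import subprocess
from typing import Iterable, List


def clean_urls(lines: Iterable[str]) -> List[str]:
    """Keep unique http(s) URLs in first-seen order, dropping blanks and noise."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_add_command(depth: int = 0) -> List[str]:
    """Build the `archivebox add` invocation; the URLs themselves go on stdin."""
    if depth not in (0, 1):
        raise ValueError("depth is assumed to be limited to 0 or 1")
    return ["archivebox", "add", f"--depth={depth}"]


def add_to_archive(urls: List[str], data_dir: str, depth: int = 0) -> None:
    """Pipe the URLs into ArchiveBox, running from the given data folder."""
    payload = "\n".join(urls) + "\n"
    subprocess.run(
        build_add_command(depth),
        input=payload,
        text=True,
        cwd=data_dir,  # assumption: data_dir is an initialized ArchiveBox folder
        check=True,    # raise if the archivebox command exits with an error
    )
```

    A typical use would be to read the crawler's output file, pass its lines through clean_urls, and hand the result to add_to_archive with depth=0, since the external crawler has already done the recursion.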

  5. How would you describe your primary audience?

    • journalists
    • lawyers
    • librarians
    • digital preservation specialists
    • researchers
    • students
    • homelab / self-hosting community

Social recommendations and mentions

We have tracked the following product recommendations or mentions on various public social media platforms and blogs. They can help you see what people think about ArchiveBox and what they use it for.

Is ArchiveBox good? This is an informative page that will help you find out. Moreover, you can review and discuss ArchiveBox here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated, as they help everyone in the community make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.