Software Alternatives & Reviews

Top 12 Open-Source Alternatives to CommonCrawl

Scrapy, StormCrawler, Apache Nutch, YaCy, Apache Solr, Mixnode, DuckDuckGo, Hacker News Search, Mwmbl Search, Kagi

Summary

The top open-source alternatives to CommonCrawl are Scrapy, StormCrawler, and Apache Nutch. One of the criteria for ordering this list is the number of mentions that products have on reliable external sources. You can suggest additional sources through the form here.
  1. Scrapy | A Fast and Powerful Scraping and Web Crawling Framework
    Pricing:
    • Open Source

    #Web Scraping #Data Extraction #Data 93 social mentions

  2. StormCrawler is an open source SDK for building distributed web crawlers with Apache Storm.
    Pricing:
    • Open Source

    #Web Scraping #Data Extraction #Data

  3. Apache Nutch is a highly extensible and scalable open source web crawler software project.
    Pricing:
    • Open Source

    #Web Scraping #Data Extraction #Utilities 2 social mentions

  4. YaCy is a free search engine that anyone can use to build a search portal for their intranet or to...
    Pricing:
    • Open Source

    #Search Engine #Internet Search #Web Search 71 social mentions

  5. Solr is an open source enterprise search server based on Lucene search library, with XML/HTTP and...
    Pricing:
    • Open Source

    #Custom Search Engine #Custom Search #Search Engine 17 social mentions

  6. Mixnode: Turn the web into a database!
    Pricing:
    • Open Source

    #Web Scraping #Data Extraction #Data

  7. DuckDuckGo: The Internet privacy company that empowers you to seamlessly take control of your personal information online, without any tradeoffs.
    Pricing:
    • Open Source

    #Search Engine #Web Search #Internet Search 1666 social mentions

  10. Kagi is a privacy-focused, user-centric search engine. Great search experience starts with Kagi!
    Pricing:
    • Open Source

    #Search Engine #Security & Privacy #Privacy Search Engine 120 social mentions

  11. Scrape Google search results from our fast, easy, and complete API.
    Pricing:
    • Open Source

    #SEO #SEO Tools #APIs 69 social mentions

  12. Colly is a scraping framework to extract structured data from websites.
    Pricing:
    • Open Source

    #Web Scraping #Data Extraction #Data 9 social mentions

