A company offering a cloud-based web scraping and data extraction platform that works not only with HTML pages as a data source but also with JS, JSON, XML, documents such as iCal, XLSX, XLS, and CSV files, and images. Extracted data is kept in a database as a dataset, which can be downloaded in various formats, retrieved via an API, or pushed to another destination upon completion. The platform integrates with services such as Zapier, Tableau, OSM, Luminati, and DeathByCaptcha.
Scrapy - Scrapy is a fast and powerful scraping and web crawling framework.
import.io - Import.io helps its users find the internet data they need, organize and store it, and transform it into a format that provides them with the context they need.
StormCrawler - StormCrawler is an open-source SDK for building distributed web crawlers with Apache Storm.
Apache Nutch - Apache Nutch is a highly extensible and scalable open-source web crawler software project.
ParseHub - ParseHub is a free web scraping tool. With its advanced web scraper, extracting data is as easy as clicking the data you need.
Heritrix - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web...