A company offering a cloud-based web scraping and data extraction platform that works not only with HTML pages as a data source but also with JS, JSON, XML, documents such as iCal, XLSX, XLS, and CSV, and images. Extracted data is kept in a database as a dataset, which can be downloaded in various formats, retrieved via an API, or pushed to another destination upon completion. It integrates with services such as Zapier, Tableau, OSM, Luminati, and DeathByCaptcha.
Scrapy - A fast and powerful scraping and web crawling framework.
import.io - Import.io helps its users find the internet data they need, organize and store it, and transform it into a format that provides them with the context they need.
Apache Nutch - Apache Nutch is a highly extensible and scalable open source web crawler software project.
Octoparse - Octoparse provides easy web scraping for anyone. Its advanced web crawler allows users to turn web pages into structured spreadsheets within a few clicks.
Heritrix - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web...
Content Grabber - Content Grabber is an automated web scraping tool.