Software Alternatives, Accelerators & Startups

Scrapy

A Fast and Powerful Scraping and Web Crawling Framework.

Scrapy Reviews and Details

This page is designed to help you find out whether Scrapy is good and if it is the right choice for you.

Features & Specs

  1. Efficiency

    Scrapy is designed to be efficient and robust, capable of handling multiple tasks simultaneously and scraping large websites in a fast and reliable manner.

  2. Built-in Tooling

    Scrapy comes with built-in tools for handling common tasks such as following links, extracting data using XPath and CSS, and exporting data in a variety of formats.

  3. Customization

    Scrapy offers extensive customization options, allowing users to build complex spiders and modify their behavior through middleware and pipelines.

  4. Python Integration

    Being a Python framework, Scrapy integrates seamlessly with the Python ecosystem, enabling the use of libraries like Pandas, NumPy, and others to process and analyze scraped data.

  5. Community Support

    Scrapy has a large and active community, providing extensive documentation, tutorials, and third-party extensions to enhance functionality.

  6. Asynchronous Processing

    Scrapy’s asynchronous processing model enhances performance by allowing multiple concurrent requests, reducing the time required for crawling sites.
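
The concurrency benefit described in item 6 can be illustrated with a small sketch. Scrapy itself is built on the Twisted networking library; the example below does not use Scrapy at all and instead relies on Python's standard `asyncio` module with a simulated request, purely to show why keeping several requests in flight shortens a crawl. The names `fake_fetch`, `crawl`, and `timed_crawl` and the example URLs are hypothetical.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    """Stand-in for a network request (assumption: no real HTTP is done)."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"<html>{url}</html>"


async def crawl(urls, max_concurrency: int = 4):
    """Fetch every URL with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


def timed_crawl(urls, max_concurrency: int = 4):
    """Run the crawl and report how long it took in seconds."""
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls, max_concurrency))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    demo_urls = [f"https://example.com/page/{i}" for i in range(8)]
    pages, elapsed = timed_crawl(demo_urls, max_concurrency=4)
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

Because the simulated waits overlap, the eight requests take roughly the time of two sequential ones rather than eight. In Scrapy the comparable knobs are settings such as `CONCURRENT_REQUESTS` and `DOWNLOAD_DELAY`.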

Social recommendations and mentions

We have tracked the following product recommendations or mentions on various public social media platforms and blogs. They can help you see what people think about Scrapy and what they use it for.
  • How to write and publish a Python package to PyPI
    This guide walks through the full process using uv, a fast, modern Python toolchain that replaces pip, virtualenv, pip-tools, twine, and build with a single tool. We will write a reusable Scrapy download handler, structure it as a proper Python package, test it, and publish it to PyPI. - Source: dev.to / 3 days ago
  • How to tell if a page uses JavaScript rendering (and what to do about it)
    In Scrapy, Zyte API integrates via the scrapy-zyte-api package:. - Source: dev.to / 3 days ago
  • How to Use rs-trafilatura with Scrapy
    Scrapy is the standard Python framework for web scraping. It handles crawling, scheduling, and data pipelines. rs-trafilatura plugs into Scrapy as an item pipeline: your spider yields items with HTML, and the pipeline adds structured extraction results automatically. - Source: dev.to / about 1 month ago
  • Current problems and mistakes of web scraping in Python and tricks to solve them!
    One might ask, what about Scrapy? I'll be honest: I don't really keep up with their updates. But I haven't heard about Zyte doing anything to bypass TLS fingerprinting. So out of the box Scrapy will also be blocked, but nothing is stopping you from using curl_cffi in your Scrapy Spider. - Source: dev.to / over 1 year ago
  • Automate Spider Creation in Scrapy with Jinja2 and JSON
    Install scrapy (Official website) either using pip or conda (Follow for detailed instructions):. - Source: dev.to / almost 2 years ago
  • Analyzing Svenskalag Data using DBT and DuckDB
    Using Scrapy I fetched the data needed (activities and attendance). Scrapy handled authentication using a form request in a very simple way:. - Source: dev.to / almost 2 years ago
  • Scrapy Vs. Crawlee
    Scrapy is an open-source Python-based web scraping framework that extracts data from websites. With Scrapy, you create spiders, which are autonomous scripts to download and process web content. The limitation of Scrapy is that it does not work very well with JavaScript rendered websites, as it was designed for static HTML pages. We will do a comparison later in the article about this. - Source: dev.to / almost 2 years ago
  • What is SERP? Meaning, Use Cases and Approaches
    While there is no specific library for SERP, there are some web scraping libraries that can do the Google Search Page Ranking. One of them which is quite famous is Scrapy - It is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It offers rich developer community support and has been used by more than 50+ projects. - Source: dev.to / over 2 years ago
  • Creating an advanced search engine with PostgreSQL
    If you're looking for a turn-key solution, I'd have to dig a little. I generally write a scraper in python that dumps into a database or flat file (depending on number of records I'm hunting). Scraping is a separate subject, but once you write one you can generally reuse relevant portions for many others. If you can get adept at a scraping framework like Scrapy you can do it fairly quickly, but there aren't many... - Source: Hacker News / almost 3 years ago
  • What do .NET devs use for web scraping these days?
    I know this might not be a good answer, as it's not .NET, but we use https://scrapy.org/ (Python). Source: almost 3 years ago
  • BeutifulSoup and getting URLs
    Take a look at Scrapy. It has a fairly advanced throttling mechanism for you to not get banned. Source: almost 3 years ago
  • Looking for a Python (or R) program or package to save only images from any plain vanilla website
    Not only Windows, you can also use it on Mac and Linux too. But for Python and CLI, you can use scrapy. Source: about 3 years ago
  • Automating Amazon Price Tracking with Python
    The first step in automating Amazon price tracking with Python is to scrape the product pages of Amazon.com for the desired product. To do this, you can use a web scraping library like BeautifulSoup or Scrapy. In the following example, we will use BeautifulSoup to scrape the product page for a MacBook Pro on Amazon.com:. - Source: dev.to / about 3 years ago
  • WebScraping
    Lots of good suggestions here -- wanted to suggest the python tool, https://scrapy.org. Source: about 3 years ago
  • What are the most underrated python libraries?
    Scrapy – one of the most comprehensive web scraping frameworks available for Python developers. Scrapy was designed around speed, reliability and extensibility – allowing users to quickly extract data from websites with minimal effort thanks to its powerful spiders that automatically traverse through whole sites from page-to-page until all relevant information has been scraped off them. Source: about 3 years ago
  • Show HN: SiteGPT – Create ChatGPT-like chatbots trained on your website content
    Not to go full "Dropbox in a weekend", but if you're technical enough to self-host, this is something you can build for yourself Everyone is going straight to embeddings, but it'd be easy enough to use old school NLP summarization from NLTK (https://www.nltk.org/) Hook that up a web scraping library like https://scrapy.org/ and get a summary of each page. Then embed a site map in your system prompt and use... - Source: Hacker News / about 3 years ago
  • Celery lock a variable between two processes
    In general celery tasks should be idempotent if possible, for scraping consider if Scrapy might not be more appropriate, it already implements a lot of the rate limiting/retrying you have to replicate in celery yourself. But regarding locking you are right to consider databases/redis since celery workers might run on entirely different machines even. In the case of a paginated scrape with celery, you could... Source: about 3 years ago
  • fastest web scraping options
    You can use automation tools like Selenium or Playwright. You can work with a full-fledged framework such as Scrapy. I also recently discovered a Python tool like selectolax Lexbor, which allows you to extract data very quickly. Source: about 3 years ago
  • Scrapy extension blocking login to AVer PTZ Camera (CAM520 Pro)
    This is not related to https://scrapy.org/ and so not related to this subreddit either. Source: about 3 years ago
  • What steps do you apply in the "L" when doing ELT?
    The sha256 is there establish the uniqueness of the file. It isn’t great for capturing whether or not you have already seen the file before, tho, because it is rather expensive to calculate (imagine your csv file were gigabytes on size — you would have to stream in whole file down in order to see if it had changed!). In the past I have used a sha256 of information that the server hosting the file gives me about... Source: about 3 years ago
  • How to run webs scraping script every 15 minutes
    You may want to check out [estela](https://estela.bitmaker.la/docs/), which is a spider management solution, developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders. Source: about 3 years ago
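
Several of the mentions above refer to item pipelines, the step where Scrapy passes each scraped item through user-defined classes. A minimal sketch of one follows, assuming items are plain dicts with a `price` field; the class name `PriceCleanupPipeline` and that field are hypothetical. A Scrapy pipeline is an ordinary Python class exposing `process_item(item, spider)`, the long-documented signature, so the sketch needs no imports.

```python
class PriceCleanupPipeline:
    """Hypothetical pipeline that normalises a 'price' field on each item."""

    def process_item(self, item, spider):
        # Assumption: the spider yields dict-like items with a text price,
        # for example " $1,299.50 ".
        raw = str(item.get("price", "")).strip()
        cleaned = raw.replace("$", "").replace(",", "")
        try:
            item["price"] = float(cleaned)
        except ValueError:
            # Unparseable or missing prices are kept as None, not dropped.
            item["price"] = None
        # Returning the item hands it to the next enabled pipeline.
        return item


if __name__ == "__main__":
    pipeline = PriceCleanupPipeline()
    print(pipeline.process_item({"price": " $1,299.50 "}, spider=None))
```

In a real project the class would be enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.PriceCleanupPipeline": 300}`, where `myproject` is a placeholder module path and the number sets the order in which pipelines run.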

Summary of the public mentions of Scrapy

Scrapy has emerged as a prominent player in the web scraping and data extraction landscape, and public sentiment reflects its widespread acceptance and reliance by developers. The framework, renowned for its open-source status and robust capabilities, facilitates the streamlined creation of spiders for automated data scraping tasks. Professional discourse and community feedback highlight several key attributes, strengths, and limitations of Scrapy that shape its standing in the competitive market of web scraping tools.

Strengths:

  1. Comprehensive Framework: Scrapy is celebrated for its comprehensive and high-level framework. It stands out as one of the most extensive web scraping frameworks in Python, designed to swiftly and efficiently extract data from websites. The focus on speed, reliability, and extensibility makes it a preferred choice among developers who value these attributes in a scraping tool.

  2. Cross-Platform Portability: The framework boasts impressive portability, operating seamlessly across various platforms such as Windows, Linux, Mac, and BSD. This versatility ensures that developers can deploy Scrapy in diverse environments without significant compatibility concerns.

  3. Developer Community Support: Scrapy benefits from a vibrant and supportive developer community. This collective knowledge base offers substantial resources, easing the learning curve for new users and providing solutions to common challenges. The rich ecosystem of plugins and extensions further enhances its functionality.

  4. Automation and Efficiency: By allowing users to write just the rules, Scrapy simplifies the process of deploying spiders, which autonomously crawl and extract data. This automation significantly reduces manual intervention, enabling users to focus on more critical tasks such as data analysis and integration.

Limitations:

  1. JavaScript Rendering: While Scrapy excels in scraping static HTML pages, it faces challenges with JavaScript-rendered websites. This limitation necessitates additional configurations or complementary tools like Selenium for handling dynamic content, thereby potentially increasing the complexity of scraping tasks.

  2. Adaptability to Website Changes: Scrapy spiders can be sensitive to changes in website structure. The adaptability of spiders depends greatly on the skill of the developer and the robustness of the spider’s coding. Maintaining spiders for sites with frequent updates can require substantial effort.

  3. Out-of-the-Box Functionality: Though Scrapy offers a robust framework, it may not provide immediate out-of-the-box solutions for all web scraping scenarios, especially those involving advanced fingerprinting techniques like TLS fingerprinting. This gap underscores the need for additional tools or configurations.

Conclusion:

The public perception of Scrapy underscores its role as a dependable and powerful tool for web scraping within the data extraction domain. While it offers extensive capabilities for efficiently managing and executing scraping tasks, users should be mindful of its limitations concerning JavaScript-rendered websites and evolving site architectures. The framework’s extensibility and strong community support serve to mitigate these challenges, making it a formidable choice for seasoned developers and businesses looking to harness structured web data effectively. As technology evolves, continuous updates and enhancements to Scrapy can further consolidate its position among its competitors.

Scrapy discussion

  1. Skoshart
    · over 1 year ago

    Useful tool.

Is Scrapy good? This is an informative page that will help you find out. Moreover, you can review and discuss Scrapy here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated as they help everyone in the community to make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.