
Scrapy Reviews

Scrapy | A Fast and Powerful Scraping and Web Crawling Framework

Social recommendations and mentions

We have tracked the following product recommendations or mentions on Reddit and HackerNews. They can help you see what people think about Scrapy and what they use it for.
  • Celery lock a variable between two processes
    In general, Celery tasks should be idempotent if possible. For scraping, consider whether Scrapy might be more appropriate: it already implements a lot of the rate limiting and retrying you would have to replicate in Celery yourself. But regarding locking, you are right to consider databases or Redis, since Celery workers might even run on entirely different machines. In the case of a paginated scrape with Celery, you could... - Source: Reddit / 18 days ago
  • fastest web scraping options
    You can use automation tools like Selenium or Playwright. You can work with a full-fledged framework such as Scrapy. I also recently discovered selectolax with its Lexbor backend, a Python tool that allows you to extract data very quickly. - Source: Reddit / 21 days ago
  • Scrapy extension blocking login to AVer PTZ Camera (CAM520 Pro)
    This is not related to https://scrapy.org/ and so not related to this subreddit either. - Source: Reddit / 21 days ago
  • What steps do you apply in the "L" when doing ELT?
    The sha256 is there to establish the uniqueness of the file. It isn’t great for capturing whether or not you have already seen the file before, though, because it is rather expensive to calculate (imagine your CSV file were gigabytes in size: you would have to stream the whole file down in order to see if it had changed!). In the past I have used a sha256 of information that the server hosting the file gives me about... - Source: Reddit / about 1 month ago
  • How to run webs scraping script every 15 minutes
    You may want to check out [estela](https://estela.bitmaker.la/docs/), a spider management solution developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders. - Source: Reddit / about 1 month ago
  • Extracting JSON data
    Hi, in this case the data is in the HTML itself (no data.json). You can use this XPath to get the data: //div[@id="vue-match-centre"]/@q-data. There are many ways to get this info; the one my company uses is the Scrapy framework. Here is some code that uses Scrapy to get this data into a JSON file: ... - Source: Reddit / about 1 month ago
  • What Python library is the best to scrap from OpenCritic?
    I recommend using Scrapy, as that is what we use at my place of work, Bitmaker (bitmaker.la). An example spider would look like this: ... - Source: Reddit / about 2 months ago
  • Is there a program available for bulk image reverse searching?
    In the past I used stuff like beautifulsoup for webscraping but I’ve heard good things about https://scrapy.org/. - Source: Reddit / about 2 months ago
  • How I used Scrapy for my ML Project
    I wanted to invest my time and energy in learning the fastest, most efficient one, one that can scale with me as my projects get more and more complex: Scrapy. After all, I want my projects to shine so bright in my CV that it blinds the recruiter's eyes. - Source: dev.to / about 2 months ago
  • scrapy.Request(url, callback) vs response.follow(url, callback)
    The first question I am asking is merely for general understanding. And for the record, the second sentence of the 'Requests and Response' section of scrapy.org is: ... - Source: Reddit / 2 months ago
  • Looking for a good open source web scraping tool
    Scrapy was a good option when I had to use it. It is built with Python and provides a built-in framework for scraping lots of pages (crawling): https://scrapy.org. - Source: Reddit / 2 months ago
  • Anybody here versed in AI, machine learning, deep learning, etc.?
    You can use 'web scraping' if it's legal, to then scrape all the data from the web, into text, then get all of that data. How you do this, is if you're using chrome, right-click on a particular value in the table, then click 'inspect'. Then it will direct you to the HTML for that element in 'developer tools'. Right-click that HTML responsible for it, then do 'copy', then you will see a list of options, I like to... - Source: Reddit / 3 months ago
  • Python web scraping exercise for beginners
    I'd suggest looking into Scrapy for scraping websites. It has a number of built-in features that help with basic scraping activities, and it is organized to help you scale a scraper across multiple sites. - Source: Reddit / 4 months ago
  • looking for recommendations on text for python / web scraping!
    You can check out scrapy.org. It is a powerful tool and easy to learn. - Source: Reddit / 4 months ago
  • Python and Selenium are better for scraping data
    As someone who has built a couple of scrapers at his job, just use Scrapy. - Source: Reddit / 4 months ago
  • Weekend Discussion Thread for the Weekend of November 18, 2022
    I did some in the past by writing lots of python boilerplate around requests for the HTTP requests and lxml for the parsing, but I think today you can go pretty far with a specialized framework like scrapy: https://scrapy.org/. - Source: Reddit / 4 months ago
  • Tool to Scrape Manuals and Sensitive PDFs to Generate Stronger Wordlists for Lateral Movement and Initial Access
    Surprised at the name of this project given there is an incredibly popular project called scrapy related to web scraping. This project would really benefit from a rebrand. - Source: Reddit / 5 months ago
  • What are some cool things you've automated with python?
    I was looking for a used car. I wrote a scraper using Scrapy that gathered all new offers, filtered by my criteria, every hour. Then it sent me a nicely formatted email. - Source: Reddit / 5 months ago
  • 13 ways to scrape any public data from any website
    Scrapy is a high-level webscraping framework designed to scrape data at scale and can be used to create a whole ETL pipeline. - Source: dev.to / 6 months ago
  • Big News at The Extract Summit in London
    Of course, the Scrapy framework was in the spotlight. If you are interested, you can find the solution here. - Source: dev.to / 6 months ago
  • Why is Playwright and Puppeteer so slow? Am I using it wrong?
    I honestly don't know. I personally use Python and Scrapy rather than C# for web scraping. In my experience, you usually won't need the JS if you spend the time to understand how the target site works. - Source: Reddit / 7 months ago
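
The "Extracting JSON data" mention above points at an XPath, //div[@id="vue-match-centre"]/@q-data, but the code that accompanied it was not captured. As a rough illustration of what that selector targets, here is a minimal standard-library sketch. It assumes the q-data attribute holds a JSON string (the thread title suggests this, but it is not confirmed), and the sample HTML, the class name QDataExtractor, and the function name extract_q_data are all hypothetical. It is not the original poster's Scrapy code.

```python
import json
from html.parser import HTMLParser


class QDataExtractor(HTMLParser):
    """Captures the q-data attribute of the first <div id="vue-match-centre">."""

    def __init__(self):
        super().__init__()
        self.q_data = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with entities unescaped.
        attr_map = dict(attrs)
        if (
            self.q_data is None
            and tag == "div"
            and attr_map.get("id") == "vue-match-centre"
        ):
            self.q_data = attr_map.get("q-data")


def extract_q_data(html):
    """Return the decoded q-data payload, or None if the div is absent."""
    parser = QDataExtractor()
    parser.feed(html)
    if parser.q_data is None:
        return None
    # Assumption: the attribute value is a JSON document.
    return json.loads(parser.q_data)


# Hypothetical sample page; the real page's structure and payload are unknown.
SAMPLE_HTML = (
    '<html><body><div id="vue-match-centre" '
    'q-data="{&quot;match&quot;: {&quot;home&quot;: 2, &quot;away&quot;: 1}}">'
    "</div></body></html>"
)

if __name__ == "__main__":
    print(json.dumps(extract_q_data(SAMPLE_HTML), indent=2))
```

Inside a Scrapy spider callback, the same lookup would typically be written as response.xpath('//div[@id="vue-match-centre"]/@q-data').get(), followed by json.loads on the result if it is not None.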

External sources with reviews and comparisons of Scrapy

Top 15 Best TinyTask Alternatives in 2022
The software is easily deployable via the cloud, or you can host the spiders on your own server using Scrapy. Only the rules need to be written; Scrapy will take care of the rest to extract the data. Scrapy is portable and runs on Windows, Linux, Mac, and BSD platforms, and new features can be added without affecting the program’s core.
