Based on our records, Scrapy seems to be more popular. It has been mentioned 84 times since March 2021. We are tracking product recommendations and mentions on Reddit, HackerNews and some other platforms. They can help you identify which product is more popular and what people think of it.
In general, Celery tasks should be idempotent if possible. For scraping, consider whether Scrapy might be more appropriate: it already implements a lot of the rate limiting/retrying you would have to replicate in Celery yourself. Regarding locking, you are right to consider databases/Redis, since Celery workers might even run on entirely different machines. In the case of a paginated scrape with Celery, you could... - Source: Reddit / 16 days ago
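A minimal sketch of the idempotency idea from the comment above, assuming a shared store with an atomic "set if absent" operation. `InMemoryStore`, `set_if_absent`, and `scrape_page` are hypothetical names used only for illustration; in a real deployment the store would be Redis or a database reachable by every worker, not an in-process dictionary.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for Redis or a database shared by all workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set_if_absent(self, key, value):
        """Atomically store value if key is new; return True if this call stored it."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


def scrape_page(store, url, fetch):
    """Task body that is safe to run twice: only the first call for a URL fetches it."""
    key = f"scraped:{url}"
    # Claim the page before doing any work; a duplicate or retried task
    # sees the existing key and returns without fetching again.
    if not store.set_if_absent(key, "claimed"):
        return False
    fetch(url)
    return True
```

This sketch claims the page before fetching, so a failed fetch leaves the claim in place. Whether to release the claim on failure is a design choice that depends on the retry policy.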
You can use automation tools like Selenium or Playwright. You can work with a full-fledged framework such as Scrapy. I also recently discovered a Python tool, selectolax with its Lexbor backend, which allows you to extract data very quickly. - Source: Reddit / 18 days ago
This is not related to https://scrapy.org/ and so not related to this subreddit either. - Source: Reddit / 18 days ago
The sha256 is there to establish the uniqueness of the file. It isn’t great for capturing whether or not you have already seen the file before, tho, because it is rather expensive to calculate (imagine your CSV file were gigabytes in size: you would have to stream the whole file down just to see if it had changed!). In the past I have used a sha256 of information that the server hosting the file gives me about... - Source: Reddit / 29 days ago
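The comment above is cut off, so it is not known which server-provided information the commenter hashed. As a rough sketch of the general idea, the example below assumes HTTP response headers such as `ETag`, `Last-Modified`, and `Content-Length` (for instance from a `HEAD` request) and hashes those instead of the file body. The function names and the choice of headers are assumptions, not the commenter's method.

```python
import hashlib

# Assumed metadata fields; the original comment does not say which ones were used.
FINGERPRINT_FIELDS = ("etag", "last-modified", "content-length")


def metadata_fingerprint(headers):
    """Return a sha256 hex digest over selected response headers."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    parts = [f"{field}={normalized.get(field, '')}" for field in FINGERPRINT_FIELDS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, headers):
    """Compare a stored fingerprint against the current headers."""
    return metadata_fingerprint(headers) != previous_fingerprint
```

A fingerprint like this is only as reliable as the headers the server sends. If the server omits or never updates them, hashing the full file remains the fallback.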
You may want to check out [estela](https://estela.bitmaker.la/docs/), a spider management solution developed by [Bitmaker](https://bitmaker.la) that allows you to run [Scrapy](https://scrapy.org) spiders. - Source: Reddit / about 1 month ago
Skuuudle - Discover, match and monitor your eCommerce competitors.
Apify - Apify is a web scraping and automation platform that can turn any website into an API.
ParseHub - ParseHub is a free web scraping tool. With our advanced web scraper, extracting data is as easy as clicking the data you need.
Flutter.dev - Build beautiful native apps in record time 🚀
Octoparse - Octoparse provides easy web scraping for anyone. Our advanced web crawler allows users to turn web pages into structured spreadsheets within clicks.
import.io - Import.io helps its users find the internet data they need, organize and store it, and transform it into a format that provides them with the context they need.