Based on our records, Hacker News Search appears to be far more popular than CommonCrawl: we know of 1,927 links to Hacker News Search but have tracked only 91 mentions of CommonCrawl. We track product recommendations and mentions across various public social media platforms and blogs, and these mentions can help you gauge which product is more popular and what people think of it.
The top answer is written by Justin Skycak (https://www.justinmath.com/) who works on Math Academy (https://www.mathacademy.com/). Math Academy is awesome. I am a happy customer. Previous HN comments about it: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=mathacademy&sort=byDate&type=comment. - Source: Hacker News / 1 day ago
Here's some other posts on Alexander's work: Beautiful Software: Christopher Alexander's research initiative on computing - https://news.ycombinator.com/item?id=34011469 Dec 2009 (30 comments) “A pattern language” explained (2016) - https://news.ycombinator.com/item?id=18644150 Jun 2021 (22 comments) Christopher Alexander: An Introduction for Object-Oriented Designers -... - Source: Hacker News / 2 days ago
Note that my advice is aimed more at people who want to make an investment, are planning a startup, or are building a company that might grow, etc. > Why is there a need to have specialists just to interface with one's local government True, in theory you shouldn't need them. And more than the current officials, a lot of legislation is to blame, but this is beside the point. You consult with specialists because they know... - Source: Hacker News / 2 days ago
What distributed file system would you use for a greenfield homelab project today? Requirements / desires: * Reliable * Performant * Easy to set up and operate Some options: SeaweedFS - https://github.com/seaweedfs/seaweedfs 289 hits: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=seaweedfs&sort=byPopularity&type=all JuiceFS - https://github.com/juicedata/juicefs 2047 hits:... - Source: Hacker News / 5 days ago
FYI the best way to filter by author is 'author:Animats' this will only show results from the user Animats and won't match animats inside the comment text. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=%22delayed%20ack%22%20author%3AAnimats&sort=byDate&type=comment. - Source: Hacker News / 6 days ago
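The author-filter tip above can be turned into a small helper that builds the same kind of search link. This is a minimal sketch: the query-string parameters (`dateRange`, `page`, `prefix`, `query`, `sort`, `type`) are copied from the example link rather than from any documented contract, so they may change, and the function name `hn_comment_search_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def hn_comment_search_url(terms: str, author: str) -> str:
    """Build an hn.algolia.com comment-search link restricted to one author.

    Hypothetical helper: parameter names mirror the example link in the comment above.
    """
    # 'author:<name>' limits results to that user instead of matching the name in comment text.
    query = f"{terms} author:{author}"
    params = {
        "dateRange": "all",
        "page": "0",
        "prefix": "true",
        "query": query,
        "sort": "byDate",
        "type": "comment",
    }
    # quote (not quote_plus) with safe="" yields %20 for spaces and %3A for ':',
    # matching the encoding used in the example link.
    return "https://hn.algolia.com/?" + urlencode(params, quote_via=quote, safe="")


print(hn_comment_search_url('"delayed ack"', "Animats"))
```

Wrapping the terms in double quotes (as in `'"delayed ack"'`) keeps them together as a phrase search, while the `author:` token is added separately so it always filters by user.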
Common Crawl Foundation | REMOTE | Full and part-time | https://commoncrawl.org/ | web datasets I'm the CTO at the Common Crawl Foundation, which has a 17 year old, 8. - Source: Hacker News / 14 days ago
https://commoncrawl.org/ is a non-profit which offers a pre-crawled dataset. The specifics of individual tools probably vary. I imagine most tools would be based on academic datasets. - Source: Hacker News / 4 months ago
Should the NYT not sue https://commoncrawl.org/ ? OpenAI just used the data from commoncrawl for training. - Source: Hacker News / 5 months ago
What you’re likely referring to is Common Crawl: https://commoncrawl.org. - Source: Hacker News / 5 months ago
> ... a project called "Nutch" would allow web users to crawl the web themselves. Perhaps that promise is similar to the promises being made about "AI" today. The project did not turn out to be used in the way it was predicted (marketed), or even used by web users at all. Actually Nutch is used to produce the Common Crawl[0] and 60% of GPT-3's training data was Common Crawl[1], so in a way it is being used... - Source: Hacker News / 6 months ago
DuckDuckGo - The Internet privacy company that empowers you to seamlessly take control of your personal information online, without any tradeoffs.
Scrapy - Scrapy | A Fast and Powerful Scraping and Web Crawling Framework
Medium - Welcome to Medium, a place to read, write, and interact with the stories that matter most to you.
StormCrawler - StormCrawler is an open source SDK for building distributed web crawlers with Apache Storm.
40 Hadiths - Hadith Nawawi is an Islamic Android App that is designed with the purpose to enlighten the heart and souls of Muslims around the globe with the authentic teachings of Prophet Muhammad (PBUH).
Apache Nutch - Apache Nutch is a highly extensible and scalable open source web crawler software project.