Based on our records, Hacker News Search seems to be a lot more popular than Apache Nutch. While we know about 1927 links to Hacker News Search, we've tracked only 2 mentions of Apache Nutch. We track product recommendations and mentions on various public social media platforms and blogs. They can help you identify which product is more popular and what people think of it.
The top answer is written by Justin Skycak (https://www.justinmath.com/) who works on Math Academy (https://www.mathacademy.com/). Math Academy is awesome. I am a happy customer. Previous HN comments about it: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=mathacademy&sort=byDate&type=comment. - Source: Hacker News / 2 days ago
Here are some other posts on Alexander's work:
Beautiful Software: Christopher Alexander's research initiative on computing - https://news.ycombinator.com/item?id=34011469 Dec 2009 (30 comments)
“A pattern language” explained (2016) - https://news.ycombinator.com/item?id=18644150 Jun 2021 (22 comments)
Christopher Alexander: An Introduction for Object-Oriented Designers -... - Source: Hacker News / 2 days ago
Note that my advice is aimed more at people who want to make an investment, are planning a startup, a company that might grow, etc.
> Why is there a need to have specialists just to interface with one's local government
True, in theory you shouldn't need it. And more than current officials, there's a lot of legislation that is to blame, but this is beside the point. You consult with specialists because they know... - Source: Hacker News / 2 days ago
What distributed file system would you use for a greenfield homelab project today?
Requirements / desires:
* Reliable
* Performant
* Easy to set up and operate
Some options:
SeaweedFS - https://github.com/seaweedfs/seaweedfs 289 hits: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=seaweedfs&sort=byPopularity&type=all
JuiceFS - https://github.com/juicedata/juicefs 2047 hits:... - Source: Hacker News / 5 days ago
FYI, the best way to filter by author is 'author:Animats'. This only shows results from the user Animats and won't match "animats" inside the comment text. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=%22delayed%20ack%22%20author%3AAnimats&sort=byDate&type=comment - Source: Hacker News / 6 days ago
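As a rough illustration of the author filter quoted above, the following Python sketch rebuilds that search link from its parts. The parameter names (dateRange, page, prefix, query, sort, type) are copied from the URL in the mention; the function name `hn_search_url` and its arguments are hypothetical, and nothing here is an official client for the search site.

```python
from urllib.parse import urlencode, quote


def hn_search_url(phrase, author, sort="byDate", item_type="comment"):
    """Build a Hacker News Search link for an exact phrase by one author.

    Hypothetical helper: the query-string keys mirror the URL quoted in
    the mention above and are not taken from any documented API.
    """
    # Quote the phrase for an exact match and add the author filter.
    query = f'"{phrase}" author:{author}'
    params = {
        "dateRange": "all",
        "page": 0,
        "prefix": "true",
        "query": query,
        "sort": sort,
        "type": item_type,
    }
    # quote_via=quote encodes spaces as %20 (not +), as in the quoted URL.
    return "https://hn.algolia.com/?" + urlencode(params, quote_via=quote)


url = hn_search_url("delayed ack", "Animats")
print(url)
```

Passing a different `sort` value such as "byPopularity" or an `item_type` of "all" yields the variants seen in the other links on this page.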
Hi, I have read a few comments under the post. There are great suggestions, and your questions regarding the task are on point. But I believe handling this with a script might not be easy. If I were you, I would use Apache Nutch or similar open source software/library. I used Nutch for my thesis for a similar task, where I had to scrape a lot of blog pages and the other pages they were referencing. You can configure... Source: over 1 year ago
I've never used it, but I was on a project where we considered Apache Nutch: https://nutch.apache.org/. Source: over 1 year ago
DuckDuckGo - The Internet privacy company that empowers you to seamlessly take control of your personal information online, without any tradeoffs.
Scrapy - Scrapy | A Fast and Powerful Scraping and Web Crawling Framework
Medium - Welcome to Medium, a place to read, write, and interact with the stories that matter most to you.
StormCrawler - StormCrawler is an open source SDK for building distributed web crawlers with Apache Storm.
40 Hadiths - Hadith Nawawi is an Islamic Android App that is designed with the purpose to enlighten the heart and souls of Muslims around the globe with the authentic teachings of Prophet Muhammad (PBUH).
Heritrix - Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web...