Sphinx Search appears to be slightly more popular than Xapian: we have tracked 10 links to it since March 2021, compared with 7 links to Xapian. We track product recommendations and mentions on public social media platforms and blogs. These mentions can help you gauge which product is more popular and what people think of it.
Recoll is a free/open source (GPL) tool that can index PDFs and search them very quickly. It uses Xapian under the hood. I have over 165,000 documents indexed on an old laptop running Linux and can query them all in a split second. Source: over 1 year ago
+ Xapian, which has been around a while and, while GPL licensed, is quite capable: https://xapian.org/. - Source: Hacker News / over 2 years ago
Tangentially related: if you need search without the clustering and high-availability story of Elasticsearch and friends, I highly recommend Xapian. It's like the SQLite of search: a single library that provides the basic set of features you would expect in a quality search experience, such as facets, ranked search, boolean operators, stemming, etc. https://xapian.org/. - Source: Hacker News / over 2 years ago
For fast searching, it usually requires indexing the files in question. There are a number of text-file indexing solutions, many of which use xapian, sphinx, or lucene/solr under the hood. Based on conditions (watching files/directories, cron jobs, new-mail triggers, etc), they'll add/remove files to the index, and you can then use a corresponding command to compose queries across that data. If it's indexed, it... Source: over 3 years ago
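The add/remove/query cycle described in the comment above can be sketched with a toy in-memory inverted index. This is only an illustration of the general idea, not how Xapian, Sphinx, or Lucene are implemented; the class name `TinyIndex` and its methods are hypothetical, and real engines add on-disk storage, ranking, stemming, and much more.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical; for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # doc id -> its terms, needed for removal

    @staticmethod
    def tokenize(text):
        # Lowercase and split on anything that is not a letter or digit.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, doc_id, text):
        # Re-adding an existing document replaces its old entry.
        self.remove(doc_id)
        terms = self.tokenize(text)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        # Drop the document from every posting list it appears in.
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        # Boolean AND: return documents that contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


idx = TinyIndex()
idx.add("a.txt", "Quarterly budget report")
idx.add("b.txt", "Budget meeting notes")
print(idx.search("budget report"))
```

A file watcher or cron job, as mentioned in the comment, would simply call `add` and `remove` as files change; queries then touch only the index and never rescan the files themselves.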
There is also xapian/recoll https://xapian.org/ which works great for "desktop" search. - Source: Hacker News / over 3 years ago
Sphinx is a search engine that can be integrated into a website to provide advanced search functionality such as full-text, Boolean, and faceted search. It is a powerful open-source search engine that can handle large amounts of data and quickly return results. - Source: dev.to / about 2 years ago
Have been using Sphinx. It does some processing around suffixes, tenses, and so on, and looks at word proximity (BM25), but is definitely limited. Source: over 2 years ago
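For context on the ranking term in the comment above: BM25 on its own scores a document from term frequency, inverse document frequency, and document length; word proximity is a separate signal that an engine may combine with it. The sketch below is a minimal BM25 scorer using the common non-negative IDF variant. The function name `bm25_scores` and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration, and this is not a reproduction of Sphinx's own rankers.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length in tokens
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization; avgdl is 0 only if every document is empty.
        norm = k1 * (1 - b + b * len(d) / avgdl) if avgdl else k1
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    ["fast", "search", "engine"],
    ["search", "search", "index"],
    ["cooking", "recipes"],
]
print(bm25_scores(["search"], docs))
```

Note that the order of words within a document never enters the formula, which is why proximity has to be handled as an additional ranking factor.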
Lucene is the thing you think you need. Elasticsearch is a nice wrapper for it. But these are Java, so maybe you want Sphinx Search (C++) or Meilisearch (Rust). Source: over 2 years ago
Using a natural language search will almost certainly be a better solution and PHP may not be the best tool for this task. Figure out how you are going to get the text out of the PDF and where you are going to put it. Look at things like sphinx and full text search in boolean mode for doing the keyword matching. Source: almost 3 years ago
In practice, though, you don't do any of this; you get a library to do it for you. I've used Sphinx Search in the past for some fairly hefty data sets (on the order of terabytes), and there's a good book covering how to get it all set up and started. Source: almost 3 years ago
ElasticSearch - Elasticsearch is an open source, distributed, RESTful search engine.
ElasticHQ - Tool for ElasticSearch management and monitoring.
Apache Solr - Solr is an open source enterprise search server based on Lucene search library, with XML/HTTP and...
Algolia - Algolia's Search API makes it easy to deliver a great search experience in your apps & websites. Algolia Search provides hosted full-text, numerical, faceted and geolocalized search.
Elastic Stack - Meet the search platform that helps you search, solve, and succeed
OpenSearch - OpenSearch is a community-driven, open source search and analytics suite derived from Apache 2.0 licensed Elasticsearch 7.10.2 & Kibana 7.10.2. It consists of a search engine daemon, and a visualization and user interface, OpenSearch Dashboards.