Based on our record, txtai should be more popular than Annoy. It has been mentiond 63 times since March 2021. We are tracking product recommendations and mentions on various public social media platforms and blogs. They can help you identify which product is more popular and what people think of it.
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. - Source: dev.to / 15 days ago
I tend to agree with this sentiment. Many junior devs and/or those in college want to contribute. Then they feel entitled to merge a PR that they worked hard on often without guidance. I'm all for working with people but projects have standards and not all ideas make sense. In many cases, especially with commercial open source, the project is the base of a companies identity. So it's not just for drive-by ideas to... - Source: Hacker News / about 2 months ago
Bootstrapping only works if you have the runway to do it and you don't feel the need to grow fast. With NeuML (https://neuml.com), I've went the bootstrapping route. I've been able to build a fairly successful open source project (txtai 6K stars https://github.com/neuml/txtai) and a revenue positive company. It's a "live within your means" strategy. VC funding can have... - Source: Hacker News / 4 months ago
I agree that in many cases people are puffing their feathers to try to be something they're not (at least not yet). Some believe in the fake it until you make it mentality. With NeuML (https://neuml.com), the website is a simple HTML page. On social media, I'm honest about what NeuML is, that I'm in my 40s with a family and not striving to be the next Steve Jobs. I've been able to build a fairly successful open... - Source: Hacker News / 5 months ago
I'll add txtai (https://github.com/neuml/txtai) to the list. There is still plenty of room for innovation in this space. Just need to focus on the right projects that are innovating and not the ones (re)working on problems solved in 2020/2021. - Source: Hacker News / 5 months ago
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy. Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy), an open source embeddings database. - Source: Hacker News / 10 months ago
If you want to go larger you could still use some simple setup in conjunction with faiss, annoy or hnsw. Source: 12 months ago
I then use annoy to compare them. Annoy can use different measures for distance, like cosine, euclidean and more. Source: about 1 year ago
Yes you can do this for equality predicates if your row groups are sorted . This blog post (that I didn't write) might add more color. You can't do this for any kind of text searching. If you need to do this with file based storage I'd recommend using a vector based text search and utilize a ANN index library like Annoy. Source: about 1 year ago
If you need large scale (1000+ dimension, millions+ source points, >1000 queries per second) and accept imperfect results / approximate nearest neighbors, then other people have already mentioned some of the best libraries (FAISS, Annoy). Source: about 1 year ago
Weaviate - Welcome to Weaviate
Scikit-learn - scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.
Vespa.ai - Store, search, rank and organize big data
Milvus - Vector database built for scalable similarity search Open-source, highly scalable, and blazing fast.
Qdrant - Qdrant is a high-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Vectara Neural Search - Neural search as a service API with breakthrough relevance