txtai VS Annoy

Annoy

Annoy is a C++ library with Python bindings to search for points in space that are close to a given query point.
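To make "points in space that are close to a given query point" concrete, here is a minimal sketch of an exact, brute-force nearest-neighbor search in plain Python. It is not Annoy's API or algorithm: Annoy answers the same kind of query approximately so that it scales to large collections. The function name `nearest_neighbors` and the sample points are hypothetical, chosen only for illustration.

```python
import math

def nearest_neighbors(points, query, k=2):
    """Return the indices of the k points closest to `query`.

    Exact brute-force search: every point is compared with the query.
    Libraries such as Annoy trade some accuracy for speed instead.
    """
    # Rank point indices by Euclidean distance to the query point.
    ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return ranked[:k]

# Hypothetical 2-D sample data.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
print(nearest_neighbors(points, (1.0, 1.0), k=2))  # prints [1, 3]
```

The brute-force version scans every point on each query, which is the cost an approximate index is meant to avoid.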

Category Popularity

0-100% (relative to txtai and Annoy)

Annoy

Search Engine: 74% vs 26%
Utilities: 60% vs 40%
Databases: 100% vs 0%
Custom Search Engine: 61% vs 39%

Social recommendations and mentions

Based on our records, txtai appears to be more popular than Annoy. It has been mentioned 62 times since March 2021. We track product recommendations and mentions on various public social media platforms and blogs. These mentions can help you identify which product is more popular and what people think of it.

txtai mentions (62)

What contributing to Open-source is, and what it isn't
I tend to agree with this sentiment. Many junior devs and/or those in college want to contribute. Then they feel entitled to merge a PR that they worked hard on, often without guidance. I'm all for working with people but projects have standards and not all ideas make sense. In many cases, especially with commercial open source, the project is the base of a company's identity. So it's not just for drive-by ideas to... - Source: Hacker News / 17 days ago
Bootstrap or VC?
Bootstrapping only works if you have the runway to do it and you don't feel the need to grow fast. With NeuML (https://neuml.com), I've gone the bootstrapping route. I've been able to build a fairly successful open source project (txtai 6K stars https://github.com/neuml/txtai) and a revenue positive company. It's a "live within your means" strategy. VC funding can have... - Source: Hacker News / 3 months ago
Ask HN: What happened to startups, why is everything so polished?
I agree that in many cases people are puffing their feathers to try to be something they're not (at least not yet). Some believe in the fake it until you make it mentality. With NeuML (https://neuml.com), the website is a simple HTML page. On social media, I'm honest about what NeuML is, that I'm in my 40s with a family and not striving to be the next Steve Jobs. I've been able to build a fairly successful open... - Source: Hacker News / 4 months ago
Are we at peak vector database?
I'll add txtai (https://github.com/neuml/txtai) to the list. There is still plenty of room for innovation in this space. Just need to focus on the right projects that are innovating and not the ones (re)working on problems solved in 2020/2021. - Source: Hacker News / 4 months ago
Show HN: Open-source Rule-based PDF parser for RAG
Nice project! I've long used Tika for document parsing given its maturity and wide number of formats supported. The XHTML output helps with chunking documents for RAG. Here's a couple examples: - https://neuml.hashnode.dev/build-rag-pipelines-with-txtai - https://neuml.hashnode.dev/extract-text-from-documents Disclaimer: I'm the primary author of txtai ( - Source: Hacker News / 4 months ago

Annoy mentions (35)

Do we think about vector dbs wrong?
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy. Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy), an open source embeddings database. - Source: Hacker News / 8 months ago
Vector Databases 101
If you want to go larger you could still use some simple setup in conjunction with faiss, annoy or hnsw. Source: 11 months ago
Calculating document similarity in a special domain
I then use annoy to compare them. Annoy can use different measures for distance, like cosine, euclidean and more. Source: 12 months ago
Can Parquet file format index string columns?
Yes, you can do this for equality predicates if your row groups are sorted. This blog post (that I didn't write) might add more color. You can't do this for any kind of text searching. If you need to do this with file based storage I'd recommend using a vector based text search and utilize an ANN index library like Annoy. Source: 12 months ago
[D]: Best nearest neighbour search for high dimensions
If you need large scale (1000+ dimension, millions+ source points, >1000 queries per second) and accept imperfect results / approximate nearest neighbors, then other people have already mentioned some of the best libraries (FAISS, Annoy). Source: 12 months ago
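
One of the mentions above notes that Annoy can use different distance measures, such as cosine and Euclidean. The sketch below, in plain Python rather than Annoy's own API, illustrates why the choice matters: the same query can have a different nearest neighbor under each measure. The helper names and sample vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)

def cosine_distance(a, b):
    """1 - cosine similarity: depends on direction only, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical sample vectors.
query = (1.0, 1.0)
candidates = {
    "same_direction_far": (10.0, 10.0),
    "different_direction_near": (1.0, 0.0),
}

for name, measure in (("euclidean", euclidean), ("cosine", cosine_distance)):
    best = min(candidates, key=lambda c: measure(query, candidates[c]))
    print(name, "->", best)
```

Under Euclidean distance the nearby vector wins; under cosine distance the vector pointing the same way wins, even though it is much farther away.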

What are some alternatives?

When comparing txtai and Annoy, you can also consider the following products:

Vectara Neural Search - Neural search as a service API with breakthrough relevance

Qdrant - Qdrant is a high-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Milvus - Vector database built for scalable similarity search. Open-source, highly scalable, and blazing fast.

Vespa.ai - Store, search, rank and organize big data

Scikit-learn - scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.

Weaviate - Welcome to Weaviate
