The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries covering all the variations, but it's not easy. Vector search isn't all that new a concept; see, for example, the annoy library (https://github.com/spotify/annoy), an open-source library for approximate nearest-neighbor search over embeddings. - Source: Hacker News / 8 months ago
If you want to go larger, you could still use some simple setup in conjunction with FAISS, Annoy, or HNSW. Source: 10 months ago
I then use Annoy to compare them. Annoy supports different distance measures, like cosine, Euclidean, and more. Source: 11 months ago
Yes, you can do this for equality predicates if your row groups are sorted. This blog post (which I didn't write) might add more color. You can't do this for any kind of text searching. If you need to do this with file-based storage, I'd recommend using vector-based text search and utilizing an ANN index library like Annoy. Source: 11 months ago
If you need large scale (1000+ dimensions, millions+ of source points, >1000 queries per second) and can accept imperfect results / approximate nearest neighbors, then other people have already mentioned some of the best libraries (FAISS, Annoy). Source: 12 months ago
Would it be possible to further speed up the process by using something like Annoy? https://github.com/spotify/annoy. Source: 12 months ago
I like Faiss but I tried Spotify's annoy[1] for a recent project and was pretty impressed. Since lots of people don't seem to understand how useful these embedding libraries are here's an example. I built a thing that indexes bouldering and climbing competition videos, then builds an embedding of the climber's body position per frame. I then can automatically match different climbers on the same problem. It works... - Source: Hacker News / about 1 year ago
If you just want quick in-memory search then pynndescent is a decent option: it's easy to install, and easy to get running. Another good option is Annoy; it's just as easy to install and get running with Python, but it is a little less performant if you want to do a lot of queries or get a knn-graph quickly. Source: about 1 year ago
Probably I won't be able to explain it better than it's stated on the annoy page: https://github.com/spotify/annoy. But the bottom line is speed. Instead of computing similarities of embeddings one by one, you do it via an index that works way faster. Source: over 1 year ago
Perhaps you can store your embeddings anywhere (sql or even a file) and use Approximate Nearest Neighbors like https://github.com/spotify/annoy for comparison? Source: over 1 year ago
Hi, I have a huge list of image hashes that I have to compare to find matching items and delete duplicates. Is there something similar to spotify/annoy in Rust, or a BK-tree/VP-tree implementation? Thanks. Source: over 1 year ago
Is your music recommendation system open source? Would be down to check it out and learn a thing or two from it. On the topic of vector search, I'm fairly certain that Spotify still uses Annoy (https://github.com/spotify/annoy). Like Faiss, it's a great library but not quite a database, which would ideally have features like replication (https://milvus.io/docs/replica.md), caching, and access control, to name a few. - Source: Hacker News / over 1 year ago
To improve the running time you could try an approximate algorithm: https://github.com/spotify/annoy/. Source: over 1 year ago
Ducks, the story: I was using a Python in-memory vector search engine called Annoy [1] to do semantic search on various kinds of data. It worked great for finding "similar" objects. Story A has similar text to story B, image A looks like image B, etc. But doing basic metadata lookups was surprisingly hard. How do I get all images matching some criteria (say, size range, or tags)? I'd have to serialize them all into... - Source: Hacker News / over 1 year ago
The actual data that is used by Spotify that is in fast storage is likely in a compressed feature vector format (see https://github.com/spotify/annoy) that makes no sense to humans. The process of getting the "raw" data likely isn't optimized, and the business has no appetite for optimizing this process because no one has literally died from not getting their raw data in 10 seconds. Source: over 1 year ago
Oh, like Spotify's Annoy. I'm also at a beginner level, so I don't pick things up quickly, but I am definitely getting there. Thanks. Source: over 1 year ago
It'd definitely be a nice-to-have. Luckily, it shouldn't be too hard to create a custom estimator using something like Spotify's Annoy library. I might try it out whenever I come back and revisit the project. Source: almost 2 years ago
Add your examples to the index and build the trees in Annoy; I feel like it's straightforward. There you have to provide the dimension of the features, which is the feature vector you get. In my case I am reusing (without fine-tuning) EfficientNetB3 without the last layer, hence it results in feature vectors with 1536 dimensions. https://github.com/spotify/annoy. Source: almost 2 years ago
Approximate Nearest Neighbors is what Spotify uses for music recommendations: https://github.com/spotify/annoy. Source: about 2 years ago
Embeddings - The embeddings index file. This is an Approximate Nearest Neighbor (ANN) index with either Faiss (default), Hnswlib or Annoy, depending on the settings. - Source: dev.to / about 2 years ago
Take, for example, Spotify's implementation of ANN: https://github.com/spotify/annoy. Source: over 2 years ago
This is an informative page about Annoy. You can review and discuss the product here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the options on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated, as they help everyone in the community make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.