I tend to agree with this sentiment. Many junior devs and/or those in college want to contribute. Then they feel entitled to merge a PR that they worked hard on, often without guidance. I'm all for working with people but projects have standards and not all ideas make sense. In many cases, especially with commercial open source, the project is the base of a company's identity. So it's not just for drive-by ideas to... - Source: Hacker News / 10 days ago
Bootstrapping only works if you have the runway to do it and you don't feel the need to grow fast. With NeuML (https://neuml.com), I've gone the bootstrapping route. I've been able to build a fairly successful open source project (txtai 6K stars https://github.com/neuml/txtai) and a revenue-positive company. It's a "live within your means" strategy. VC funding can have... - Source: Hacker News / 3 months ago
I agree that in many cases people are puffing their feathers to try to be something they're not (at least not yet). Some believe in the fake it until you make it mentality. With NeuML (https://neuml.com), the website is a simple HTML page. On social media, I'm honest about what NeuML is, that I'm in my 40s with a family and not striving to be the next Steve Jobs. I've been able to build a fairly successful open... - Source: Hacker News / 3 months ago
I'll add txtai (https://github.com/neuml/txtai) to the list. There is still plenty of room for innovation in this space. Just need to focus on the right projects that are innovating and not the ones (re)working on problems solved in 2020/2021. - Source: Hacker News / 3 months ago
Nice project! I've long used Tika for document parsing given its maturity and the wide range of formats supported. The XHTML output helps with chunking documents for RAG. Here's a couple examples: - https://neuml.hashnode.dev/build-rag-pipelines-with-txtai - https://neuml.hashnode.dev/extract-text-from-documents Disclaimer: I'm the primary author of txtai ( - Source: Hacker News / 3 months ago
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. - Source: dev.to / 3 months ago
If you're interested in graphs + RAG and want an alternate approach, txtai has a semantic graph component. https://neuml.hashnode.dev/introducing-the-semantic-graph https://github.com/neuml/txtai Disclaimer: I'm the primary author of txtai. - Source: Hacker News / 4 months ago
My perspective as an open source developer of txtai (https://github.com/neuml/txtai). When you get started in open source, it's a great way for a small team to get the word out. Conversely, when starting as proprietary software or SaaS, you're looking at advertising, websites, sales calls and so forth. If an open source company is lucky enough to be successful, the next... - Source: Hacker News / 4 months ago
I agree that RAG doesn't have to be paired with vector search. Other types of search can work in some cases. Where vector search excels is that it can encode a complex question as a vector and does a good job bringing back the top n results. It's not impossible to do some of this with keyword search (term expansion, stopwords and so forth). Vector search just makes it easy. In the end, yes this is a better search... - Source: Hacker News / 5 months ago
With that in mind, txtai now has the capability to easily integrate additional LLM frameworks. While local models through Hugging Face Transformers continue to be the default choice, these additional LLM frameworks broaden the number of options available. - Source: dev.to / 5 months ago
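To make the "encode a question as a vector and bring back the top n results" idea concrete, here is a minimal sketch in plain Python. It is not how txtai or any particular library implements search: the three-dimensional vectors are hand-made stand-ins for real embeddings, and the `cosine` and `top_n` helpers and the document ids are hypothetical names chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n(query, vectors, n=2):
    """Score every stored vector against the query and keep the n best."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:n]


# Assumption: hand-made toy vectors, not output from a real embedding model
documents = {
    "cardiology": [0.9, 0.1, 0.0],
    "oncology": [0.1, 0.9, 0.0],
    "baking": [0.0, 0.1, 0.9],
}

# A "question" that has already been encoded into the same vector space
query = [0.8, 0.3, 0.0]

results = top_n(query, documents)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A real system would replace the brute-force loop with an approximate nearest neighbor index, which is the part libraries such as Faiss, Hnswlib and Annoy provide.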
Cool use case, glad to see txtai [1] is helping (I'm the main dev for txtai). Since you're using txtai, this article I just wrote yesterday might be helpful: https://neuml.hashnode.dev/build-rag-pipelines-with-txtai Looks like you've received a lot of great ideas here already though! 1 - https://github.com/neuml/txtai. - Source: Hacker News / 5 months ago
Nice project! I've spent quite a lot of time in the medical/scientific literature space. With regards to LLMs, specifically RAG, how the data is chunked is quite important. With that, I have a couple projects that might be beneficial additions. Paperetl (https://github.com/neuml/paperetl) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built... - Source: Hacker News / 5 months ago
I've seen a number of projects come over the last couple years. I'm the author of txtai (https://github.com/neuml/txtai). - Source: Hacker News / 5 months ago
You can try txtai (https://github.com/neuml/txtai). For example, a partial Faiss configuration with 4-bit PQ quantization and only using 5% of the data to train an IVF index is shown below. faiss={"components": "IVF,PQ384x4fs", "sample": 0.05}. - Source: Hacker News / 5 months ago
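For readers who want to see where the `faiss={...}` fragment quoted above might sit, here is a rough sketch that only assembles the configuration as a Python dictionary. The `faiss_settings` helper is a hypothetical name, and the model path is an assumption picked because it produces 384-dimensional vectors, which is what 384 PQ sub-quantizers would need. The commented-out lines show how such a configuration is typically passed to txtai; treat them as an assumption to check against the txtai documentation, not as verified usage.

```python
def faiss_settings(subquantizers=384, bits=4, sample=0.05):
    """Builds the partial Faiss configuration from the quoted comment.

    As I read the Faiss index factory string, "IVF" requests an inverted
    file index and "PQ384x4fs" requests product quantization with 384
    sub-quantizers at 4 bits each, using the fast-scan variant.
    """
    components = f"IVF,PQ{subquantizers}x{bits}fs"
    # "sample" is the fraction of the data used to train the index (5% here)
    return {"components": components, "sample": sample}


config = {
    # Assumption: any 384-dimensional sentence embedding model would do here
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "backend": "faiss",
    "faiss": faiss_settings(),
}

print(config["faiss"])

# Assumed usage, requires txtai to be installed and downloads the model:
# from txtai import Embeddings
# embeddings = Embeddings(**config)
```

Quantizing this aggressively trades recall for memory, so it is worth comparing results against an unquantized index on a sample of real queries before adopting it.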
Adding txtai to the list https://github.com/neuml/txtai txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. Txtai can satisfy most vector database use cases such as being a knowledge source for retrieval augmented generation (RAG). Txtai is independently developed (not VC-backed) and released under an Apache... - Source: Hacker News / 5 months ago
Article: https://neuml.hashnode.dev/all-about-vector-quantization GitHub: https://github.com/neuml/txtai. Source: 6 months ago
Adding txtai to the list for consideration. Couple relevant links. 1. https://neuml.hashnode.dev/custom-api-endpoints 2. https://github.com/neuml/txtai. - Source: Hacker News / 6 months ago
Project is open source and available on GitHub: https://github.com/neuml/txtai. - Source: Hacker News / 6 months ago
If you want an easy way to evaluate Faiss, Hnswlib and Annoy vector backends, check out txtai - https://github.com/neuml/txtai. Txtai also supports NumPy and PyTorch vector storage. Disclaimer: I am the author of txtai. - Source: Hacker News / 6 months ago
More info can be found below. GitHub: https://github.com/neuml/txtai. - Source: Hacker News / 8 months ago
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy. Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy), an open source embeddings database. - Source: Hacker News / 8 months ago
This is an informative page about txtai. You can review and discuss the product here. The primary details have not been verified within the last quarter, and they might be outdated. If you think we are missing something, please use the means on this page to comment or suggest changes. All reviews and comments are highly encouraged and appreciated as they help everyone in the community to make an informed choice. Please always be kind and objective when evaluating a product and sharing your opinion.