Not sure how relevant this is, but note that Coqui TTS (the realistic TTS) has already shut down: https://coqui.ai. - Source: Hacker News / about 1 month ago
You can take a look at https://coqui.ai. Source: 8 months ago
I haven't messed with anything more fancy than Festival but I would look at coqui.ai. Source: 11 months ago
This. You can create voice models for TTS with a variety of systems, commercial and free (e.g. https://www.resemble.ai, https://coqui.ai, etc.), and use that with GPT text. But I don't think you can get GPT to directly do the TTS. My guess is OP accidentally made this confusing in their post title. Source: 11 months ago
If so, and you have some basic programming knowledge, I would look at the coqui platform (https://coqui.ai) and/or TTS package (https://github.com/coqui-ai/TTS), it's probably the most beginner-friendly. Depending on the language/model you choose, you'd probably want at least an hour's worth of audio training data (but the more, the better). Source: about 1 year ago
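As a rough illustration of the "at least an hour's worth of audio" rule of thumb in the comment above, here is a minimal sketch that adds up the length of the recordings in a folder. It is not part of the Coqui tooling: the function names `total_wav_seconds` and `has_enough_audio` are hypothetical, and it assumes the clips are uncompressed PCM `.wav` files sitting directly in one folder, using only Python's standard `wave` module.

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Sum the duration, in seconds, of every .wav file directly inside folder."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        # The standard wave module reads uncompressed PCM WAV files only.
        with wave.open(str(path), "rb") as wav:
            total += wav.getnframes() / wav.getframerate()
    return total


def has_enough_audio(folder, minimum_hours=1.0):
    """Return True if the folder holds at least minimum_hours of audio.

    The one-hour default mirrors the rule of thumb quoted above; it is an
    assumption for illustration, not a documented requirement.
    """
    return total_wav_seconds(folder) >= minimum_hours * 3600
```

Calling `has_enough_audio("my_recordings")` on a hypothetical folder name would then give a quick yes/no before any time is spent on training.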
This is easier to do than you might think using OSS tools: https://coqui.ai/. Source: about 1 year ago
As others already mentioned, there's Tortoise-tts. I used coqui.ai a bunch the other day but hit the limit on free usage for my account. I suppose you could just make a new account and retrain all your voices whenever you hit the limit, but that's just gonna make them shut down the free usage trial (if they notice how many people are doing that). Source: about 1 year ago
I will also say -- UberDuck and AI TTS in general, when compared to the SURGE of development and tools that's happened on the image/video side of AI, is TERRIBLE. UberDuck's community specifically seems geared towards kids making memes -- I suspect they just ended up there and didn't design it that way, but wading through the terrible user created models to find ones that work was tiresome. I tried to get... - Source: Hacker News / about 1 year ago
http://coqui.ai is probably the best state-of-the-art voice package for both TTS and STT. It takes a little elbow grease and reading time to use it with any flair. Source: over 1 year ago
No, the reference wavs are only used during inference (asking the already trained model to make predictions). During training, the dataset was huge and it contained a lot of prosodic information. Running Capacitron on LJSpeech would not make a lot of sense because the dataset is fairly monotonic. If you're interested in voice cloning from a small amount of data, you should check out the research from Coqui,... Source: almost 2 years ago
I've got one long audio recording of her as a "dataset", but no idea what tool to use to make the fake. Could anyone recommend a tool and a tutorial to get to it? (I've stumbled across the coqui.ai GitHub project, for instance, but the instructions are quite unclear on how to clone a voice with it.) Source: almost 2 years ago
For "natural" output you need a trained model for your language and software for WaveNNN. YourTTS and coqui.ai are the two best approaches for real-time TTS. Source: almost 2 years ago