Comments (43)
- lukebechtel: Nice! Just made it an MCP server so Claude can tell me when it's done with something :) https://github.com/Marviel/speak_when_done
- singpolyma3: Love this. It says MIT license, but then the README has a separate section on prohibited use that maybe adds restrictions that make it nonfree? Not sure of the legal implications here.
- armcat: Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and even set up my own fully local voice assistant [1]. Will definitely give Pocket TTS a go! [1] https://github.com/acatovic/ova
- Imustaskforhelp: Perhaps I haven't been talking to voice models that much, or the ChatGPT voice always felt weird and off to me because I knew it goes to a cloud server. But through Pocket TTS I discovered unmute.sh, which is open source, is I think from the same company as Pocket TTS, and can I think use Pocket TTS as well. I saw some agentic models at 4B or similar that can punch above their weight, and even some basic models. I can definitely see them in a home lab without costing too much money. I think unmute.sh at least is similar to, and competes with, ChatGPT's voice model. It's crazy how good and effective open source models are from top to bottom. There's basically something for almost everyone. I feel like the only true moat might exist in coding models. Some are pretty good, but it's the only industry where people might pay 10x-20x more for the best (MiniMax/z.ai subscription fees vs. Claude Code). It will be interesting to see whether we get another DeepSeek moment in AI that beats Claude Sonnet or similar. I think DeepSeek has DeepSeek 4, so it will be interesting to see how/if it can beat Sonnet. (Sorry for going off-topic.)
- mgaudet: Eep. So, on my M1 Mac, I ran `uvx pocket-tts serve` and plugged in: > It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way—in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only (beginning of A Tale of Two Cities). But the problem is that Javert skips over parts of sentences! E.g., it starts: > "It was the best of times, it was the worst of times, it was the age of wisdom, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the spring of hope, it was the winter of despair, we had everything before us, ..." Notice how it skips over "it was the age of foolishness," and "it was the winter of despair," which... doesn't exactly inspire faith in a TTS system. (Marius seems better; posted https://github.com/kyutai-labs/pocket-tts/issues/38)
- dust42: Good quality, but unfortunately it is English only.
- indigodaddy: Perfect timing, that is exactly what I am looking for for a fun little thing I'm working on. The voices sound good!
- tschellenbach: It's cool how lightweight it is. Recently added support for Pocket to Vision Agents. https://github.com/GetStream/Vision-Agents/tree/main/plugins...
- GaggiX: I love that everyone is making their own TTS model, since they are not as expensive to train as many other models. There are also plenty of different architectures. Another recent example: https://github.com/supertone-inc/supertonic
- syntaxing: Is there something similar for STT? I'm using Whisper distilled models and they work OK, but sometimes they get what I say completely wrong.
- oybng: > If you want access to the model with voice cloning, go to https://huggingface.co/kyutai/pocket-tts and accept the terms, then make sure you're logged in locally with `uvx hf auth login` lol
- snvzz: Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.
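
The skipped-clause check that mgaudet did by ear above could be roughly automated. The following is only a minimal sketch, assuming you already have a text transcript of what the TTS voice actually said (for example from an STT pass, which is not shown here). The helper names `split_clauses` and `dropped_clauses` are hypothetical and are not part of Pocket TTS; the comparison uses Python's standard `difflib`.

```python
import difflib
import re


def split_clauses(text):
    """Split text into lowercase clauses on commas/semicolons, dropping empties."""
    parts = re.split(r"[,;]", text.lower())
    return [p.strip() for p in parts if p.strip()]


def dropped_clauses(source, heard):
    """Return clauses of `source` that are missing or altered in `heard`.

    Both arguments are plain strings. `heard` is assumed to be a transcript
    of the synthesized audio, obtained by some means outside this sketch.
    """
    src = split_clauses(source)
    out = split_clauses(heard)
    # Align the two clause sequences; autojunk is off so short, repetitive
    # clause lists are compared exactly.
    matcher = difflib.SequenceMatcher(a=src, b=out, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": clause absent from the transcript.
        # "replace": clause present only in a changed form.
        if tag in ("delete", "replace"):
            dropped.extend(src[i1:i2])
    return dropped


if __name__ == "__main__":
    source = (
        "It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness, "
        "it was the epoch of belief"
    )
    heard = (
        "It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the epoch of belief"
    )
    print(dropped_clauses(source, heard))
```

Splitting on commas is a crude choice that happens to suit this passage; for other text, sentence-level or word-level alignment may be more appropriate, and any transcription errors in `heard` will show up as false positives.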