If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.) how’s it working for you?
Given there’s a number of knobs that you can use, what do you use and what works well?
- Wake word model. There are the default models and custom ones
- Conversation agent and model
- Speech to text models (e.g. speech-to-phrase or whisper)
- Text to speech models


I use the HA Voice Preview in two different rooms and got rid of my Alexa Dots. For STT I’ve been trying both speech-to-phrase and whisper with medium.en running on the GPU, and for the LLM I’ve tried llama3.2 and granite4 with local command handling.
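In case it helps anyone compare, here’s roughly the kind of script you can use to run that same model on a recorded clip outside of Home Assistant and see what it actually hears. This is just a minimal sketch, assuming the faster-whisper package (which is what the Whisper add-on wraps) and a placeholder WAV file name:

```python
# Sketch: sanity-check whisper medium.en on the GPU, outside the Assist pipeline.
# Assumes faster-whisper is installed and you have a short 16 kHz mono WAV
# of the phrase that keeps failing (the file name is just a placeholder).
from faster_whisper import WhisperModel

model = WhisperModel("medium.en", device="cuda", compute_type="float16")

segments, _info = model.transcribe("set_a_timer.wav", beam_size=5)
for seg in segments:
    # avg_logprob is a rough per-segment confidence; closer to 0 is better.
    print(f"{seg.avg_logprob:.2f}  {seg.text.strip()}")
```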
I’ve been trying to get it working better, but it’s been a struggle. The wake word responds to me, but not to my girlfriend. I try setting timers, and it says “Done”, but never actually starts the timer.
I’d love to improve the reliability of my assistant, but I want to know what options work well for others first. I’ve been experimenting with an intermediary STT proxy that sends the audio to both whisper and speech-to-phrase to see which one comes back with more confidence.
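The interesting part of that proxy is really just “transcribe twice, keep the result you trust more”. A rough sketch of that selection logic is below; the two transcriber callables are hypothetical stand-ins for however you actually call whisper and speech-to-phrase (the real Wyoming plumbing is left out), and each is assumed to return a transcript plus a 0–1 confidence:

```python
# Sketch of the "ask both STT engines, keep the more confident answer" idea.
# The transcriber callables are hypothetical placeholders for your real calls
# to whisper and speech-to-phrase; each takes raw audio bytes and is assumed
# to return (text, confidence) with confidence normalised to 0..1.
from typing import Callable, Optional, Tuple

Transcriber = Callable[[bytes], Tuple[str, float]]

def pick_transcript(audio: bytes,
                    whisper: Transcriber,
                    speech_to_phrase: Transcriber,
                    min_confidence: float = 0.4) -> Optional[str]:
    """Run both engines on the same audio and return the better transcript."""
    candidates = []
    for name, engine in (("whisper", whisper), ("speech-to-phrase", speech_to_phrase)):
        try:
            text, confidence = engine(audio)
            candidates.append((confidence, name, text))
        except Exception as err:  # one engine being down shouldn't break the pipeline
            print(f"{name} failed: {err}")

    if not candidates:
        return None

    confidence, name, text = max(candidates)
    print(f"chose {name} ({confidence:.2f}): {text}")
    # Treat low-confidence results as "didn't understand" rather than guessing.
    return text if confidence >= min_confidence else None
```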
In Home Assistant, go to Settings > Voice Assistants, click the … on your assistant, and then click Debug: you can see what it thought you said (and what it did).
Setting a timer on an up-to-date Home Assistant will repeat back what it set. E.g. if I say “Set a timer for 2 minutes”, it will say “Timer set for 2 minutes”. It says “Done” when it has run some Home Assistant task/automation, so it’s probably not understanding you correctly, which is exactly what the debug option is good for checking. I use the cloud voice recognition, as I couldn’t get the local version to understand my accent when I tried it (a year ago). It goes through Azure but is proxied by Home Assistant, so Azure doesn’t know it’s you.
My wife swears it’s sexist; she has a bit of trouble too. In the integration options you can raise the sensitivity so it picks up the wake word more easily, but that does increase false activations. I have it on the most sensitive setting and she can activate it on the first try most of the time.