If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.) how’s it working for you?
Given there are a number of knobs you can turn, which do you use and what works well?
- Wake word model. There are the default models and custom ones
- Conversation agent and model
- Speech to text models (e.g. speech-to-phrase or whisper)
- Text to speech models
I’d like a PCB designed as a drop-in replacement for lobotomized Echo or Nest devices so we can reuse their existing hardware and recycle millions of older units.
I would love this. Have several devices sitting around waiting for me to figure out how to hack them.
It’s not 🙁. Height of laziness: I ran out of USB-C cables and had to use that one for something else.
It’s not like I can get them dirt cheap overnight delivery ……
I built a Dockerized Wyoming version of pocket-tts a while ago because I like the voices and timbre of pocket-tts better than Piper’s. I’ve subbed it in for Piper.
Also been using the OHF linux-voice-assistant, which I Dockerized, for voice hardware on my desktop.
HA Voice Preview is cool but it’s a toy compared to Alexa. It’s not loud enough and doesn’t pick out the wake word well enough. Incredibly cool pipeline and ability to tweak it; once the hardware improves I’d love to replace all my Echos.
There seem to be some 3rd-party music-capable devices in the works but nothing ready-made in production.
I am waiting for a better overall speaker, and I need a new server that can at least run a small LLM to run the assistant, as intents are too rigid.
I agree that it’s not production ready, and they know that too, hence the name. But in relation to your points: I plugged in an external speaker, since the built-in one really isn’t that great at all.
For the wake word, at some point they did an update to add a sensitivity setting so you can make it more sensitive. You could also try donating your voice to the training: https://ohf-voice.github.io/wake-word-collective/
But all in all you’re spot on with the challenges. I’d add a couple more.
With OpenAI I find it can outperform other voice assistants in certain areas. Without it, you come up against weird issues: my wife always says “set timer 2 minutes” and it runs off to OpenAI to work out what that means. If you say “set a timer for 2 minutes” it understands immediately.
What I wish for is the ability to rewrite requests. Local voice recognition can’t understand my accent, so I use the proxied Azure speech-to-text via Home Assistant Cloud, and it regularly thinks I’m saying “Cortana” (I’m NEVER saying Cortana!)
Oh, and I wish it could do streaming voice recognition instead of waiting for you to finish talking, then waiting for a pause, before trying anything. My in-laws have a Google Home, and if you say something like “set a timer for 2 minutes” it responds immediately, because it was converting to text as it went and knew that nothing more was coming after a command like that. HAVP has perhaps a 1-second delay between finishing speaking and replying, assuming it doesn’t need another 5 seconds to go to OpenAI. And you have to be quiet in that 1 second, otherwise it thinks you’re still talking (a problem in a busy room).
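To illustrate what I mean by streaming recognition, here’s a conceptual sketch in Python: check partial transcripts against known-complete command shapes as audio arrives, instead of waiting out the trailing silence. The `stt.feed` recognizer here is hypothetical, and HAVP doesn’t currently work this way; it’s just the shape of the idea:

```python
# Conceptual sketch of why streaming STT feels faster: act as soon as a
# partial transcript already forms a complete command, rather than
# waiting for end-of-speech silence. `stt` is a hypothetical streaming
# recognizer object, not a real HA or Wyoming API.
import re

COMPLETE_COMMANDS = [
    re.compile(r"^set a timer for \d+ (seconds|minutes|hours)$"),
    re.compile(r"^turn (on|off) the .+ lights?$"),
]

def listen(stt, mic_chunks):
    partial = ""
    for chunk in mic_chunks:
        partial = stt.feed(chunk)  # hypothetical: returns partial text so far
        # If the partial transcript already matches a known-complete
        # command, respond immediately; no pause detection needed.
        if any(p.match(partial.strip().lower()) for p in COMPLETE_COMMANDS):
            return partial
    return partial  # end of audio: fall back to whatever we heard
```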
Here are the specs on the HA voice Preview: https://www.home-assistant.io/voice-pe/
And a breakdown of one of the lowest-end Alexa devices: https://www.briandorey.com/post/echo-dot-5th-gen-smart-speaker-teardown
The HA device runs on an ESP32-S3 with two microphones. A Seeed Studio variant uses a Raspberry Pi CM4, also with two mics: https://iotbyhvm.ooo/home-assistant-voice-kit-by-seeed-studio-a-comprehensive-guide/
The Echo Dot runs on a custom AZ2 processor with a built-in neural edge processor and three microphones. Echos used to have as many as seven mics, but advanced digital signal processing techniques let them get away with fewer input signals (and a lower BOM cost).
The difference is that with three or more mics you can do what’s called ‘beam-forming’ (https://dspconcepts.com/sites/default/files/voice_ui_part2.pdf) to isolate the talker from surrounding noise. This gets you cleaner wake-word recognition and faster response. Also, having an on-device neural processor means you can offload a lot of the filter processing.
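To give a feel for the idea, here’s a toy delay-and-sum beamformer in Python/numpy. It’s a minimal sketch, not what an Echo’s DSP actually does (real devices layer adaptive filtering and echo cancellation on top, and the `np.roll` wrap-around is a shortcut you wouldn’t use in production):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(mic_signals, mic_positions, direction, sample_rate):
    """Toy delay-and-sum beamformer.

    mic_signals:   (n_mics, n_samples) array of simultaneous recordings
    mic_positions: (n_mics, 3) mic coordinates in meters
    direction:     unit vector pointing from the array toward the talker
    """
    n_mics, n_samples = mic_signals.shape
    out = np.zeros(n_samples)
    for signal, position in zip(mic_signals, mic_positions):
        # A plane wave from `direction` reaches each mic at a slightly
        # different time; undo that per-mic delay so the talker's speech
        # adds coherently while off-axis noise averages out.
        delay_s = np.dot(position, direction) / SPEED_OF_SOUND
        delay_samples = int(round(delay_s * sample_rate))
        out += np.roll(signal, -delay_samples)
    return out / n_mics
```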
The upshot is, until HA switches to a more custom audio-processing system, it’ll always lag behind a cheap $30 Echo or Google device. The HA unit is good for dev experimentation, but for day-to-day use folks might want to consider something from the ReSpeaker line: https://www.seeedstudio.com/blog/2024/10/16/smart-home-assistant-speakers/
Interesting…
I have a Pi Zero 2 W with a (v1) ReSpeaker HAT and it’s pretty good, but it doesn’t do well in a moderately noisy environment.
I was considering getting a Voice Preview instead… but maybe I should consider the v2 ReSpeaker instead? I already have the Pi…
Mine stopped working and I can’t figure out how to fix it. I switched back and forth between cloud and local models and at some point it just died.
The biggest challenge, in my experience, is finding hardware that heats you well, and you can hear well.
With my PC’s USB mic, when I’m sitting directly in front of it, everything works quite well. Once I start stepping away, things start to get funky.
The biggest challenge, in my experience, is finding hardware that heats you well, and you can hear well.
Maybe a hairdryer?
Okay now answer: What gets wetter as it dries?
I don’t get it! What is it?
A towel!
Oh nice!
@CmdrShepard49
My towel?
Home Assistant sells units with dual microphones that aren’t too expensive and work relatively well. But the local voice recognition wasn’t great last I tried.
It’s probably a unidirectional mic, or just narrow focus. Omnidirectional mics I only know about in the musical sense, and they aren’t cheap. But I’m sure there are inexpensive versions.
Picked up a preview edition last year and it just kind of sits there.
I really need to get it running for basic automation tasks but finding the time to research good tutorials seems to be eluding me.
I also have a preview edition.
I moved HA from my server to a HA Green to separate out reliability (my server is a test bed and uptime isn’t great, and home automation warrants better uptime than I was giving it).
The voice services don’t work as well on the Green directly. I view voice as part of the HA ecosystem and want it running on the same hardware, but the Green seems very much like not a great option for that. And even on my own hardware, it still seemed a bit slower than I’d want and not always accurate. It definitely needs a lot of tweaking (just like OP found) to make it worthwhile.
I only ever set timers and checked the weather when I had a Google Home Mini; the Voice Preview is able to do that pretty well.
We’ve been using the Previews since they shipped. The Mycroft wake word has worked well enough for the whole family. We tried the chatbot fallback, but the syntax of the intent parser is strict enough that we were getting routed to the LLM way more than we wanted. For example, asking it to turn on a light and Claude telling us it couldn’t do that. It fails faster and more reliably with just the intent parser.
Our favorite use case is shopping lists. “Hey Mycroft add greens to groceries list” is great and won me some WAF. I also regularly use timers, some custom commands (hey Mycroft I fed the dog), and managing lights with scenes (hey Mycroft turn on Daytime).
I’m hoping to one day transition to a local LLM that’s fine-tuned for Home Assistant-specific tasks, and it looks like some good ones will arrive soon. The existing implementations haven’t won me over yet.
Dunno, I’m a big fan and the wife doesn’t hate them; I’m really optimistic about the future of these. I think HA is going about them the right way and we’ll see good things. It’s probably a little rough right now if you’re not willing to put up with the quirks, but I think it’s just going to keep getting better.
Forgot to add, something I really want to figure out is how to do reminders with it. I’m stuck using Gemini on my phone for that and I’d really love to find a way to do that in HA if anyone has any tips.
Similar to the other user’s response, I use the calendar integration and add things to the calendar (say, putting the recycling out to be collected). Then I have an automation that reads out a reminder at the time it is scheduled for in the calendar.
So the evening before recycling pickup every fortnight, it pipes up and says “Reminder: Recycling” or whatever.
Works pretty well for these regular recurring things. I haven’t tried using it for one-off reminders, and you can’t say “ok nabu, remind me to wish Steve a happy birthday on the 27th of February” or anything like that. Still, I’m pretty happy.
I seem to remember it needing a bit of fiddling to get the notification working; I’m happy to look up and post what I have in my automation if needed.
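In the meantime, here’s a rough sketch of the same idea driven from outside HA with a Python script (not my actual automation, which is plain YAML): it pulls today’s events from a calendar entity over the REST API and announces them on a Voice PE satellite. The URL, token, and entity IDs are placeholders, and it assumes a recent HA with the assist_satellite integration:

```python
import datetime
import requests

HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

def todays_events(calendar_entity: str) -> list[dict]:
    """GET /api/calendars/<entity> returns events in a start/end window."""
    today = datetime.date.today()
    resp = requests.get(
        f"{HA_URL}/api/calendars/{calendar_entity}",
        params={"start": f"{today}T00:00:00Z", "end": f"{today}T23:59:59Z"},
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def announce(satellite_entity: str, message: str) -> None:
    """Call the assist_satellite.announce service to speak on a satellite."""
    requests.post(
        f"{HA_URL}/api/services/assist_satellite/announce",
        json={"entity_id": satellite_entity, "message": message},
        headers=HEADERS,
        timeout=10,
    ).raise_for_status()

# Placeholder entity IDs: substitute your own calendar and satellite.
for event in todays_events("calendar.reminders"):
    announce("assist_satellite.voice_pe", f"Reminder: {event['summary']}")
```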
Me too, so I started looking around. https://community.home-assistant.io/t/calendar-notifications-actions/612326
I don’t have time right now, but will this work?
That looks promising, I think I can see a path forward there. I still need to dig into the new calendar functionality. Thanks for sharing!
I use the HA Voice Preview in two different rooms and got rid of my Echo Dots. I’ve been trying both speech-to-phrase and whisper (medium.en running on the GPU) for STT, and tried llama3.2 and granite4 for the LLM with local command handling.
I’ve been trying to get it working better, but it’s been a struggle. The wake word responds to me, but not my girlfriend’s voice. I try setting timers, and it says done, but never triggers the timer.
I’d love to improve the performance of my assistant, but want to know what options work well for others. I’ve been experimenting with an intermediary STT proxy that sends audio to both whisper and speech-to-phrase to see which one has more confidence.
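For the curious, the core of that experiment looks roughly like this: stream the same WAV to two Wyoming STT servers and compare what comes back. A sketch using the `wyoming` Python package, with placeholder hosts/ports; note the standard Wyoming Transcript event only carries text, so actual confidence scoring would need server-side support:

```python
import asyncio
import wave

from wyoming.asr import Transcribe, Transcript
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.client import AsyncTcpClient

async def transcribe(host: str, port: int, wav_path: str) -> str:
    client = AsyncTcpClient(host, port)
    await client.connect()
    try:
        with wave.open(wav_path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            channels = wav.getnchannels()
            await client.write_event(Transcribe().event())
            await client.write_event(
                AudioStart(rate=rate, width=width, channels=channels).event()
            )
            # Stream the file in ~1 second chunks, as a satellite would
            while chunk := wav.readframes(rate):
                await client.write_event(
                    AudioChunk(
                        rate=rate, width=width, channels=channels, audio=chunk
                    ).event()
                )
            await client.write_event(AudioStop().event())
        # Read events until the server sends back its transcript
        while event := await client.read_event():
            if Transcript.is_type(event.type):
                return Transcript.from_event(event).text
        return ""
    finally:
        await client.disconnect()

async def main() -> None:
    # Placeholder hosts/ports: point these at your own Wyoming servers
    whisper, phrase = await asyncio.gather(
        transcribe("whisper.local", 10300, "sample.wav"),
        transcribe("speech-to-phrase.local", 10300, "sample.wav"),
    )
    print(f"whisper:          {whisper}")
    print(f"speech-to-phrase: {phrase}")

asyncio.run(main())
```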
In Home Assistant, in the settings, if you go to Voice Assistants then click the … on your assistant and click Debug, you can see what it thought you said (and what it did).
Setting a timer on an up-to-date Home Assistant will repeat back what it set. E.g. if I say “Set a timer for 2 minutes” it will say “Timer set for 2 minutes”. It says “Done” when running some Home Assistant task/automation, so it’s probably not understanding you correctly (which is what the debug option is good for). I use the cloud voice recognition, as I couldn’t get the local version to understand my accent when I tried it (a year ago). It’s through Azure but is proxied by Home Assistant Cloud, so they don’t know it’s you.
The wake word responds to me, but not my girlfriend’s voice.
My wife swears it’s sexist; she has a bit of trouble too. In the integration options you can set the sensitivity to make it more sensitive, but it does increase false activations. I have it on the most sensitive setting and she can activate it first time, most of the time.
I have set up the wake word as Hey Jarvis, and it usually gets it, though I also hear it bleeping and blooping randomly, so that’s fun. HA is running on an N100 mini computer, and I found that the smallest Whisper model I can use reliably is the medium one (I’m sure in English it’d work well even with smaller ones). The LLM is Qwen 3 4b running on a computer with a dedicated RX 6400; as in, that’s the second GPU and it’s doing only that. The end result is that I give a command, wait a few seconds (Whisper, mostly), then hopefully it works out. I imagine with a known-good mic and powerful local hardware it’d be noticeably better, but still.
I’m interested in all of these replies because I have a preview edition coming in the mail. I’m tired of Google listening to everything despite claiming not to.
I think it’s a pretty cool toy to play with. It mostly gets used for setting timers and playing music, but you can add Home Assistant automations that trigger when you say certain things. Lots to play with, if that’s your idea of fun!
I think it’d be a great way to control everything, rather than spending hours and hours making dashboards while trying to include the 20 different entities each device comes with without it looking like a jumbled, convoluted mess. “Computer, turn off the TV.” “Computer, turn on the kitchen lights.” “Computer, open the blinds.” “Computer, set the thermostat to 73.” I don’t really use voice commands anywhere else, but this sounds like it could be really useful. I gotta start looking at how to enable all this.
Did you try using tabs, or separate dashboard pages?
I have both, but just basic options. Like with WLED lights (if you’re familiar), I’ll have buttons for turning them on and off and controlling the brightness, and ignore everything else (presets, color, patterns, segments, night light, etc.), because there isn’t a great way to list them all in a compact space, especially when you have 10 separate WLED devices. Typically I’ll just use the WLED app if I want to do anything outside of basic functionality.
Though, reading some of the other comments here, it sounds like voice control is nowhere near as far along as I imagined it was, so maybe my laziness will prevail and I’ll attempt setting it up at some indeterminate time in the future.
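If/when you do get to it, one low-friction way to test things before touching any voice hardware: Assist exposes a REST conversation endpoint, so you can fire text commands at the same intent parser the voice pipeline uses. A sketch along those lines (URL and token are placeholders):

```python
import requests

HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def ask_assist(text: str) -> str:
    """POST /api/conversation/process runs the text through the Assist
    pipeline's intent parser and returns its spoken response."""
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        json={"text": text, "language": "en"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["response"]["speech"]["plain"]["speech"]

# Check which phrasings the intent parser accepts before going hands-free
for command in (
    "turn off the TV",
    "turn on the kitchen lights",
    "set the thermostat to 73",
):
    print(command, "->", ask_assist(command))
```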
I have an S3-BOX-3 and it works, sort of. But without a local LLM and better local speech to text it’s not super useful.
I have speech-to-text with Whisper working on my phone. HA is set as the default assistant on my phone, so I can control lights, timers, and scenes from one action on my phone or watch. Works well once I set friendly names for each light, room, device, and scene.