If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.) how’s it working for you?
Given there are a number of knobs you can turn, what do you use and what works well?
- Wake word model. There are the default models and custom ones
- Conversation agent and model
- Speech to text models (e.g. speech-to-phrase or whisper)
- Text to speech models


Here are the specs on the HA Voice Preview Edition: https://www.home-assistant.io/voice-pe/
And a breakdown of one of the lowest-end Alexa devices: https://www.briandorey.com/post/echo-dot-5th-gen-smart-speaker-teardown
The HA device runs on an ESP32-S3 with two microphones. A Seeed Studio variant uses a Raspberry Pi CM4, also with two mics: https://iotbyhvm.ooo/home-assistant-voice-kit-by-seeed-studio-a-comprehensive-guide/
The Echo Dot runs on a custom AZ2 processor with a built-in neural edge processor and three microphones. Echoes used to have as many as seven mics, but advanced digital signal processing (DSP) techniques let them get away with fewer input signals (and a lower BOM cost).
The difference is that with three or more mics you can do what’s called ‘beam-forming’ (https://dspconcepts.com/sites/default/files/voice_ui_part2.pdf): combining the mic signals so the speaker’s voice is reinforced and noise from other directions is suppressed. This gets you cleaner wake-word recognition and faster response. Also, having an on-device neural processor chip means you can offload a lot of the filter processing.
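For anyone curious what beam-forming looks like in practice, here is a minimal delay-and-sum sketch in plain Python. It is only an illustration of the general idea, not what the Echo or the HA device actually runs. It assumes the per-mic arrival delays are already known as whole sample counts (real devices have to estimate them), and the function names, the sine-wave test signal, and the noise level are all made up for the example.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Shift each channel back by its arrival delay, then average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   how many samples late the target sound reaches each mic
              (assumed known here; hypothetical values in the demo).
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    # Only keep output samples for which every shifted channel has data.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]


def rms_error(signal, reference):
    """Root-mean-square difference over the overlapping samples."""
    n = min(len(signal), len(reference))
    return math.sqrt(sum((signal[i] - reference[i]) ** 2 for i in range(n)) / n)


def simulate(num_samples=2000, delays=(0, 2, 4), noise_std=0.5, seed=0):
    """Fake a mic array: each mic hears the same tone late, plus its own noise."""
    rng = random.Random(seed)
    clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(num_samples)]
    channels = []
    for d in delays:
        channel = [
            (clean[n - d] if n >= d else 0.0) + rng.gauss(0.0, noise_std)
            for n in range(num_samples)
        ]
        channels.append(channel)
    return clean, channels


if __name__ == "__main__":
    clean, channels = simulate()
    combined = delay_and_sum(channels, [0, 2, 4])
    print("single mic error:", rms_error(channels[0], clean))
    print("beamformed error:", rms_error(combined, clean))
```

Once the channels are lined up, the voice adds up coherently while the independent noise on each mic partly cancels, so the combined signal should track the clean tone more closely than any single mic does. More mics give more averaging, which is roughly why mic count matters.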
The upshot is, until HA switches to a more custom audio-processing system, it’ll always lag behind a cheap $30 Echo or Google device. The HA unit is good for dev experimentation, but for day-to-day use folks might want to consider something from the ReSpeaker line: https://www.seeedstudio.com/blog/2024/10/16/smart-home-assistant-speakers/
Interesting…
I have a Pi Zero 2 W with a (v1) ReSpeaker HAT and it’s pretty good, but it doesn’t do well in a moderately noisy environment.
I was considering getting a Voice Preview instead… but… maybe I should consider the v2 ReSpeaker instead? I already have the Pi…