If you’re using the Home Assistant voice assistant mechanism (not Alexa/Google/etc.), how’s it working for you?

Given there are a number of knobs you can turn, which do you use, and what works well?

  • Wake word model. There are the default models, plus custom ones
  • Conversation agent and model (see the sketch after this list)
  • Speech-to-text model (e.g. speech-to-phrase or Whisper)
  • Text-to-speech model
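
If you want to experiment with the conversation-agent knob on its own, you can bypass wake word and speech-to-text entirely and send text straight to Home Assistant’s documented /api/conversation/process REST endpoint. A minimal Python sketch; the host, the token placeholder, and the exact response shape are assumptions to adapt to your own instance:

```python
# Minimal sketch: exercise the conversation agent directly over REST,
# skipping wake word and STT. HA_URL and HA_TOKEN are placeholders.
import requests

HA_URL = "http://homeassistant.local:8123"   # assumption: default host/port
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # created under your HA user profile

def ask_assist(text: str) -> dict:
    """Send a sentence to the active conversation agent and return its reply."""
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={"text": text, "language": "en"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    reply = ask_assist("turn on the kitchen light")
    # Assumption: the spoken answer nests under response -> speech -> plain.
    print(reply["response"]["speech"]["plain"]["speech"])
```
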
  • mr_tyler_durden@lemmy.world · 2 days ago

    HA Voice Preview is cool, but it’s a toy compared to Alexa. It’s not loud enough and doesn’t pick out the wake word well enough. Incredibly cool pipeline and ability to tweak it; once the hardware improves I’d love to replace all my Echos.

    • AlternateRoute@lemmy.ca · 1 day ago

      There seem to be some third-party music-capable devices in the works, but nothing ready-made in production.

      I am waiting for a better overall speaker, and I need a new server that can at least run a small LLM to drive the assistant, as the built-in intents are too rigid.

    • Dave@lemmy.nz · 2 days ago

      I agree that it’s not production ready, and they know that too, hence the name. But in relation to your points: I plugged in an external speaker, as the built-in one really isn’t that great.

      For the wake word, at some point they did an update to add a sensitivity setting, so you can make it more sensitive. You could also try donating your voice to the training: https://ohf-voice.github.io/wake-word-collective/
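
      For anyone curious what that sensitivity setting maps to: wake-word engines such as openWakeWord (one of the engines HA can use via Wyoming) emit a per-frame confidence score, and sensitivity is essentially the threshold applied to that score. A rough sketch, assuming openWakeWord’s Python API and its bundled pretrained models:

      ```python
      # Rough sketch of a wake-word loop with a sensitivity threshold,
      # assuming openWakeWord's Python API. Lowering SENSITIVITY makes
      # detection more eager (and more prone to false accepts). Real frames
      # would come from a 16 kHz mono mic stream; a silent frame stands in.
      import numpy as np
      from openwakeword.model import Model

      SENSITIVITY = 0.5  # threshold on the model's confidence score

      # No-arg Model() loads the bundled pretrained models (they may need a
      # one-time download via openwakeword.utils.download_models()).
      oww = Model()

      def detected(frame: np.ndarray) -> bool:
          """frame: 1280 int16 samples (80 ms at 16 kHz). True on detection."""
          scores = oww.predict(frame)  # {model_name: score in [0, 1]}
          return any(score >= SENSITIVITY for score in scores.values())

      print(detected(np.zeros(1280, dtype=np.int16)))  # silent frame -> False
      ```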

      But all in all you’re spot on with the challenges. I’d add a couple more.

      With OpenAI I find it can outperform other voice assistants in certain areas. Without it, you run into weird issues: my wife always says “set timer 2 minutes” and it runs off to OpenAI to work out what that means, but if you say “set a timer for 2 minutes” it understands immediately.
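
      That gap is mostly template coverage: Assist matches transcripts against sentence templates in which filler words can be optional, so a template that treats “a” and “for” as optional makes both phrasings resolve locally. A toy regex stand-in (not HA’s actual matcher) to show the idea:

      ```python
      # Toy illustration (not HA's actual matcher): a sentence template with
      # optional filler words catches both "set timer 2 minutes" and
      # "set a timer for 2 minutes", so neither needs a cloud fallback.
      import re

      TIMER = re.compile(r"^set (?:a )?timer (?:for )?(\d+) minutes?$")

      for phrase in ("set a timer for 2 minutes", "set timer 2 minutes"):
          m = TIMER.match(phrase)
          result = f"local match, {m.group(1)} min" if m else "no match -> cloud fallback"
          print(f"{phrase!r}: {result}")
      ```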

      What I wish for is the ability to rewrite requests. Local voice recognition can’t understand my accent, so I use the proxied Azure speech-to-text via Home Assistant Cloud, and it regularly thinks I’m saying “Cortana” (I’m NEVER saying Cortana!).
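
      A rewrite step like that could sit between speech-to-text and intent matching as a simple substitution table over the transcript. A hypothetical sketch (the mishearing map is invented for illustration):

      ```python
      # Hypothetical transcript-rewrite step between STT and intent matching.
      # The mishearing map is invented; you'd populate it from your own
      # recurring recognition errors.
      REWRITES = {
          "cortana": "curtain",  # hypothetical: what was actually said
      }

      def rewrite(transcript: str) -> str:
          """Replace known mishearings word-by-word before intent matching."""
          return " ".join(REWRITES.get(w.lower(), w) for w in transcript.split())

      print(rewrite("open the Cortana"))  # -> "open the curtain"
      ```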

      Oh, and I wish it could do streaming voice recognition instead of waiting for you to finish talking and then waiting for a pause before trying anything. My in-laws have a Google Home, and if you say something like “set a timer for 2 minutes” it responds immediately, because it was converting speech to text as it went and knew nothing more was coming after a command like that. HAVP has perhaps a one-second delay between you finishing speaking and it replying, assuming it doesn’t need another five seconds to go to OpenAI. And you have to be quiet in that second, otherwise it thinks you’re still talking (a problem in a busy room).
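
      For what it’s worth, the Google Home behaviour amounts to streaming recognition with early endpointing: the engine emits partial transcripts as you speak and commits as soon as a partial already forms a complete, unambiguous command, rather than waiting out a silence window. An illustrative sketch of that logic (the command patterns and the stream of partials are hypothetical):

      ```python
      # Illustrative early-endpointing logic: fire an intent as soon as a
      # partial transcript already forms a complete command, instead of
      # waiting for trailing silence. Patterns and partials are hypothetical.
      import re

      COMPLETE_COMMANDS = [
          re.compile(r"^set a timer for \d+ (seconds|minutes)$"),
          re.compile(r"^turn (on|off) the \w+ light$"),
      ]

      def complete(partial: str) -> bool:
          """Called on every partial STT result; True means act immediately."""
          return any(p.match(partial.strip().lower()) for p in COMPLETE_COMMANDS)

      # Simulated partials from a streaming STT engine:
      for partial in ("set a", "set a timer", "set a timer for 2", "set a timer for 2 minutes"):
          if complete(partial):
              print(f"fire immediately on: {partial!r}")
              break
      ```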