Raspberry Pi Zero W as a media_player for Home Assistant

Building a DIY smart speaker!

The goal

I’ve been interested in home automation with Home Assistant for quite a while now, and my goal has always been to have a smart home that is truly “smart”. For me, that means as little interaction with the whole system as possible – certainly not in the sense that I have to pull out my phone to turn on a lamp.

On the other hand, I find “interaction in the other direction” – that is, from the smart home system to me – super important. I want feedback from my smart home system and have always wanted to get a classic Daily Briefing right after waking up – complete with the weather forecast and the current episode of the German news podcast Tagesschau in 100 Sekunden! That’s what today is all about.

This is where the media_player domain of Home Assistant comes into play: It abstractly represents a device that can play back sound – both text-to-speech (“TTS”) and media files. For off-the-shelf speakers and sound bars such as those from Sonos, but also for Amazon’s, Google’s and Apple’s smart speakers, there are built-in integrations that expose a media_player entity to Home Assistant. But this is about building the whole thing ourselves!

The plan

I had a Raspberry Pi 3 B+ and a Raspberry Pi Zero W (first generation) lying around at home. I chose the Zero W because – well, I really don’t know. Probably because it’s smaller and the big 3 B+ is just too much for this simple use-case. So Raspberry Pi Zero W it is!

For the output device, I had the idea of just using my old Bose SoundLink Revolve box. It has an AUX input, good sound quality and I rarely use it anymore. Most importantly, I only use it when I’m not at home, if at all, so I could still use it for both use-cases in parallel. Since the Raspberry Pi Zero W doesn’t have an AUX output, I still need a USB sound card.

Quick note on using the AUX outputs of the Raspberry Pi 3 or 4: I advise against it! They are not of particularly high quality and produce a faint background noise when nothing is playing. This is annoying for smart speakers, which also run at night. It is therefore better to go straight to USB audio, as I did!

But now the big question remains: How do I integrate the Pi Zero as a media_player into Home Assistant?

First of all, what I want is a simple smart speaker. I don’t currently need multi-room support à la AirPlay 2 or want to stream anything via Spotify.

The first possibility would be to use the Universal Media Player: It lets you define your own media_player entity, where each service call (e.g. media_player.volume_up) can in turn call a service of your choice. This is certainly practical for infrared-controlled amplifiers, for example, where a service call simply emulates an infrared command. For our application, however, it is overkill and too complex.

So I looked around on the Home Assistant integration list for integrations that were not built for proprietary hardware/software. There were some that can control multimedia servers like Jellyfin, Plex or Kodi. But these are actually intended for web streaming – not as a permanent audio output. That’s why I’m leaving them out. Among the remaining ones are Volumio, OwnTone and the Music Player Daemon (MPD).

What I noticed during my research was that both Volumio and OwnTone have the Music Player Daemon (MPD) built in! Moreover, both tools are fully featured audio platforms with a web frontend and everything – in other words, with a lot of extras I don’t need (and which a first-generation Raspberry Pi Zero W probably can’t handle either). So MPD it is!

The realisation

Hardware

Almost all the instructions I have found online about this do some unnecessary things. As mentioned above, we need a USB sound card because the Raspberry Pi Zero W has no analog audio output of its own. And even if it had an AUX jack, we wouldn’t want to use it because of the noise issues mentioned above. The other guides rely on Pi HAT sound cards here! These are usually quite expensive (starting at $14) and have horrendous delivery times, or don’t ship to Germany at all. I still don’t understand why everyone uses Pi HATs. I just got myself a very cheap USB sound card from UGREEN.

As mentioned above, I simply use my Bose SoundLink Revolve in AUX mode as a speaker. It looks good, is inconspicuous and the sound is definitely good enough.

The Raspberry Pi Zero W is in its original red and white case, which you can’t see anyway because I just put it next to my music system in the cupboard that the SoundLink sits on top of.

Raspberry Pi Zero W in its original case next to my amplifier with USB sound card


Bose Soundlink Revolve as a speaker for the Raspberry Pi

Installation of MPD

Now comes the exciting part… no, not at all. In fact, the installation is as trivial as it gets: MPD is in the official package sources of Raspberry Pi OS – which I assume is already installed. So a bit of apt update && apt install alsa-utils mpd and a bit of config here and a bit of config there…

For explanation: ALSA (Advanced Linux Sound Architecture) is the standard sound system on the Raspberry Pi (and many other Linux devices). MPD uses it for audio output; the alsa-utils package provides tools such as aplay, which is used below.

The following must be added to /etc/mpd.conf after installation (Append! What is already in the file should stay in):

audio_output {
    type            "alsa"
    name            "USB sound card"
    device          "hw:1,0"
    mixer_type      "software"
}

The value for name does not really matter to us and can be freely chosen. The value for device has to be determined with the command aplay -l: look for the corresponding sound card in its output. The line you are looking for looks like this, for example:

card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]

The number after card and the number after device are now used to form the configuration value for MPD, in this case hw:1,0 for card 1, device 0.
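
The mapping from an aplay -l line to the MPD device string can be sketched in a few lines of Python. This is only an illustration: the function name alsa_device_from_aplay_line is made up for this example, and the regular expression assumes the English output format shown above.

```python
import re

# Matches lines such as:
# card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
_APLAY_LINE = re.compile(r"^card (\d+):.*?, device (\d+):")


def alsa_device_from_aplay_line(line):
    """Return the ALSA device string (e.g. "hw:1,0") for one `aplay -l` line.

    Returns None if the line does not describe a playback device.
    """
    match = _APLAY_LINE.match(line.strip())
    if match is None:
        return None
    card, device = match.groups()
    return f"hw:{card},{device}"


sample = "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]"
print(alsa_device_from_aplay_line(sample))  # prints hw:1,0
```

Header lines and the indented "Subdevices" lines of the aplay -l output do not match the pattern and yield None, so the function can simply be applied to every line of the output.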

After the config file has been adjusted, make sure that the MPD service is running and starts automatically:

systemctl enable mpd
systemctl start mpd

That’s it!
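
To check that MPD is actually reachable before moving on, one option is to read the greeting it sends to every new client: a line starting with OK MPD, followed by the protocol version. The following is a minimal sketch, assuming the default port 6600 and that MPD accepts connections from the machine the script runs on. The helper names parse_mpd_greeting and mpd_version are made up for this example.

```python
import socket


def parse_mpd_greeting(line):
    """Return the protocol version from an MPD greeting, or None if it is not one."""
    prefix = "OK MPD "
    line = line.strip()
    if not line.startswith(prefix):
        return None
    return line[len(prefix):]


def mpd_version(host, port=6600, timeout=3.0):
    """Connect to MPD and return the protocol version it announces."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # MPD speaks first: the greeting is the first line on the connection.
        greeting = conn.makefile("r", encoding="utf-8").readline()
    return parse_mpd_greeting(greeting)


# Usage (needs a running MPD, so it is not executed here):
# print(mpd_version("address-of-your-pi"))
```

If the call raises a connection error or returns None, MPD is either not running or not listening on that address and port.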

Integration into Home Assistant

At the time of writing, the MPD integration unfortunately cannot be configured via the UI. But it is not difficult. This must go into configuration.yaml (or another file that is included there via !include):

media_player:
  - platform: mpd
    name: "Leons Smart Speaker"
    host: 10.0.30.99

I haven’t changed the port, so I don’t have to specify it. Theoretically, you could still set a password for MPD, but I did not do that because I placed the device in my own subnet and everyone in my shared flat can use Home Assistant anyway – so a plaintext password at this point would be superfluous. So the only thing that has to be specified is the IP address of the Raspberry Pi!

Then restart Home Assistant, et voilà: media_player entity!

Final media_player entity

Since the media_player domain is comparatively complex, it may be that not all services are available for MPD. However, the ones I need and consider useful are fully functional:

  • Play media via URL (or playlist name)
  • Play text-to-speech audio
  • Start/Stop/Pause
  • Volume down/Volume up/Set volume
  • Seek
  • On/Off

Generally, I haven’t looked too deeply into the possibilities of MPD, but for this case it is exactly what I was looking for. So far, the service has been running for about a month without any problems and my Daily Briefing accompanies me every morning!

Appendix: Setting up “Tagesschau in 100 Sekunden” in Home Assistant

I already mentioned it at the beginning: My goal with the smart speaker was to be woken up in the morning after my alarm by an episode of the German news podcast Tagesschau in 100 Sekunden. That turned out to be less trivial than expected, especially since new episodes are released every hour. Besides, I didn’t find much about setting this up online. That’s why I’d like to go into it here.

MPD’s media_player.play_media service takes either a playlist or an audio URL. The former would mean that I would have to maintain a repository of current episodes on the Raspberry Pi itself. Theoretically this can be done – probably even relatively easily using cron jobs – but it goes against the philosophy of the whole thing: Home Assistant is supposed to be the “single source of truth” and decides what the smart speaker – which is supposed to act only as an audio satellite – plays. That’s why I would like to do without MPD’s playlist feature entirely.

So the next question is: How do I get the URL of the latest Tagesschau episode into Home Assistant?

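I have found two ways to do this: scraping the episode URL from the Tagesschau website with the Scrape integration, or reading the podcast’s RSS feed with the Feedreader integration.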

I consider the former to be simply illegal. RSS feeds have a history of years of standardisation behind them for a reason. However, it has to be said that dealing with the Scrape integration was a bit easier than with the feed reader: Scrape (1) can be configured via the UI and (2) exposes a text sensor that contains the scraped content (ideally the MP3 URL of the current episode). This is convenient in an automation: We only have to pass the state of this sensor to the media_player.play_media service and our smart speaker plays the episode.

Feedreader does it a bit differently: We have to configure it via YAML. My configuration looks like this:

feedreader:
  urls:
    - https://www.tagesschau.de/multimedia/sendung/tagesschau_in_100_sekunden/podcast-ts100-audio-100~podcast.xml

The URL is now queried regularly for new RSS entries (the interval is customisable, see the Feedreader documentation). These are sent (inconveniently, in my opinion) as events to the Home Assistant event bus, on top of which I have to build an automation:

Feedreader trigger

The automation picks the <enclosure> tag with the attribute type=audio/mpeg from the feed item (passed as event payload in trigger.event.data) and stores that tag’s url= attribute in an input_text helper. What exactly such an RSS entry of the Tagesschau looks like can be seen directly in the feed.

Feedreader action
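
Outside of Home Assistant, the same filtering logic can be illustrated with Python’s standard library. This is not what the automation itself runs, just a sketch of the idea: take the newest feed item and return the url attribute of its audio/mpeg enclosure. The sample feed below is made up, and the sketch assumes the newest episode is the first item in the feed.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed; the real feed's titles and URLs will differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <item>
      <title>Example episode</title>
      <enclosure url="https://example.com/episode.mp3" type="audio/mpeg" length="12345"/>
    </item>
  </channel>
</rss>
"""


def latest_audio_url(feed_xml):
    """Return the url of the first audio/mpeg enclosure in the first item, or None."""
    root = ET.fromstring(feed_xml)
    item = root.find("./channel/item")
    if item is None:
        return None
    for enclosure in item.findall("enclosure"):
        if enclosure.get("type") == "audio/mpeg":
            return enclosure.get("url")
    return None


print(latest_audio_url(SAMPLE_FEED))  # prints https://example.com/episode.mp3
```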

As a result, we now have an input_text helper that always contains the URL of the latest episode and that we can use just like the text sensor from the Scrape integration to pass it to our smart speaker. A service call then looks like this:

Service call for the smart speaker to play the current episode of Tagesschau in 100 Sekunden
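
For testing from another machine, the same service can also be called through Home Assistant’s REST API (POST /api/services/media_player/play_media with a bearer token). The sketch below only builds the request and does not send it. The base URL, the token, the episode URL and the entity ID media_player.leons_smart_speaker are placeholders or assumptions, and the function name build_play_media_request is made up for this example.

```python
import json
import urllib.request


def build_play_media_request(base_url, token, entity_id, media_url):
    """Build (but do not send) a request to Home Assistant's play_media service."""
    payload = {
        "entity_id": entity_id,
        "media_content_id": media_url,
        "media_content_type": "music",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/media_player/play_media",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders; replace them with your own.
request = build_play_media_request(
    base_url="http://homeassistant.local:8123",
    token="YOUR_LONG_LIVED_ACCESS_TOKEN",
    entity_id="media_player.leons_smart_speaker",
    media_url="https://example.com/episode.mp3",
)
# urllib.request.urlopen(request) would actually send it.
```

Within Home Assistant itself, the automation’s service call remains the simpler route. A script like this is mainly useful for checking that the speaker reacts at all.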

I hope this article inspires you to tinker. I found very little on this subject myself, which is why it took me a while to solve this actually quite simple problem. Maybe this article is a good starting point :)

