New version: https://www.thingiverse.com/thing:2951090, which is much more polished.
I also made audio recognition software that can recognize keywords, simple commands, numbers, and a baby crying, with more to come (light control...). It runs even on the Pi Zero. Check it out here
An alternative to the hard-to-get Google AIY kit. This is basically a speaker box with space for a Pi Zero (and a bit more).
The speaker is a common 3 inch, 4 ohm model of the type delivered with the Google AIY kit, and it is readily available from other sources (e.g. Adafruit).
See it in action: https://www.youtube.com/watch?v=eeX9NlU-ESQ. I'm using the Zenbu-Hat, which will be available here, for amp and mic. Breakout boards (MAX98357A for amp and SPH0645LM4H for mic) should work too, but this can be a bit fiddly.
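Since wiring up the amp can be fiddly, a test tone is a handy way to check that the speaker works. The following is only a rough sketch, not part of this project: it uses Python's standard library to write a short sine tone to a WAV file. The function name write_test_tone, the file name and the tone parameters are made up for illustration, and it assumes the amp is already set up as the default ALSA output.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, seconds=1.0, rate=16000, amplitude=0.3):
    """Write a mono 16-bit sine tone to a WAV file (hypothetical helper)."""
    n_frames = int(seconds * rate)
    frames = bytearray()
    for i in range(n_frames):
        # One sine sample in the range -amplitude..amplitude
        sample = amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
        # Pack as a little-endian signed 16-bit integer
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16 bit
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    # Assumed file name; play it afterwards with: aplay test_tone.wav
    write_test_tone("test_tone.wav")
```

If the tone plays cleanly through the box, the amp and speaker are wired correctly. The amplitude is kept low on purpose so a small speaker does not distort.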
The new version features a nicer speaker grill and enough space for the Raspberry camera cable.
The ugly gap from the previous version is now gone.
For contributions, drop me a pull request on GitHub: https://github.com/yodakohl/PiZeroSpeaker