Hey, Listen!

HeyClock is the current app “on the workbench” for HMB Tech. I’ve got the basic layout finished (which was a lot more frustrating than it should have been; if Apple thinks their app development process and interface are intuitive, they need to go check out a few dozen other languages and IDEs first to see what “standards” look like) and have started working on the “tidying up” of small features before I move on to the next major phase- the timing and recall of events.

One thing I’ve spent far too long on is the actual speech itself. iOS provides the AVSpeechSynthesizer class to let you convert text to speech without any real effort- you simply create an instance of the class and start passing it AVSpeechUtterance objects (basically strings with a few extra options baked in so you can tweak things like playback speed and pitch) with the speak() function. And that’s it.
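In code, the whole advertised workflow is just a few lines (a minimal sketch; the string and the rate/pitch values here are just placeholders):

import AVFoundation

// Minimal text-to-speech: one synthesizer, one utterance, a couple of knobs.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "Hey, listen!")
utterance.rate = AVSpeechUtteranceDefaultSpeechRate   // playback speed (0.0 to 1.0)
utterance.pitchMultiplier = 1.0                       // pitch (0.5 to 2.0)
synthesizer.speak(utterance)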

Or so the documentation claimed. I had a hard time getting the “Hey” part of the speech to sound natural- rather than what sounded like a lady robot disinterestedly reading the word off of a cue card, I wanted it to sound more like someone trying to legitimately get your attention. Hey!

The first thing I discovered was that you can use an NSMutableAttributedString instead of a normal string when creating an utterance, and give it AVSpeechSynthesisIPANotationAttribute as an attribute. This lets you assign pronunciations in IPA format to describe exactly how you want it to sound. Unfortunately (per usual, as I’m slowly discovering), Apple didn’t think it was important to document this feature in any detail, notably regarding which Unicode symbols it will actually accept as IPA. After spending close to an entire day trying out hundreds of different combinations, I discovered that the answer is “not many.” I was really hoping to use some of the IPA tonality symbols to make the “Hey” sound a little more urgent, but the AVSpeechSynthesizer seemed to mutate all of my attempts that contained unrecognizable symbols into “Hurr.” How appropriate.
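For reference, wiring up the IPA attribute looks roughly like this (the IPA value and range here are just illustrative, not the pronunciation I was chasing):

import AVFoundation

// Attach an IPA pronunciation to just the "Hey" portion of the string.
let text = NSMutableAttributedString(string: "Hey, your timer is done.")
text.addAttribute(NSAttributedString.Key(AVSpeechSynthesisIPANotationAttribute),
                  value: "heɪ",
                  range: NSRange(location: 0, length: 3))
let utterance = AVSpeechUtterance(attributedString: text)
// Pass the utterance to a synthesizer's speak() as usual.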

After giving up on the IPA pronunciation route, I started messing around with playback speeds and pitches, until I found something that sounded slightly more realistic than the chipmunk-esque results of my first attempt. That was all well and good, but you can’t change pitch or rate partway through an utterance; it’s all or nothing. So now I had to break the message up into two utterances- the “Hey!” and the actual message itself.
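The split ended up looking something like this (a sketch- the message text and the exact rate/pitch numbers are placeholders, not my final tuned values):

import AVFoundation

let synthesizer = AVSpeechSynthesizer()

// The attention-getter gets its own rate and pitch...
let hey = AVSpeechUtterance(string: "Hey!")
hey.rate = 0.45
hey.pitchMultiplier = 1.3

// ...while the message itself stays closer to the defaults.
let message = AVSpeechUtterance(string: "Your timer is done.")
message.rate = AVSpeechUtteranceDefaultSpeechRate
message.pitchMultiplier = 1.0

synthesizer.speak(hey)
synthesizer.speak(message)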

I then spent the next several days trying to figure out why, despite everything the Apple documentation and plenty of online tutorials and forums claimed, the AVSpeechSynthesizer object wouldn’t queue up a second utterance.

import AVFoundation

// Inside a button-press handler:
let synthesizer = AVSpeechSynthesizer()
let utterance1 = AVSpeechUtterance(string: "first")
let utterance2 = AVSpeechUtterance(string: "second")

// Per the docs, the second call should simply queue up behind the first.
synthesizer.speak(utterance1)
synthesizer.speak(utterance2)

This, pasted inside a button push, resulted only in “first” being spoken. If I commented the first speak() out, I got the second one. As far as I could tell, the documentation just straight-up lied about the queuing functionality. Nobody else seemed to be having my problem, though. Of course, it didn’t help that there were hardly any hits in the first place for use cases similar to mine; just about everyone else out there was using the delegate feature to sense when an utterance was finished before doing something else. I wanted to stay away from that, though, since my ViewController is already a delegate for a dozen other things, and I’m trying to cut down on its complexity.
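For completeness, the delegate-based approach everyone else seemed to be using looks roughly like this (the class name is a placeholder, and this is the route I was deliberately avoiding):

import AVFoundation
import UIKit

class SpeechViewController: UIViewController, AVSpeechSynthesizerDelegate {
    let synthesizer = AVSpeechSynthesizer()

    override func viewDidLoad() {
        super.viewDidLoad()
        synthesizer.delegate = self
    }

    // Called when an utterance finishes; queue the next one (or whatever) here.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        // ...
    }
}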

Have you spotted my error yet? Maybe not, because you might be making the same sort of assumptions I did about memory management here (Swift uses ARC rather than a traditional garbage collector, but my assumptions were wrong either way). I don’t see why you wouldn’t; they’re perfectly intuitive assumptions to be making. But, as I’ve already complained about several times, “intuitive” and “iOS development” don’t really intersect anywhere in this reality.

I was assuming that, as long as there were still utterances left in the AVSpeechSynthesizer’s queue, it would be kept alive even after program execution left the scope of my button push. It finishes speaking the first utterance, doesn’t it? I figured there would be only two options- either the synthesizer gets deallocated as soon as the button-press handler returns (in which case I probably wouldn’t even hear the first utterance) or it sticks around until its queue is empty. In reality, it seems to wait for a “break” between utterances before trashing the synthesizer. As soon as I moved my AVSpeechSynthesizer out of the local scope of the button-push function and made it a property of the view controller, it worked just fine.
Scope. It’s more than just a mouthwash.
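For the record, the fix boils down to something like this (the names here are just placeholders):

import AVFoundation
import UIKit

class HeyClockViewController: UIViewController {
    // Lives as long as the view controller does, so queued utterances
    // aren't cut off when the button handler returns.
    let synthesizer = AVSpeechSynthesizer()

    @IBAction func speakButtonPressed(_ sender: UIButton) {
        synthesizer.speak(AVSpeechUtterance(string: "first"))
        synthesizer.speak(AVSpeechUtterance(string: "second"))  // now this one actually plays
    }
}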

This brings my “tidying” phase to a close, and now it’s time to get serious about finding a good method for scheduling an “alarm clock” style of notification, which I’m sure will lead to another blog post at a later date.