Speech and Gesture Input

I would really like to interact with applications in a simultaneous, multi-modal fashion, using pointing device, keyboard, voice, and gestures together. In some cases, I'd like to direct speech commands to a background app while pointer and keyboard focus remains in the front app. Sometimes you just need information from an app, without switching to it and working in it.

Say I’m chatting with a friend to schedule dinner next week, but I don’t remember my schedule. Calendar is open and visible in another region of the screen, but it’s showing the current week. Assuming the Mac is always listening for the Siri invocation command, I just say “Hey, Siri, go to next week in Calendar” rather than switching to Calendar, navigating to next week, then switching back to the chat. This scenario is depicted in the video below.

SiriKit for macOS Flowchart
A flowchart for Siri interactions with background apps on macOS.
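
As a rough sketch of the Calendar scenario, here is how routing a spoken command to a background app might look without touching focus. Everything in it is hypothetical: `App`, `Desktop`, `route_speech`, and the "command in AppName" phrasing are illustrative assumptions, not SiriKit API and not a reproduction of the flowchart.

```python
from dataclasses import dataclass, field

WAKE_PHRASE = "hey, siri,"  # assumed wake-phrase format, for illustration only


@dataclass
class App:
    """Hypothetical stand-in for a running app that accepts spoken intents."""
    name: str
    handled: list = field(default_factory=list)

    def handle(self, intent: str) -> None:
        # A real app would act on the intent; here we only record it.
        self.handled.append(intent)


class Desktop:
    """Hypothetical model of the running apps plus the current front app."""

    def __init__(self, apps, front: str):
        self.apps = {app.name.lower(): app for app in apps}
        self.front = front  # pointer and keyboard focus; speech never changes it

    def route_speech(self, utterance: str) -> str:
        """Send the command to the app named after ' in ', else to the front app."""
        text = utterance.strip().rstrip(".")
        if text.lower().startswith(WAKE_PHRASE):
            text = text[len(WAKE_PHRASE):].strip()
        # Assumed phrasing: "<command> in <AppName>"
        intent, sep, target = text.rpartition(" in ")
        app = self.apps.get(target.lower()) if sep else None
        if app is None:
            # No recognizable target app: fall back to the front app.
            app, intent = self.apps[self.front.lower()], text
        app.handle(intent)
        return app.name


desktop = Desktop([App("Messages"), App("Calendar")], front="Messages")
desktop.route_speech("Hey, Siri, go to next week in Calendar")
```

After that call, the hypothetical Calendar app has received "go to next week" while `desktop.front` is still "Messages", which is the whole point: the chat keeps focus.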

Speedy interpretation of speech commands is crucial to making these interactions feel fluid and natural. A speech co-processor (as in the iMac Pro) allows Macs to listen continuously for the “Hey, Siri” prompt, rather than requiring you to invoke Siri through the menu bar item or a keyboard shortcut.

If Apple builds a display (still hoping for a 40-inch 8K) to pair with the forthcoming Mac Pro, they should include the dot projector hardware that enables Face ID on the iPhone X. Provide Face ID on the Mac, but go further by using the hardware to recognize hand gestures made in the space between the user and the display. It could also help people with motor impairments execute commands using blink patterns. Maybe it could even translate sign languages to text, without requiring a special glove.

Some gestures it could recognize:
  1. Give a thumbs up or down, or hold up one or more fingers, to give the playing song a star rating
  2. Raise or lower your hand to control audio or video volume
  3. Pinch in or out in mid-air to zoom in mapping or graphics apps (not everyone has a trackpad)
  4. Mime holding a camera and pressing the shutter button to take a screenshot

I wrote years ago about visual gesture interpretation, for which there now seems to be capable hardware. David Rose and IDEO have shared more recent thoughts on gestural interaction.

Consequences of Free Speech

“The tech companies should stop censoring users that they politically disagree with or governments should regulate them as public utilities,” Torba’s spokesman Utsav Sanduja said. Last year, Sanduja and Torba founded Gab.ai, an alternative social network for free speech advocates. “Imagine if a private corporation owned all the highways and they decided to close them down whenever they feel like it — that is what it’s like. You cannot deny people a fundamental staple of the Internet.”

The Washington Post: In Silicon Valley, the right sounds a surprising battle cry: Regulate tech giants

It’s more like claiming that you should be able to take the public highway (the Internet) to a privately owned restaurant, spew racist and misogynist bile there, and have the staff and other patrons simply accept your presence, with no right to kick you out.

The article goes on to conflate two different things. One is the liberal preference for legally enforced net neutrality: anyone can drive on the highway, but the restaurants along the way can make their own rules. The other is the desire of these morons to legally prevent private social, payment, and infrastructure networks from kicking people off for using their platforms to espouse hate (the restaurant scenario above).

The most hilarious outcome of all this would be if conservatives finally abandoned their aversion to using federal government power to break up monopolies, all because some Neo-Nazis got kicked off Twitter.