Speech and Gesture Input

I would really like to be able to interact with applications in a simultaneous multi-modal fashion, taking advantage of pointing device, keyboard, voice, and gestures. In some cases, I'd like to direct speech commands to a background app while pointer and keyboard focus remain in the front app. Sometimes you need information from an app without needing to work in it.

Say I’m chatting with a friend to schedule dinner next week, but I don’t remember my schedule. Calendar is open and visible in another region of the screen, but it’s showing the current week. Assuming the Mac is always listening for the Siri invocation command, I just say “Hey, Siri, go to next week in Calendar” rather than switching to Calendar, navigating to next week, then switching back to the chat. This scenario is depicted in the video below.

SiriKit for macOS Flowchart
A flowchart for Siri interactions with background apps on macOS.
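
As a rough sketch of that flow, here is how a spoken command might be routed to a background app while focus stays put. Everything here is hypothetical: `App`, `Desktop`, and `route_speech` are illustrative names rather than SiriKit or macOS APIs, and the "<action> in <App>" phrasing is just an assumption about how the target app gets named.

```python
import re
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical stand-in for a running application."""
    name: str
    log: list = field(default_factory=list)

    def handle(self, action: str) -> None:
        # A real app would perform the action; this sketch only records it.
        self.log.append(action)


@dataclass
class Desktop:
    """Hypothetical desktop state: running apps plus the focused one."""
    apps: dict
    focused: str

    def route_speech(self, utterance: str) -> bool:
        """Send '<action> in <App>' to the named app; focus is never changed."""
        match = re.fullmatch(
            r"(?:hey,? siri,? )?(.+) in (\w+)",
            utterance.strip(),
            re.IGNORECASE,
        )
        if match is None:
            return False
        action, app_name = match.group(1), match.group(2)
        target = self.apps.get(app_name.lower())
        if target is None:
            return False
        target.handle(action.lower())
        return True


calendar = App("Calendar")
messages = App("Messages")
desktop = Desktop(apps={"calendar": calendar, "messages": messages}, focused="messages")
desktop.route_speech("Hey, Siri, go to next week in Calendar")
```

After the call, Calendar has received "go to next week" and `desktop.focused` is still the chat app, which is the whole point of the interaction.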

Speedy interpretation of speech commands is crucial to these interactions feeling fluid and natural. Incorporation of a speech co-processor (as with the iMac Pro) allows Macs to always listen for the “Hey, Siri” prompt rather than having to invoke Siri through the menu bar item or keyboard shortcut.

If Apple builds a display (still hoping for a 40-inch 8K) to pair with the forthcoming Mac Pro, they should include the dot projector hardware that enables Face ID on the iPhone X. Provide Face ID on the Mac, but go further by using the same hardware to recognize hand gestures made in the space between the user and the display. It could also be useful for people with motor impairments, as a way to execute commands with blink patterns. Maybe it could even translate sign languages to text, without the signer having to wear a special glove.


  1. Hold up a thumbs up or down, or one or more fingers, to star-rate the song that is playing
  2. Raise or lower your hand to control audio or video volume
  3. Make a pinch in or out gesture in mid-air to zoom in mapping or graphics apps; not everyone has a trackpad
  4. Make a “holding a camera, pressing the shutter button” gesture to take a screenshot
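
The gestures above could be dispatched with something as simple as a lookup table. This is a minimal sketch under stated assumptions: the `Gesture` labels stand in for whatever a depth-camera recognizer would emit, the command strings are made up for illustration, and mapping pinch out to zoom in simply follows the usual trackpad convention.

```python
from enum import Enum, auto


class Gesture(Enum):
    """Hypothetical labels a depth-camera recognizer might emit."""
    THUMBS_UP = auto()
    THUMBS_DOWN = auto()
    FINGERS = auto()          # one or more raised fingers
    HAND_RAISE = auto()
    HAND_LOWER = auto()
    PINCH_IN = auto()
    PINCH_OUT = auto()
    CAMERA_SHUTTER = auto()


# Illustrative command names, not real system commands.
COMMANDS = {
    Gesture.THUMBS_UP: "rate-song:like",
    Gesture.THUMBS_DOWN: "rate-song:dislike",
    Gesture.HAND_RAISE: "volume-up",
    Gesture.HAND_LOWER: "volume-down",
    Gesture.PINCH_IN: "zoom-out",
    Gesture.PINCH_OUT: "zoom-in",
    Gesture.CAMERA_SHUTTER: "screenshot",
}


def command_for(gesture: Gesture, finger_count: int = 0) -> str:
    """Map a recognized gesture to a command name."""
    if gesture is Gesture.FINGERS:
        # Clamp the raised-finger count to a one-to-five star rating.
        stars = max(1, min(finger_count, 5))
        return f"rate-song:{stars}"
    return COMMANDS[gesture]
```

For example, `command_for(Gesture.FINGERS, 3)` yields a three-star rating, and `command_for(Gesture.CAMERA_SHUTTER)` yields the screenshot command. The hard part, of course, is the recognizer, not the table.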

I wrote years ago about visual gesture interpretation, for which there now seems to be capable hardware. For more recent thoughts on gestural interaction, see David Rose and IDEO.

Published by

Daniel J. Wilson

I am a designer, drummer, and photographer in Brooklyn, NY.