Speech and Gesture Input

I would really like to be able to interact with applications in a simultaneous multimodal fashion, taking advantage of pointing device, keyboard, voice, and gestures. In some cases, I'd like to direct speech commands to a background app while pointer and keyboard focus remains in the front app. Sometimes I need information from an app, not necessarily to use it.

Say I’m chatting with a friend to schedule dinner next week, but I don’t remember my schedule. Calendar is open and visible in another region of the screen, but it’s showing the current week. Assuming the Mac is always listening for the Siri invocation command, I just say “Hey, Siri, go to next week in Calendar” rather than switching to Calendar, navigating to next week, then switching back to the chat. This scenario is depicted in the flowchart below.

A flowchart for Siri interactions with background apps on macOS.
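
To make this concrete, here is a rough sketch of the app-side handler, assuming Apple extends SiriKit’s custom intents to the Mac. NavigateCalendarIntent, its weekOffset parameter, and the notification name are invented for illustration; real custom intents are generated from an .intentdefinition file rather than written by hand.

```swift
import Intents

// Hypothetical intent for "Hey, Siri, go to next week in Calendar".
// In a shipping app this class would be generated from an
// .intentdefinition file; it is hand-written here for illustration.
final class NavigateCalendarIntent: INIntent {
    var weekOffset: NSNumber?  // +1 = next week, -1 = previous week
}

final class NavigateCalendarIntentHandler: NSObject {
    // Siri calls this while Calendar stays in the background, so pointer
    // and keyboard focus never leave the frontmost app.
    func handle(intent: NavigateCalendarIntent,
                completion: @escaping (Bool) -> Void) {
        let offset = intent.weekOffset?.intValue ?? 1
        DispatchQueue.main.async {
            // Tell the visible week view to advance without activating the app.
            NotificationCenter.default.post(name: .navigateCalendarWeek,
                                            object: nil,
                                            userInfo: ["offset": offset])
            completion(true)
        }
    }
}

extension Notification.Name {
    static let navigateCalendarWeek = Notification.Name("navigateCalendarWeek")
}
```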

Speedy interpretation of speech commands is crucial to making these interactions feel fluid and natural. Incorporating a speech co-processor (as in the iMac Pro) would let Macs listen continuously for the “Hey, Siri” prompt rather than requiring Siri to be invoked through the menu bar item or keyboard shortcut.

If Apple builds a display (still hoping for a 40-inch 8K) to pair with the forthcoming Mac Pro, they should include the TrueDepth dot-projector hardware that enables Face ID on the iPhone X. Provide that on the Mac, but go further by using it to recognize hand gestures made in the space between the user and the display. It could also be useful for those with motor impairments as a way to execute commands with blink patterns. Maybe it could even be used to translate sign languages to text, without having to wear a special glove.

A few gesture possibilities (a recognition sketch for the first follows the list):

  1. Hold up a thumbs-up or thumbs-down, or one or more fingers, to star-rate the currently playing song
  2. Raise or lower your hand to control audio or video volume
  3. Make a mid-air pinch-in or pinch-out gesture to zoom in mapping or graphics apps; not everyone has a trackpad
  4. Make a “holding a camera, pressing the shutter button” gesture to take a screenshot
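
For a sense of what recognizing the first gesture might involve, here is a minimal sketch built on Vision’s hand-pose request. The capture pipeline feeding it frames, the 0.5 confidence cutoff, and the 0.15 geometry threshold are all assumptions; a real implementation would also debounce across several frames rather than deciding from a single one.

```swift
import CoreVideo
import Vision

enum ThumbGesture { case up, down, none }

// Classify a single camera frame as thumbs up, thumbs down, or neither
// by comparing the thumb tip's height to the wrist's.
func detectThumbGesture(in pixelBuffer: CVPixelBuffer) -> ThumbGesture {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    guard (try? handler.perform([request])) != nil,
          let hand = request.results?.first,
          let thumbTip = try? hand.recognizedPoint(.thumbTip),
          let wrist = try? hand.recognizedPoint(.wrist),
          thumbTip.confidence > 0.5, wrist.confidence > 0.5
    else { return .none }

    // Vision's coordinates are normalized with the origin at the lower
    // left, so a thumb tip well above the wrist reads as a thumbs up.
    let rise = thumbTip.location.y - wrist.location.y
    if rise > 0.15 { return .up }
    if rise < -0.15 { return .down }
    return .none
}
```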

I wrote years ago about visual gesture interpretation, for which there now seems to be capable hardware. More recent thoughts on gestural interaction come from David Rose and IDEO.

Tomorrow is the Question

Rescheduling an event to another day in the Calendar app on the iPhone requires at least six taps across four screens. The method depicted below reduces the minimum to two taps, with the number of screens depending on how many days the event is moved. (A code sketch of the gesture wiring follows the steps.)

  1. In Day view, tap and hold the event.

    Event pressed in Day view

  2. With your other hand, tap the forward (or back) triangle button in the date bar.

    Tap forward button while holding event

  3. Alternatively, while still holding the event, swipe from right to left to go to the next day (or vice versa for the previous).

    Swipe while holding to move forward or backward

  4. The event is moved to the next (or previous) day at the same time of day. The event box would remain beneath the held finger, nudging overlapping event boxes aside if necessary.

    Event displayed on new date
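
Here is a minimal sketch of how the hold-and-advance interaction could be wired up in UIKit. EventView, CalendarEvent, and showDay are hypothetical stand-ins; the essential trick is letting the long press stay active while the date bar’s controls remain tappable.

```swift
import UIKit

final class DayViewController: UIViewController, UIGestureRecognizerDelegate {
    private var heldEvent: CalendarEvent?

    func attachGestures(to eventView: EventView) {
        let hold = UILongPressGestureRecognizer(target: self,
                                                action: #selector(holdEvent(_:)))
        hold.delegate = self
        eventView.addGestureRecognizer(hold)
    }

    @objc private func holdEvent(_ gesture: UILongPressGestureRecognizer) {
        switch gesture.state {
        case .began:
            heldEvent = (gesture.view as? EventView)?.event  // step 1
        case .ended, .cancelled:
            heldEvent = nil  // releasing drops the event on the visible day
        default:
            break
        }
    }

    // Called by the date bar's forward/back buttons or the swipe (steps 2-3).
    // While an event is held, advancing the day also moves the event,
    // keeping its time of day unchanged (step 4).
    func advanceDay(by offset: Int) {
        showDay(offsetFromCurrent: offset)
        if let event = heldEvent,
           let moved = Calendar.current.date(byAdding: .day,
                                             value: offset,
                                             to: event.start) {
            event.start = moved
        }
    }

    // Let the hold coexist with the date bar's tap and swipe recognizers.
    func gestureRecognizer(_ gestureRecognizer: UIGestureRecognizer,
                           shouldRecognizeSimultaneouslyWith other: UIGestureRecognizer) -> Bool {
        true
    }

    private func showDay(offsetFromCurrent: Int) { /* navigate the day view */ }
}

// Hypothetical model and view types, just enough to make the sketch cohere.
final class CalendarEvent { var start = Date() }
final class EventView: UIView { var event: CalendarEvent? }
```

Needing the hold to stay down while a second finger works the date bar is exactly what makes this a two-handed interaction, which is problem 2 below.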

Problems with the Method

  1. Not easily discoverable
  2. Requires two hands or Evgeny Kissin-level finger dexterity
  3. Only works cleanly in Day view, though variations for List and Month could work
