Speech and Gesture Input

I would really like to interact with applications in a simultaneous, multimodal fashion, combining pointing device, keyboard, voice, and gestures. In some cases, I'd like to direct speech commands to a background app while pointer and keyboard focus remains in the front app. Sometimes you need information from an app without needing to work in it.

Say I’m chatting with a friend to schedule dinner next week, but I don’t remember my schedule. Calendar is open and visible in another region of the screen, but it’s showing the current week. Assuming the Mac is always listening for the Siri invocation command, I just say “Hey, Siri, go to next week in Calendar” rather than switching to Calendar, navigating to next week, then switching back to the chat. This scenario is depicted in the video below.


SiriKit for macOS Flowchart
A flowchart for Siri interactions with background apps on macOS.
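
The core of this interaction is that speech is routed by the name of the app in the utterance, while pointer and keyboard focus is left alone. Below is a minimal Python sketch of that routing idea. It is not SiriKit or any Apple API: the `App` and `Desktop` classes, the `speak` method, and the "COMMAND in APPNAME" phrasing rule are all hypothetical, chosen only to illustrate the behavior.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical stand-in for an app that accepts spoken commands."""
    name: str
    handled: list = field(default_factory=list)

    def handle(self, command: str) -> None:
        self.handled.append(command)


@dataclass
class Desktop:
    """Tracks pointer/keyboard focus separately from where speech is sent."""
    apps: dict
    focused: str

    def speak(self, utterance: str) -> str:
        """Route 'COMMAND in APPNAME' to the named app and return its name.

        Assumption: the target app is named after the last ' in '.
        If no known app is named, the focused app receives the command.
        """
        command, sep, target = utterance.rpartition(" in ")
        if not sep or target not in self.apps:
            command, target = utterance, self.focused
        self.apps[target].handle(command)
        return target  # note: self.focused is never changed here


desktop = Desktop(
    apps={"Messages": App("Messages"), "Calendar": App("Calendar")},
    focused="Messages",
)
desktop.speak("go to next week in Calendar")
```

After the call, Calendar has received "go to next week" and focus is still in Messages, so typing continues in the chat.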

Speedy interpretation of speech commands is crucial to these interactions feeling fluid and natural. Incorporation of a speech co-processor (as with the iMac Pro) allows Macs to always listen for the “Hey, Siri” prompt rather than having to invoke Siri through the menubar item or keyboard shortcut.

If Apple builds a display (still hoping for a 40-inch 8K) to pair with the forthcoming Mac Pro, they should include the dot projector hardware that enables Face ID on the iPhone X. Provide Face ID on the Mac, but go further by using the hardware to recognize hand gestures made in the space between the user and the display. It could also be useful for people with motor impairments, who could use blink patterns to execute commands. Maybe it could even translate sign languages to text, without the signer having to wear a special glove.

Gestures

  1. Give a thumbs up or down, or hold up one or more fingers, to rate the playing song
  2. Raise or lower your hand to control audio or video volume
  3. Make a pinch in or out gesture in mid-air to zoom in mapping or graphics apps; not everyone has a trackpad
  4. Make a “holding a camera, pressing the shutter button” gesture to take a screenshot

I wrote years ago about visual gesture interpretation, for which there now seems to be capable hardware. David Rose and IDEO have shared more recent thoughts on gestural interaction.

Tools and Resources

Math Field

Being annoyed with having to constantly open and close Fireworks’ modal Numeric Transform dialog, I thought it would be particularly useful in design applications to scale and reposition objects based on relative calculations like “this box should be 25 percent taller”.

Assuming something like this had already been implemented somewhere, I went looking and found the current method available in Mac applications described by Dave Mark, who learned of it from Mike Ash. The process is as follows:

  1. Enter a formula in a text field such as 10/2.
  2. Select the formula.
  3. Press Shift-Command-8.

Easy enough, but totally invisible unless you know it’s there (not necessarily a bad thing) and yet another keyboard shortcut to remember. The fact that Script Editor pops open is also mildly surprising and irritating.
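
Evaluating a formula like 10/2 does not need a full scripting environment. As a minimal sketch (this is not how the macOS shortcut works internally, and `evaluate_formula` is a name I made up), here is a small Python evaluator that accepts only numbers, parentheses, and the four basic operators, and rejects anything else:

```python
import ast
import operator

# Only these operators are allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_formula(text: str) -> float:
    """Evaluate +, -, *, / and parentheses; raise ValueError otherwise."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Plain int/float literals only (bool and str are excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported formula: {text!r}")

    # ast.parse raises SyntaxError for text that is not an expression.
    return walk(ast.parse(text.strip(), mode="eval"))


print(evaluate_formula("10/2"))
```

Walking the parsed tree instead of calling `eval` means function calls, names, and attribute access never execute, which matters when the text comes from an arbitrary field.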

Proposed Improvements

Keeping the existing method (though not requiring Script Editor to perform the calculations) is fine for free-form text, but a design specific to number boxes would be very helpful in many applications. It would reduce input repetition (a starting value is always available) and application switching, and it would be a great help to me, as I have trouble doing math in my head. As Dan Saffer stated in his Designing Smart and Clever Applications presentation: “Do what humans have trouble doing but computers can do easily.”

  1. Giving focus to a number input field displays a calculator pop-out.

    Pop-out calculator

  2. As the user inputs a formula using the keyboard or pop-out, it is written into the field. This provides a hint to the user that they can type in formulas directly.

    Formula input by pop-out calculator

  3. The formula is executed when the user clicks the Equals button, presses the Equals or Return or Enter key, or moves focus away from the field. The pop-out calculator disappears whenever focus moves away from the field. In the image below, Enter was pressed, moving focus back to the object in the document window.

    Pop-out calculator formula executed
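
The three steps above amount to a small state model: focus shows the calculator, typing extends the field text, and a commit or loss of focus replaces the formula with its result. Here is a minimal Python sketch of that lifecycle. The `NumberField` class and its method names are hypothetical, and `tiny_eval` is a deliberately limited stand-in that handles only "number operator number"; a real field would plug in a fuller formula evaluator.

```python
import operator
from typing import Callable

_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv}


def tiny_eval(text: str) -> float:
    """Stand-in evaluator: handles only 'number operator number'."""
    left, op, right = text.split()
    return _OPS[op](float(left), float(right))


class NumberField:
    """Hypothetical model of a number box with a pop-out calculator."""

    def __init__(self, value: float,
                 evaluate: Callable[[str], float] = tiny_eval):
        self.value = value
        self.text = format(value, "g")  # starting value is always available
        self.calculator_visible = False
        self._evaluate = evaluate

    def focus(self) -> None:
        # Step 1: giving focus displays the calculator pop-out.
        self.calculator_visible = True

    def insert(self, chars: str) -> None:
        # Step 2: keyboard and pop-out input both write into the field.
        self.text += chars

    def commit(self) -> None:
        # Step 3: the Equals button or key executes the formula.
        try:
            self.value = self._evaluate(self.text)
        except (ValueError, KeyError, ZeroDivisionError):
            pass  # not a valid formula: keep the previous value
        self.text = format(self.value, "g")

    def blur(self) -> None:
        # Moving focus away (or Return/Enter, which here also moves
        # focus) executes the formula and hides the calculator.
        self.commit()
        self.calculator_visible = False


box_height = NumberField(120)
box_height.focus()
box_height.insert(" * 1.25")  # "this box should be 25 percent taller"
box_height.blur()
print(box_height.text)
```

Because the field starts with the current value, making the box 25 percent taller only requires typing the multiplier rather than retyping the height.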