Mike Riley takes a look at some exciting new paradigms of interaction. Don’t misinterpret Mike’s enthusiasm. He’s not cheerleading. He just wants you to realize how radically these paradigms could change your work and life.

There is a groundswell of change happening today in the world of client computing. Just as the mouse forever changed the way people interacted with graphical user interfaces, new models of interaction are disrupting the way programmers will develop and test their applications. Let's take a brief look at these new-world interface paradigms, consider how they are changing the face of computing for the masses, and begin to think about what they could mean to you as a developer.


See Me

Visual input, usually referred to as Computer Vision, or CV for short, has been in development for years. Now that processors are fast enough to interpret the huge stream of incoming visual data, the technology implied by HAL's eye in 2001: A Space Odyssey is finally a reality.

Game consoles like the PlayStation 3 and the Xbox 360 have motion cameras with APIs designed to capture and interpret gross-level gestures. While these recent attempts are still crude in their motion detection and tracking algorithms, improvement is accelerating thanks to market acceptance of, and fascination with, synthetic eyes connected to a computer brain.

With Microsoft's formal release of Kinect for Windows and its further refinement of motion tracking and gesture interpretation, this modality of user-computer interaction will continue to evolve much faster than it has in the past. It isn't inconceivable that such sensors will be embedded into tablets one day, perhaps sooner rather than later. Imagine the tracking optics on the face of a tablet watching a presenter for gestures to advance slides in a presentation, trigger video playback events, or move graphic primitives on screen, all without physically touching the display.

From a developer's perspective, we're talking about a radical change here. Developing interfaces for such a disembodied experience is far different from using a mouse or finger to click on a graphic. Yet as more of these interaction modes show up in everyday applications, you will be expected to know how to develop, automatically test, and debug this eye-to-CPU connection.
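
To make that concrete, here is a minimal, hypothetical sketch of the kind of logic involved: a small swipe detector that consumes timestamped hand positions, such as a skeletal-tracking stream might supply, and fires when the hand travels far enough, fast enough. The class name and thresholds are mine, not part of any Kinect SDK, but because the logic is plain Java it can be unit tested without a camera attached.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical swipe detector: feed it timestamped hand x-positions
    // (in meters, as a skeletal-tracking stream might supply) and it
    // reports a left-to-right swipe when the hand travels far enough,
    // fast enough.
    public class SwipeDetector {

        private static final double MIN_DISTANCE_M  = 0.40; // hand must travel 40 cm
        private static final long   MAX_DURATION_MS = 600;  // ...within 600 ms

        // One tracked sample: where the hand was and when.
        private static final class Sample {
            final double x;
            final long timestampMs;
            Sample(double x, long timestampMs) {
                this.x = x;
                this.timestampMs = timestampMs;
            }
        }

        private final Deque<Sample> window = new ArrayDeque<>();

        // Returns true when the newest sample completes a rightward swipe.
        public boolean addSample(double handX, long timestampMs) {
            window.addLast(new Sample(handX, timestampMs));

            // Drop samples older than the gesture window.
            while (!window.isEmpty()
                    && timestampMs - window.peekFirst().timestampMs > MAX_DURATION_MS) {
                window.removeFirst();
            }

            // A swipe is a net rightward displacement within the window.
            double travelled = handX - window.peekFirst().x;
            if (travelled >= MIN_DISTANCE_M) {
                window.clear(); // reset so one swipe fires only once
                return true;
            }
            return false;
        }
    }

Keeping the detection logic free of any camera or SDK dependency is exactly what makes it possible to drive it from automated tests with recorded or synthetic coordinate streams.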

Two good references for further research in this area are Gestures and Tools for Kinect and How to Do Gesture Recognition With Kinect Using Hidden Markov Models (HMMs) by Jonathan C. Hall.


Hear Me

Think voice input today and you probably think of Siri. Apple made a big splash in the voice input space with its Siri beta on the iPhone 4S. Even though Google had text-to-speech results on the Android platform before Siri's introduction, they were never as prominently featured or as sexy as Siri. Google's implementation also suffered from a grating, raspy TTS engine that sounded strained and unnatural. Compare that to the TTS engine used by Apple, with its more human-sounding tone, inflections, and even breathing cadences, and it's no surprise that Apple's cloud-based Siri application has been embraced by early adopters.
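
The TTS half of that equation is already in developers' hands today. As one illustrative, hypothetical sketch, here is roughly what it takes for an Android application to speak a result aloud with the platform's built-in TextToSpeech engine; the activity name and spoken string are mine, not anything shipping in a product:

    import java.util.Locale;

    import android.app.Activity;
    import android.os.Bundle;
    import android.speech.tts.TextToSpeech;

    // Minimal sketch: speak a sentence aloud with Android's TextToSpeech engine.
    public class SpeakResultActivity extends Activity
            implements TextToSpeech.OnInitListener {

        private TextToSpeech tts;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            tts = new TextToSpeech(this, this); // engine initializes asynchronously
        }

        @Override
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                tts.setLanguage(Locale.US);
                // QUEUE_FLUSH replaces anything already queued for speech.
                tts.speak("Your nearest charging station is two miles away.",
                          TextToSpeech.QUEUE_FLUSH, null);
            }
        }

        @Override
        protected void onDestroy() {
            if (tts != null) {
                tts.shutdown(); // release the engine when the activity goes away
            }
            super.onDestroy();
        }
    }

The naturalness the article is comparing comes from the installed speech engine and its voices, not from this API, which simply hands the text over.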

In sober fact, when someone seamlessly combines the current state of the art in computer vision with voice recognition and today's remarkably human-like TTS, HAL 9000 will have arrived. Perhaps someone at the University of Illinois is working on such a system as I write this, and teaching it how to sing a song.

But let’s look at a more practical implication: What does such a system mean for search? If Google has become Apple’s new Microsoft, Siri must be seen as a Google search replacement on iOS devices.

And Apple actually has a jump on the competition, because it is amassing a huge, refined database of vocalized queries from around the world. The Siri-aggregated global inference engine Apple is building could be a way to make Google's interface (and all its AdWords advertisements) irrelevant. Basically, there is a new, conversational way to ask a computer for information. Combined with agents and preferences, this dynamic level of interaction will eventually serve as a driving companion, a running coach, a social assistant, and even a pair-programming mentor.

Let's look more closely at what's already happening in one of those areas: the driving companion. Automobile maker Ford has been running commercials showing non-technical consumers boasting about how they can talk to their car and "it talks back!" Having set the industry-standard expectation for voice-activated in-car services, Ford's SYNC system, based on Microsoft's Windows CE OS, is motivating other automobile manufacturers to reach at least parity with it. Some of these competitors are Android-based, while others are a mishmash of proprietary embedded systems. Still others assume that the car owner will be an iPhone user and hook into whatever services the iPhone exposes via an app. Once Apple releases a Siri API to developers, these iOS-friendly automakers will undoubtedly create voice-enabled apps for their respective lines of cars.

It’s easy to imagine that when a customer buys a Gran Turismo from BMW, there could be an adhesive QR code label on the car’s steering wheel that takes the customer to the appropriate Apple App Store or Android Market location to download and install the custom app for that car. After entering a series of security codes to pair the phone’s app via Bluetooth with the car’s on-board hardware, drivers will be able to interact with a respectable “eyes-on-the-road” experience while asking for directions; being informed about the car’s fuel, charge, or temperature conditions; scheduling a car maintenance appointment with a local certified service center; and who knows what else.

Here are two good references on voice input: Android: How to implement voice recognition, a nice, easy tutorial, and the Ford SYNC AppLink Mobile App Developer Network.
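
To give a flavor of what that first tutorial covers, here is a minimal, hypothetical sketch of launching Android's built-in speech recognizer with a RecognizerIntent and reading back its candidate transcriptions; the activity name and request code are illustrative:

    import java.util.ArrayList;

    import android.app.Activity;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognizerIntent;

    // Minimal sketch: capture one spoken utterance and read the best guess.
    public class VoiceQueryActivity extends Activity {

        private static final int REQUEST_SPEECH = 1001;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);

            // Ask the platform recognizer to capture one free-form utterance.
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Say your query");
            startActivityForResult(intent, REQUEST_SPEECH);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK) {
                // The recognizer returns candidate transcriptions, best first.
                ArrayList<String> matches =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                if (matches != null && !matches.isEmpty()) {
                    String bestGuess = matches.get(0);
                    // Hand bestGuess to your search, navigation, or agent logic here.
                }
            }
            super.onActivityResult(requestCode, resultCode, data);
        }
    }

A real application would also confirm that a speech recognizer is actually installed on the device before firing the intent, and handle the case where it is not.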


Touch Me

Apple (yeah, I know, them again) changed the computing world with the commercialization of the computer mouse, and it has done it again 30 years later with the multitouch interface. Just as the mouse proved to be the catalyst of the personal computing revolution, the multitouch interface is the catalyst of the mobile computing revolution. Tablets, phones, interactive whiteboards, and multi-point touch pads and displays are all growing examples of this new world. Instead of the PC's disembodied mouse-screen connection, touch interfaces provide immediate spatial feedback, which explains why people from 8 months to 80 years old can usually interact with them with little or no training.

And we're just starting to see what can be done with multi-touch. Things get more interesting when more than one digit is involved. Multi-touch is where people's eyebrows rise in surprise and, if the interface is designed correctly, delight. More sophisticated touch capabilities, exceeding support for four simultaneous touch points, bring a whole new dimension to graphic manipulation, input, and acknowledgement.
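
To ground that in code, here is a minimal, hypothetical Android sketch of the raw material such interfaces are built from: a custom View that walks every active pointer in a MotionEvent so that pinch, rotate, and multi-finger gestures can be layered on top. The class name is illustrative:

    import android.content.Context;
    import android.util.AttributeSet;
    import android.view.MotionEvent;
    import android.view.View;

    // Minimal sketch: track every simultaneous touch point on a custom View.
    public class MultiTouchView extends View {

        public MultiTouchView(Context context, AttributeSet attrs) {
            super(context, attrs);
        }

        @Override
        public boolean onTouchEvent(MotionEvent event) {
            // Each finger on the screen is a "pointer" with its own index and id.
            int pointerCount = event.getPointerCount();
            for (int i = 0; i < pointerCount; i++) {
                int id = event.getPointerId(i); // stable for the life of the gesture
                float x = event.getX(i);
                float y = event.getY(i);
                // Feed (id, x, y) into your gesture logic: pinch, rotate,
                // multi-finger swipe, and so on.
            }
            return true; // consume the event so we keep receiving updates
        }
    }

A production view would also watch for ACTION_POINTER_DOWN and ACTION_POINTER_UP so it knows when individual fingers arrive and leave mid-gesture.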

Current iOS and Android devices have only just begun to tap the full potential of what touch has to offer. Add a third dimension, depth, to the touch experience, and you get not just the ability to quickly flip through screens, but the ability to navigate complex molecular assemblies, rapidly sort through a virtual stack of documents, or video-capture a three-dimensional world for playback manipulation that would make the VR gloves of science fiction movies look downright primitive.

Three good web references for multitouch are Multi-touch GIS API for TableTops (click on “Learn about the design of the API”), Python Multitouch Toolkit, and Gesture Toolkit.


Heal Me

Each one of these interfaces has the potential to forever change the way we use and develop applications for computing devices in the near future, and they all deserve your attention. Think about what's involved in abstracting your own rich client applications into an MVC design so you can take advantage of these technologies by mapping inputs and outputs to the optimal interface for the task.

I know, we haven't completely mastered testing on our traditional two-dimensional X/Y-coordinate GUIs, let alone the complexity that voice and gesture interaction entail. But the future is ready for you to hack, and every one of these interface choices holds great potential for those who master developing for it first.

Mike Riley is the author of Programming Your Home, published by Pragmatic Bookshelf.

Illustration created by Marielle Riley with MyPaint running on Ubuntu Oneiric Ocelot.
