“Siri, do you like movies about gladiators?”
Voice recognition, in theory, is a remarkable innovation. In my experience, and probably the experience of frustrated millions across the world, the reality of how it’s been used so far has been (to put it kindly) disappointing. We expect hands-free, cutting-edge interaction between human being and computer, as seen in Star Trek and Buck Rogers in the 25th Century, but what we’re served is usually closer to a concoction of Red Dwarf’s deranged Holly, 2001: A Space Odyssey’s HAL 9000, and a broken electric pencil sharpener; ask, and ye shall not receive exactly what you were expecting. One needn’t go far to witness a computer botching the nuances of human speech; being Scottish isn’t necessary to get truly dismal results, but it sure helps (apologies to LockerGnome’s John McKinlay, who probably has his own devious ways to outwit smart alec elevators in spite of being a son of Caledonia).
A few problems faced by voice recognition developers go beyond the obvious fluctuations in human dialogue and dialects. For instance, ambient noise that a microphone may pick up isn’t always easy to filter out of the equation. When a human being is trying to pick out a conversation in a crowded room, he or she will usually have the benefit of visual and other sensory cues to aid in the filtering process. Most computers in 2011 can’t claim this advantage, so voice recognition developers rely on mathematical methods to ascertain which incoming signals are legitimately intended for interpretation, and which ones are just noise.
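To give a rough feel for what "mathematical methods" can mean here, below is a minimal sketch of one textbook approach: energy-based voice activity detection. It is not how Kinect, Tellme, or Siri actually work; it only illustrates the idea of separating "probably speech" from "probably noise" by comparing loudness against an estimated background level. The function names, the 160-sample frame length, the 8000 Hz sample rate, and the threshold ratio are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech(samples, frame_len=160, noise_frames=5, ratio=4.0):
    """Flag frames whose energy exceeds `ratio` times the noise floor.

    Assumption: the first `noise_frames` frames contain only background
    noise, so their average energy serves as the noise-floor estimate.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        # Not enough audio to estimate the noise floor; flag nothing.
        return [False] * len(energies)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    # Small lower bound so pure digital silence is never flagged.
    threshold = ratio * max(noise_floor, 1e-12)
    return [energy > threshold for energy in energies]


def demo_signal(frame_len=160, sample_rate=8000):
    """Build 10 frames of faint hum followed by 10 frames of a louder tone."""
    count = 10 * frame_len
    quiet = [0.01 * math.sin(2 * math.pi * 50 * n / sample_rate)
             for n in range(count)]
    loud = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
            for n in range(count)]
    return quiet + loud


# One True/False flag per frame: False for the hum, True for the tone.
flags = detect_speech(demo_signal())
```

On the synthetic signal above, the ten quiet frames come back unflagged and the ten louder frames come back flagged. A busy street breaks the key assumption, since the "noise" is neither quiet nor steady, which is a big part of why real systems need far more than a loudness threshold.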
Microsoft’s Kinect for its Xbox 360 gaming and media console is probably the most successful contender in this round of voice recognition efforts; it goes further than most other such systems and can integrate visual cues into its repertoire for a more complete, hands-free user interface. Microsoft has announced that, while the original Kinect has surpassed both the company’s and consumers’ expectations, new developments in its Tellme technology (which also powers Windows Phone 7 and other devices) will expand dramatically on current voice recognition potential.
“We are laying a foundation that will transform how people interact with devices,” says Thomas Soemo, Microsoft’s principal program manager lead for the Xbox platform. “We are at that cusp. With Kinect, we’ve put speech into the living room. Now, Microsoft will continue to push the boundaries of NUIs [Natural User Interfaces] to enable seamless experiences that span devices and platforms.”
Adds Keith Herold, a senior Tellme program manager lead: “What are the most amazing experiences with speech we can imagine? Can we create technology that is as natural as talking to a friend? This is where we want to go, and it’s happening in front of our eyes.”
What are your experiences with voice recognition technology? Do you see Microsoft getting closer to perfecting it with Tellme? I own a Kinect and I think it’s an overall cool idea, but I see room for improvement. Then again, I live on a busy street, so I admit that the ambient noise we were talking about earlier makes voice recognition technology a lot more challenging for such a device. What improvements, if any, would you suggest to Microsoft’s Tellme team if you could bend some ears there?