


Notes on the last Voice Search Conference

From Bill Meisel's Speech Strategy News

The inaugural Voice Search Conference was held in San Diego, March 10-12. The co-organizers, the Applied Voice Input Output Society and your editor (Bill Meisel), created the conference. This editorial summarizes the categories covered by the conference and why they fit a broad definition of “voice search.” The conference was motivated by the following observations:
“Voice Search” seemed to us to summarize these observations. The idea received strong industry support: primary sponsors Call Genie, IBM, Nuance Communications, Vlingo Corporation, and VoiceBox Technologies; supporting sponsors BBN Technologies, Convergys, Genesys, Loquendo, Nexidia, Novauris, and West Interactive; and broad participation by respected industry experts in delivering conference content.

Defining Voice Search

Not everyone agreed with the very broad interpretation of “Voice Search” used by the conference organizers, and some discussion on panels (plus comments during Q&A sessions) revolved around how the term should be defined. Because of the implied analogy to Web search, the most obvious interpretation fit mobility applications on wireless phones where a spoken utterance was used to search a long list, including directory assistance services and specifying a location.

A common comment was that speech technology doesn’t stand alone. Other than in obvious cases where its use could improve safety (while driving, in particular), applications should take advantage of other modalities where available and appropriate, for example, to deliver long replies as text.

Voice Search is intended as a unifying concept behind a number of application categories. The following sections of this article discuss major areas addressed at the conference.

Audio response

Directory assistance and voice portals

In addition, there are specialized services, such as those that provide driving directions. A clear trend suggested by conference talks is the expansion of directory assistance services into “voice portals,” with multiple services accessed through a single, familiar number. (The term “voice portal” wasn’t popular because of past abuse of the term, but the concept is still useful.)
Long-list search

Unifying the mobile user experience

Beyond the obvious limitations of the devices, vendors of unifying voice interfaces for mobile phones or automobile systems emphasize “feature creep,” the proliferation of what can be done on the devices themselves and with network-based services. Although approaches to achieving a unifying experience differ among vendors, they all endeavor to make it possible for users to get what they want without knowing specific commands or navigating long menus, which is the core power of a well-executed speech interface. In some cases, part of the solution is learning the preferences of each user to shorten the interaction. A typical point made by vendors is that speech can provide what appears to the user as a single interface to diverse applications. The appropriate use of multimodality is also a typical theme.

Audio search and speech analytics

Speech analytics in call centers can be used for conventional monitoring that goes beyond the very limited sampling often done by call center managers. It can give a more accurate statistical picture of what callers are doing and how well calls are being handled, and it can help find specific calls that reveal issues in the design of an IVR system or in agent training. A number of vendors offer increasingly sophisticated analytic systems.

As audio/visual sources on the web proliferate (e.g., YouTube), there is an increased need to search the content and not just the metadata. Think, for example, of a company that wants to know what is being said about it in blogs, newscasts, or consumer-deployed videos. This is a difficult task, but several companies at the conference were attempting to address the need.

Contact center automation

Customers will be exposed to voice search interfaces in mobile services such as directory assistance and driving directions. They will view these services fondly, as time- and money-saving alternatives.
Finding an equivalent experience at a call center may avoid the typical response of fighting the system to be connected to an agent.

In addition, ad-supported telephone services will almost always include an option to be connected to the advertiser. This will create a high-volume application that call centers must address, and which may require speech automation to be economically feasible. Such calls must be answered promptly, lest the caller’s impulse to buy evaporate.

Many existing call-center technologies support the voice search paradigm. One is “natural language” call steering. After a general prompt with examples of what can be said, the caller just states his or her problem or objective. The semantic model within the call steering software has learned from examples what should be done with the call, and handles it appropriately with another level of automation or by transfer to an agent with the proper skills.

Vendors today offer other techniques that reduce navigation, and some companies already use them. Some of those mentioned at the conference include:
Unified Communications

The downside of this unifying approach is that, fully accepted, it implies the daunting challenge of replacing all the communications in the enterprise (even those that seem to work just fine, thank you) with new platforms. And the concepts of UC go beyond the enterprise, extending to the enhancement of subscription-based network services.

Where do UC and Voice Search overlap? Voice interfaces in auto attendants are an obvious case: just say who you want and be connected. Dialing or setting up conference calls by name is another long-list application. Converting voicemail to text for search and storage is an increasingly popular application, although not currently on the feature list of the major UC vendors.

A more fundamental view is the use of a voice interface to manage the many features of UC. The more functions one bundles into a single system, the more challenging the user interface. Speech has the potential to become a “communications assistant” that lets users just say what they want (“If John Doe is available, add him to this call.”). No vendor has fully embraced this model, perhaps because of past technology limitations, but its time has come.

Microsoft includes its Speech Server as a built-in part of the Office Communications Server, and Avaya discussed its Speech Access option at the Voice Search Conference. The latter allows commands such as “Read my messages,” “Call the sender,” “Find free time tomorrow,” “Give me a wake up call,” “Read my urgent messages from my boss,” and “Connect all calls,” according to the presentation. Certainly this fits the paradigm of “just say what you want and get it.”

Unifying communications versus Unified Communications

If communicating with oneself is communication, then services which allow the equivalent of voice notes (transcribed to text and often with categorization of the note) are another application in this category.
Some services try to rise to the level of personal assistant, allowing commands such as “Create an appointment on Friday at 2 PM with Dan Smith” to create an entry in a calendar application. The utility of such applications, some of which use human agents to do the speech recognition, is clear if they are well designed.

So…

To return to the introductory theme, Voice Search simply summarizes a new era in the use of speech technology. Speech technology has passed a “tipping point.” It’s not the computer in Star Trek yet (and may never be), but it can solve user interface problems that might otherwise require many confusing steps to achieve a result.

TMA Associates (www.tmaa.com)