User:Tone/Wildfire (virtual assistant)

From The Dreadnought Project

Wildfire was an anthropomorphic virtual assistant created in the early 1990s. It aspired to let users tell the phone network whom they wanted to speak with, rather than the number of the phone they wanted to ring. The user interface was based primarily on automatic speech recognition, with a parallel touchtone equivalent, and invited users to think of the virtual assistant as a real person akin to a secretary or a telephone exchange operator. First made commercially available in 1994, Wildfire received considerable attention but never achieved the fundamental vision of its creator: to be so wildly popular that it would spread "like wildfire."

Genesis

Wildfire was invented by William J. Warner, an inventor from Boston, Massachusetts who had very recently created the first commercially viable non-linear video editing system, Avid. While his fledgling company, Avid Technology, was releasing and later refining the first generation of its video editing suite, Bill became fixated on an unrelated, transformative idea. He worked on presentations and slides which outlined a radical rethinking of how people communicated by telephone, proposing to build systems that would respond to spoken commands and use the underlying telephone networks to meet the user's high-level intention: whom did they want to contact, and in what manner? The system would then seek to either direct their communication accordingly or take a message, depending upon the availability of the person being sought. It would even deliver calls to you if you were checking your messages from a pay-phone at an airport; the caller did not even need to be aware that you were traveling.

As his thinking developed, the tension between the product vision and that of the video editing console that was the charter of Avid Technology became obvious. Warner was determined, however, and convinced the board and the investors that the idea had value. But if Avid were to realize the potential in its promising new field, the Wildfire concept would have to be pursued under a dedicated corporate entity and not under the auspices of Avid Technology.

Warner started working with Nicholas d'Arbeloff, whom he'd known previously when both had worked at Apollo Computer. Nick was then working with Precision Robots, Inc (later Brooks Automation) as a marketing executive, and saw both the promise of Warner's idea and his zeal to see it realized. Development moved from Avid to Warner's home in Weston, Massachusetts. Rich Miner and Tony Lovell were added to the team to start developing a prototype. Miner had already worked with Warner, running a lab at UMass Lowell that incubated early work on Avid, and Lovell had worked with d'Arbeloff, animating prototype robotic systems for hardware evaluation and trade show purposes.

Development

Initial Work

Warner provided initial capital, and work soon moved to offices in Waltham. Lovell crafted a simple prototype system on an 80486 system running MS-DOS, drawing upon ideas d'Arbeloff and Warner had sketched. This system used a single phone line and provided voice dialing, voice messaging and call routing functions under a fairly Spartan user interface. With just a single phone line, the idea of voice dialing seemed a future prospect, but Warner realized that the residential subscriber feature of three-way calling could be coerced to deliver the functionality, albeit with some instability, and this allowed a more compelling demo to be created affordably.

The speech recognition was hardware-based on a Voice Processing Corporation VPC-100 card with a '386 support processor, as well as an analog telephone line interface daughter card. The single-channel, demonstration quality of the system avoided the need for a database.

Productization

The prototype system gave way to Unixware systems running on early Pentium hardware. The single-channel voice recognition gave way to one or two VPro-8 cards, each capable of delivering 8 channels of discrete (word- or phrase-based) recognition. These cards connected over an in-chassis MVIP bus on a ribbon strip to the telephone line interface cards from Natural Microsystems, which could variously connect to several RJ-11 or RJ-14 analog modular plugs or to a T1-based telephony trunk. User account data was stored in an ObjectStore object-oriented database system from Object Design, Incorporated. A hard drive of 2 GB (?) provided the persistent storage for the database and data such as voice messages (which were usually the majority of data).

The Voice

The initial system spoke in Tony Lovell's voice, as the designer and sole (or, later, application-level) coder was always guaranteed to be available when new prompts were needed. For productization, however, something more polished was needed. Bill Warner felt that a British woman's voice would be nice, and two British women suggested by the local consul were recorded. Internal reactions to these were tepid. On a whim, Lovell recorded his girlfriend, Patti Davis, whose voice was immediately found evocative and unique: she was to be the voice of Wildfire.

Evolution of the Interface

Bill Warner's initial vision was that a Wildfire user should be able to place and receive a series of calls and perform messaging tasks in a single phone call to the system. This gave rise to one of the primary innovations of the system: a persistent user "session". With this interactive model, when the user placed a call through the system, Wildfire would stop listening to commands and "go into the background" and listen only for a distinctive "wake up" command. This allowed the system to silently ignore the telephone conversation until the user decided Wildfire's assistance was again needed, such as to hang up the call on the third party before possibly continuing with other tasks.
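The session model described above can be pictured as a small two-state machine. The following is only an illustrative sketch, not Wildfire's actual code: the class and method names are hypothetical, the "hang up" command and its "Okay." reply are assumed for the example, and a real call would require a name and place before dialing.

```python
from enum import Enum, auto


class Mode(Enum):
    FOREGROUND = auto()  # listening for the full command vocabulary
    BACKGROUND = auto()  # a call is in progress; only the wake-up word counts


class Session:
    """Hypothetical model of one persistent user session."""

    WAKE_WORD = "wildfire"

    def __init__(self):
        self.mode = Mode.FOREGROUND
        self.on_call = False

    def hear(self, utterance):
        """Return the assistant's spoken reply, or None if it stays silent."""
        word = utterance.strip().lower()
        if self.mode is Mode.BACKGROUND:
            if word == self.WAKE_WORD:
                self.mode = Mode.FOREGROUND
                return "Here I am!"
            return None  # ordinary conversation is silently ignored
        if word == "call":
            # Simplification: dial immediately and drop into the background.
            self.on_call = True
            self.mode = Mode.BACKGROUND
            return "Dialing..."
        if word == "hang up" and self.on_call:
            # Assumed command: drop the third party but keep the session open.
            self.on_call = False
            return "Okay."
        return "I didn't understand."
```

The point of the sketch is that the session outlives any single call: after the wake-up word, the same session object goes back to accepting commands.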

Discrete Speech Recognition

Until about 1999, Wildfire systems exclusively used hardware-based speech recognition which was restricted to discrete word and phrase recognition. Until later advances would permit grammar-based command structures, this meant that a user's discourse with Wildfire entailed more interaction than most would have chosen. For instance, one could not ask Wildfire to "Call Bill Warner at home". Rather, this became a dialog in the form:

User: Call
Wildfire: Call who?[1]
User: Bill Warner
Wildfire: (Bill Warner)[2] at which place?
User: Home
Wildfire: Dialing...

The sole exception to the discrete word and phrase recognition was that Wildfire could recognize arbitrary strings of digits for dialing literal phone numbers or adding them to contact data, e.g.,

User: Call phone number
Wildfire: What's the number?
User: 617 555 1212
Wildfire: 617 555 1212 -- is that correct?
User: Yes
Wildfire: Dialing...

Tony Lovell designed Wildfire's dialogs with an eye toward the day when affordable speech recognizers would allow continuous expressions. Users would then succeed without having to rethink the basic interface, and if they did not express a command thoroughly enough, the earlier step-by-step interactions would let Wildfire collect any information that had been omitted from the initial command.
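One way to picture this design is as slot filling: a command needs certain pieces of information, and the system prompts only for the pieces it has not yet been given. The sketch below is an assumption-laden illustration rather than Wildfire's implementation. The slot names, the `run_call_dialog` function and the simplified prompt wording are all hypothetical.

```python
# Hypothetical slots for a "call" command, with a simplified prompt for each.
SLOT_ORDER = ["who", "place"]
PROMPTS = {"who": "Call who?", "place": "At which place?"}


def run_call_dialog(known, answers):
    """Prompt for each slot missing from `known`, taking replies in order.

    `known` holds whatever the recognizer already extracted: nothing for a
    discrete recognizer that heard only "Call", or everything for a
    continuous recognizer that heard "Call Bill Warner at home".
    Returns the completed slots and the prompts Wildfire would speak.
    """
    slots = dict(known)
    replies = iter(answers)
    transcript = []
    for name in SLOT_ORDER:
        if slots.get(name) is None:
            transcript.append(PROMPTS[name])
            slots[name] = next(replies)
    transcript.append("Dialing...")
    return slots, transcript
```

Under this sketch, the discrete dialog corresponds to `run_call_dialog({}, ["Bill Warner", "Home"])`, which prompts twice, while a fully specified continuous command supplies both slots up front and goes straight to dialing.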

Organization of Messages, Contacts and Similar

As the team sought funding for continued development, early incarnations of the prototype interface had a very modal, application-based feel. Switching between these applications was done by saying "Jump to (application name)", e.g., "Jump to Calendar". When "in" an application, the data for that application would be available and some commands might change accordingly. While this working model was familiar to computer users, it seemed to require needless juggling and explanation. The applications seemed like just the sort of underpinnings no one would allude to when asking a person to help them over the phone.

At some point, for a brief while, Lovell decided to replace the "Jump to" command with the use of the word "Wildfire" -- almost as an incantation like "Shazam" -- thinking that overheard use of the system would thereby foster a strong brand awareness. It sounded stilted, however, and within days, he realized "Wildfire" was more naturally employed as the system's "wake up" command. In making this change, he decided to make Wildfire's response a playful "Here I am!" The psychological effect of this choice was profound: it humanized the interaction. "Wildfire" became a person in the same instant, and was generally referred to as "she".

The command structure then evolved gradually, generally following this pattern: commands that resembled instructions to a computer gave way to spoken commands one might issue to a person offering remote help over the phone. The notion of jumping between applications was replaced by general purpose commands that worked with "items" the user asked Wildfire to "find" by describing them. This was more natural and obviated the need to conceive an arbitrary organizational model for the contacts, messages, and calendar items and to educate the user about it.

The user could ask Wildfire to find something, such as "find new messages" or "find contact for Jane Barrett", and Wildfire would instantly grab the matching items and respond with their number and some details of the first one, e.g., "I found 5 new messages. The first is from Eric Snider" or "Contact for (Jane Barrett)." The user could work with that first message or single contact by using generic commands such as "Describe it", "Give them a call", "Update it", "What's it say?" or "Throw it away", or he could reorder the set or iterate through it via "Sort them newest first", "Next item", "Previous item."

This interactive model was both natural and tool-like, allowing users considerable flexibility when dealing with large amounts of data and promising to accommodate the addition of new types of data with minimal strain or growth in the interface itself.
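The find-then-act model can be sketched as a working set with a cursor, on which generic commands operate regardless of item type. This is a minimal illustration under stated assumptions: the `Item` fields, the `ItemSet` class and the `find` and `announce` helpers are hypothetical names, not Wildfire's data model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str        # e.g. "message" or "contact"
    who: str         # sender of a message, or the contact's name
    received: int    # hypothetical timestamp; larger means newer
    is_new: bool = True


class ItemSet:
    """Hypothetical working set produced by a 'find' command."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    @property
    def current(self):
        return self.items[self.pos] if self.items else None

    def next_item(self):
        # Stay on the last item rather than running off the end.
        if self.pos < len(self.items) - 1:
            self.pos += 1
        return self.current

    def previous_item(self):
        if self.pos > 0:
            self.pos -= 1
        return self.current

    def sort_newest_first(self):
        self.items.sort(key=lambda it: it.received, reverse=True)
        self.pos = 0
        return self.current


def find(store, kind, only_new=False):
    """Collect the items of one kind, optionally restricted to new ones."""
    return ItemSet(it for it in store
                   if it.kind == kind and (it.is_new or not only_new))


def announce(found, label):
    """Report the count and the first item, in the style quoted above."""
    if not found.items:
        return f"I found no {label}."
    return f"I found {len(found.items)} {label}. The first is from {found.current.who}."
```

Because the cursor commands know nothing about item kinds, a new type of data needs only a new `kind` value and no new navigation commands, which is the economy the interface was aiming for.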

Notes

  1. Wildfire was not always grammatically correct. These choices were usually by design. "Call whom?" was deemed to sound persnickety.
  2. Personal names would be spoken back for passive confirmation in the user's own voice. Early Wildfire systems did not use text-to-speech.