
UTILIZING THE POWER OF VOICE RECOGNITION

Kevin Price, Technology Specialist, University of Missouri-Columbia

Web Posted on: December 12, 1997


Many feel that the promise of voice recognition technology remains unfulfilled. Science fiction has portrayed voice recognition as people holding conversations with machines. The reality is far more modest, but for many people with mobility impairments, voice recognition creates a freedom unsurpassed by other technologies. When applied correctly, it can be a tool that increases a user's productivity and independence. The purpose of this paper is to show how powerful voice recognition can be when customized for an individual's needs. By evaluating the user's needs and then creatively applying voice recognition, users can work independently in the office and live more independently at home.

Applying voice recognition effectively requires answering several questions. The first is whether voice recognition is the best solution at all. It can decrease productivity when other solutions work better for the user. When a user can access a keyboard and type productively, and no environmental controls are needed, added voice technology may not be the best solution. When an individual is quadriplegic, has no access to a keyboard, and needs environmental controls, voice recognition can be the perfect solution. Many users interested in voice recognition fall somewhere in between, and effectively evaluating these users' needs will help determine whether voice recognition will increase their productivity. Even if a user gets a voice recognition system that fits his or her needs, successful implementation will be compromised without appropriate customization, setup, and training. A computer system that can handle voice recognition can be expensive for many users, and failing to follow the purchase with appropriate training and setup can make the whole system a waste.

If voice recognition is an effective solution for the user, a voice recognition product must be chosen. Full dictation systems allow users to enter text into word processors and other applications. The full dictation systems available are Kurzweil Voice for Windows, Power Secretary for the Macintosh, DragonDictate for Windows and DOS, and IBM VoiceType for Windows and OS/2. Power Secretary is the only solution available for users who need to use applications written for the Macintosh. For a DOS-only operating environment, DragonDictate for DOS is the only dictation solution available. For the OS/2 operating environment, IBM VoiceType is the available dictation program. A user running Windows 3.1 or Windows 95 has a choice among three dictation programs: Kurzweil Voice for Windows, DragonDictate for Windows, and IBM VoiceType for Windows. IBM VoiceType allows the fastest text entry by voice, but it does not allow users to correct mistakes by voice. Kurzweil Voice for Windows is the most speaker independent of the three. A speaker-independent system lets the user do productive voice recognition without prior voice training or exercises. Kurzweil Voice is more hands free than IBM VoiceType, but it lacks some features of DragonDictate. DragonDictate makes it easier for a user both to correct text by voice and to move the mouse pointer on the screen.

Because of its superior ability to work hands free for people with mobility impairments, DragonDictate for Windows is the product discussed in the rest of this paper. The other products can help people who have some mobility impairments, but they do not adequately serve those who are quadriplegic and need totally hands-free access to the computer.

Dictating words into applications allows users to be productive at creating documents. To maximize a user's effectiveness and minimize the frustrations that can occur with voice recognition, voice macros automate tasks that would otherwise take many voice commands to complete. DragonDictate and other voice recognition programs include predefined voice macros for many common tasks. These predefined macros have their uses, but for maximum effectiveness, customized macros are important in reducing the effort of dictation. The complexity of a voice macro varies with the task to be completed. A common simple macro that many people create is one that inputs the user's home address for letters and other documents. Just by saying "Home Address", several lines of text are placed in the document. Entering one's address might normally take 12 voice commands, but with a simple personalized voice macro it takes only one. DragonDictate has two standard types of macros: keystroke macros and scripting macros. Both types can be placed in an application-specific vocabulary group or in a global vocabulary group. A user who uses the "Home Address" macro only in WordPerfect needs it available only when the WordPerfect vocabulary is loaded; putting it in the global vocabulary would be unnecessary. Keystroke macros can place a virtually unlimited number of keystrokes and key combinations into applications. They are easy to create and require no programming commands. Scripting macros are more complex and use special commands from a scripting language. Once a user masters scripting, a single macro can complete a sequence of commands that controls both DragonDictate and Windows. An example would be a scripting macro that opens WordPerfect, sets the default directory to c:\wpdocs, loads a specific file such as letter.wpd, and then maximizes the window. All of this can be done with one script activated by a voice macro. Scripting is for users who have specific customization needs and who can handle the basic scripting language or have other people to assist with the customization.
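The relationship between macros and vocabulary groups described above can be sketched in a few lines of Python. This is only an illustration, not DragonDictate's own macro format or scripting language: the MacroTable class, the brace notation for keys, the step names in the script, and the placeholder address are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MacroTable:
    """Voice macros grouped by vocabulary (hypothetical structure)."""
    global_macros: dict = field(default_factory=dict)
    app_macros: dict = field(default_factory=dict)  # vocabulary -> {phrase: action}

    def add(self, phrase, action, vocabulary=None):
        """Store a macro globally, or in one application's vocabulary."""
        if vocabulary is None:
            table = self.global_macros
        else:
            table = self.app_macros.setdefault(vocabulary, {})
        table[phrase.lower()] = action

    def expand(self, phrase, active_vocabulary):
        """Return the action for a phrase, or None if it is not active."""
        key = phrase.lower()
        app_table = self.app_macros.get(active_vocabulary, {})
        if key in app_table:
            return app_table[key]
        return self.global_macros.get(key)


macros = MacroTable()

# Keystroke macro: one phrase replaces many dictated words and key presses.
# It lives only in the WordPerfect vocabulary.
macros.add("Home Address",
           "123 Example Street{Enter}Anytown, ST 00000{Enter}",
           vocabulary="WordPerfect")

# Scripting macro: modeled as an ordered list of steps (step names invented).
macros.add("Open Letter", [
    ("launch", "WordPerfect"),
    ("set_directory", r"c:\wpdocs"),
    ("open_file", "letter.wpd"),
    ("maximize_window", None),
])
```

With this table, "Home Address" expands only while the WordPerfect vocabulary is active, while "Open Letter" is available everywhere because it was stored globally.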

Voice recognition macros can be used with other technologies that give users with severe mobility impairments the freedom to do many things independently. Turning lights on and off, changing channels on the television, controlling all aspects of a stereo, answering the phone, dialing phone numbers, and controlling a thermostat are just a few of the environmental control tasks that can be combined with voice recognition technology. The technologies used to create such independence are telephony, infrared technology, radio frequency technology, and X-10 technology.

Telephony refers to computer technology controlling telephone actions. To use telephony with voice recognition, a user needs some type of modem, a speakerphone or microphone to speak into, and a telephony computer program. Once all the hardware and software components are in place, a user can make and receive phone calls without having to lift a receiver or dial by hand. A user with telephony and DragonDictate voice macros can say "Dial Bob". DragonDictate then brings up the telephony program, switches to the dialing directory, finds the proper phone number, and enters the command that dials that number through the modem. Once dialing begins, DragonDictate is put to sleep so that no more DragonDictate commands will be recognized, and the speakerphone or microphone is enabled for the user to talk through. DragonDictate can be awakened by saying "Hangup phone", which hangs up the phone and allows the user to continue dictating. The user can also answer phone calls with a macro such as "Answer phone". Other telephony features such as redialing, sending touch-tone codes for voice mail, and call waiting can be automated with DragonDictate voice macros. With this customized setup the telephone becomes a hands-free tool that gives users a chance to be more independent.
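The "Dial Bob" sequence above behaves like a small state machine: dialing puts the recognizer to sleep, and only the hang-up phrase wakes it. The Python sketch below illustrates that flow under stated assumptions. The VoiceDialer class and the phone number are hypothetical, and the strings ATDT, ATA, and ATH are the standard Hayes modem commands for tone dialing, answering, and hanging up; the paper does not say which commands the telephony program actually sends, and nothing here talks to a real modem.

```python
class VoiceDialer:
    """Tracks whether the recognizer is listening or asleep during a call."""

    def __init__(self, directory):
        # Dialing directory: spoken name -> phone number.
        self.directory = {name.lower(): number
                          for name, number in directory.items()}
        self.asleep = False  # True while a call is in progress
        self.sent = []       # modem commands issued so far

    def hear(self, phrase):
        words = phrase.lower().split()
        if self.asleep:
            # During a call, only the wake phrase is honored.
            if words == ["hangup", "phone"]:
                self.sent.append("ATH")
                self.asleep = False
                return "hung up"
            return "ignored"
        if len(words) == 2 and words[0] == "dial":
            number = self.directory.get(words[1])
            if number is None:
                return "unknown name"
            self.sent.append("ATDT" + number)
            self.asleep = True
            return "dialing"
        if words == ["answer", "phone"]:
            self.sent.append("ATA")
            self.asleep = True
            return "answered"
        # Anything else is treated as ordinary dictation.
        return "dictation"


dialer = VoiceDialer({"Bob": "5550100"})  # fictional number
dialer.hear("Dial Bob")       # dials and puts the recognizer to sleep
dialer.hear("Dear Bob")       # ignored: spoken to Bob, not to the computer
dialer.hear("Hangup phone")   # hangs up and resumes dictation
```

Keeping the sleep state explicit is what prevents conversation with the person on the line from being typed into the open document.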

Infrared is another technology that creates hands-free use when customized with voice recognition technology. Infrared technology remotely operates devices by sending out light signals that can be interpreted by a stereo, television, VCR, or other electronic equipment. For infrared to work with a personal computer, an infrared transmitter needs to be connected to the computer, for example through a serial port. The transmitter sends whatever infrared signals the controlling computer program tells it to send, and that program can in turn be driven by voice recognition macros. A macro can then be created that allows a user to turn up the television by saying "Television Turn up". Anything that can be transmitted by an infrared remote can be customized for use with a macro. A major weakness of infrared technology is its inability to go through objects. The light travels in straight lines from the transmitter and will not pass through walls or anything else in front of the transmitter.

Radio frequency (RF) technology can also remotely operate devices, but instead of infrared light it uses radio waves to transmit its signals. RF technology does not have the limitations of infrared. Radio signals can go through walls and do not need a straight line of sight from the source. RF technology can be used in voice recognition by allowing a user to speak through a cordless microphone to a voice recognition program. A user can be several feet away and still enter voice commands without a cord that tethers the user to the computer. Another use of RF technology is relaying infrared commands with an RF transmitter. A user cannot send infrared commands directly to a stereo in another room because infrared will not go through walls. If the infrared signal is beamed at an RF transmitter, the signal can be relayed to an RF receiver in another room. The radio link carries the infrared command through the walls, and the RF receiver then sends out the appropriate infrared signal to a device such as a television or stereo. With a cordless RF microphone and RF transmitters and receivers, a user can turn the stereo on and off from anywhere in the house, whether or not the stereo and computer are in the same room. This truly gives users the freedom to control their environment from wherever they are.
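The choice between direct infrared and an RF relay can be expressed as a simple routing decision. The sketch below is a simplification under stated assumptions: it treats "same room" as equivalent to having line of sight, and the function name and room names are invented for the example.

```python
def route_command(device_room, transmitter_room, relay_rooms):
    """Decide how an infrared command can reach a device.

    device_room      -- room containing the stereo, television, etc.
    transmitter_room -- room containing the computer's infrared transmitter
    relay_rooms      -- set of rooms with an RF receiver that re-emits infrared

    Returns the list of hops the signal takes, or an empty list if the
    device cannot be reached.
    """
    if device_room == transmitter_room:
        # Direct line of sight: plain infrared is enough.
        return ["infrared"]
    if device_room in relay_rooms:
        # Infrared to the RF transmitter, radio through the walls,
        # then infrared again from the RF receiver to the device.
        return ["infrared", "rf", "infrared"]
    # Infrared alone cannot pass through walls.
    return []


print(route_command("office", "office", set()))
print(route_command("living room", "office", {"living room"}))
print(route_command("bedroom", "office", {"living room"}))
```

The third call returns an empty list, which corresponds to the case in the text where a stereo in another room cannot be controlled until an RF relay is added.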

X-10 technology is a communications protocol designed for remote control of electrical devices. X-10 transmitters and receivers communicate over standard household wiring. The transmitter sends commands such as "turn off", "turn on", or "dim", preceded by the identification of the receiver unit. Each receiver has its own unit ID, reacts only to commands addressed to it, and ignores all other commands. X-10 can control 256 addresses over a house's wiring when multiple transmitters are used. It is also possible to turn several electrical appliances on and off at once by setting their receivers to the same address. When X-10 technology is integrated with voice recognition, a user can turn any electrical device on and off with a voice macro. Transmitters are connected to a computer through a serial port, and a program on the computer sends the appropriate commands to control the X-10 transmitter. A voice macro could be made to turn off the radio by saying "off radio". The voice macro would send the appropriate command to the software program, the program would send the command to the X-10 transmitter, and the command would travel over the house wiring to the X-10 receivers, where the receiver with the matching address would turn off power to the radio. Combined with voice recognition, X-10 is an important solution for hands-free control of electrical appliances.
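The addressing behavior described above can be simulated in a short Python sketch. X-10 addresses combine one of 16 house codes (A through P) with one of 16 unit codes (1 through 16), which gives the 256 addresses mentioned. Everything else here is an assumption for illustration: the Receiver class, the device names, and the macro phrases are hypothetical, only "on" and "off" are modeled, and the code does not reproduce the actual signaling on the power line.

```python
HOUSE_CODES = "ABCDEFGHIJKLMNOP"  # 16 house codes
UNIT_CODES = range(1, 17)         # 16 unit codes per house code


def parse_address(text):
    """Split an address such as 'A3' into (house, unit), validating both."""
    house, unit = text[0].upper(), int(text[1:])
    if house not in HOUSE_CODES or unit not in UNIT_CODES:
        raise ValueError(f"invalid X-10 address: {text!r}")
    return house, unit


class Receiver:
    """A receiver module that switches power for one appliance."""

    def __init__(self, name, address):
        self.name = name
        self.address = parse_address(address)
        self.on = False

    def handle(self, address, command):
        # React only to commands addressed to this unit; ignore the rest.
        if address != self.address:
            return
        if command == "on":
            self.on = True
        elif command == "off":
            self.on = False


def broadcast(receivers, address, command):
    """Every receiver on the wiring sees the command; only matches react."""
    target = parse_address(address)
    for receiver in receivers:
        receiver.handle(target, command)


# Two lights share address A3, so one command switches both.
receivers = [Receiver("radio", "A2"),
             Receiver("lamp", "A3"),
             Receiver("porch light", "A3")]

# Hypothetical voice macros: spoken phrase -> (address, command).
VOICE_MACROS = {"on lights": ("A3", "on"), "off radio": ("A2", "off")}


def speak(phrase):
    address, command = VOICE_MACROS[phrase.lower()]
    broadcast(receivers, address, command)


speak("on lights")
print([(r.name, r.on) for r in receivers])
```

Saying "on lights" switches both the lamp and the porch light because they share an address, while the radio at A2 is unaffected.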

The technologies mentioned above can help users be more independent, but their real power becomes more apparent in real-life situations. Martha had polio when she was eleven. She was paralyzed and confined to an iron lung. She has always wanted to be a writer. By utilizing DragonDictate, she can not only write but also use environmental controls to be more independent. She uses Cintex2 environmental control hardware and software from NanoPac. Cintex2 combines the telephony, infrared, radio frequency, and X-10 technologies discussed above into a complete solution that works in her home. Cintex2 is a software program that interacts with those technologies and comes with specialized voice macros already built for DragonDictate, so Martha can quickly be productive. NanoPac helped Martha set up a complete system for her needs, provided training, and gives ongoing support for its products. NanoPac's Cintex2 is not the only environmental control software available in this area. PROXi from Madenta Communications is an environmental control system similar to Cintex2. The technologies mentioned above can be integrated into a voice recognition system without PROXi or Cintex2, but users who do this on their own will need some technical expertise to install the technologies and create the complex voice macros that work with them. Integrating the technologies completely is not an easy task. NanoPac and other companies make it their business to create, set up, teach, and support voice recognition systems that serve as tools to meet the user's needs.

As discussed, voice recognition technology is a powerful tool for people with disabilities. As voice recognition improves, the future may be even brighter for those with impairments. Continuous speech technology is being developed so that users can speak naturally when working with computers. Many people in the world of technology believe that in the not too distant future the keyboard will be a thing of the past. To many people this concept seems like science fiction, but to those with disabilities it sounds like a more independent life with fewer barriers.


References on Speech Recognition and related topics discussed

21st Century Eloquence: Speech Recognition / Voice Recognition Specialists, http://www.voicerecognition.com

Bordon, Peter, Jaime Lubich, Gregg Vanderheiden. Trace Resourcebook 1996-1997 Edition: Assistive Technologies for Communication, Control and Computer Access. Madison, WI: Trace Research and Development Center, 1995.

DragonDictate / Speech Recognition FAQ, http://www.cl.cam.ac.uk/a2x-voice/dd-faq.html

Infrared Data Association, http://www.irda.org/

Markowitz, Judith. Using Speech Recognition. Upper Saddle River, New Jersey: Prentice Hall PTR, 1996.

NanoPac Inc., http://www.nanopac.com, Silvio Cianfrone, Phone 918-665-0329.

Udell, Jon. "Computer Telephony." Byte 19(7) (1994): 80-96.

X-10 FAQ: Frequently Asked Questions about X-10 Products and Technology, http://www.homation.com/x10faq

X-10 USA: Description: Other Handy Products, http://www.x10.com/x10othe.htm#sr731