
RECOMMENDING THE OPTIMAL SPEECH RECOGNITION SOLUTION

Kevin Price
Technology Specialist, University of Missouri-Columbia
EMAIL: pricek@missouri.edu

The capabilities of speech recognition technology are developing at a remarkable pace. Speech recognition technologies that were only dreamed of a few years ago are now available to everyone at a reasonable cost. Because speech recognition is developing so quickly, it can be hard to tell what makes up an optimal speech recognition solution. The Adaptive Computing Technology Center of the University of Missouri-Columbia stays on the cutting edge of technologies available for people with disabilities and provides recommendations for the successful implementation of speech recognition.

This paper gives an overview of speech recognition on personal computers as an aid for people with mobility impairments. It explores the current state of speech recognition and offers insight into what it takes to implement this technology. Although the focus is on the benefits of speech recognition for users with mobility impairments, the paper concludes with some predictions of coming trends in speech recognition for users with a variety of disabilities, including visual impairments and learning disabilities.

Speech recognition technology is different from many technologies that can assist users with disabilities. The driving force behind its growth is not the disability market but the way everyone, including professionals, is embracing its capabilities. Doctors, lawyers, businessmen, and numerous other professionals are trying to find ways to record information using speech. These users want to eliminate the barriers to getting ideas into an electronic format, such as having to use a secretary or having to interpret one's own handwriting.

People with disabilities also use the technology to overcome barriers, but for them the advantage of speech recognition is not convenience but necessity. Many people with disabilities who use the technology go from complete dependence on others to having the independence necessary to complete their work hands-free. No longer is it necessary for the user to struggle with slow keyboard commands. The growth of speech recognition as a mainstream technology has both potential benefits and drawbacks for people with disabilities. Depending on the nature of an individual's disability or disabilities and the way the technology is implemented with that user, speech recognition may be the best solution, leading to productivity, or it could be a frustrating waste of the user's time.

Continuous speech recognition has long been sought after by many, including people with disabilities. Until recently, users had only one choice for entering information into a computer by voice: discrete speech recognition. Discrete speech systems make you put a definite break between each word and command. This unnatural way of speaking can be frustrating and time consuming. The release of continuous speech recognition by Dragon Systems for the general population in April of 1997 changed speech recognition and set the direction for further development.

With continuous speech, people can speak more naturally when they are dictating into a word processor. Dragon Systems claims that a user can dictate 160 words per minute with their product. Even if the actual speed is a fraction of that, it could clearly assist many individuals with disabilities who are used to very slow text input. One doesn't have to put a definite break between words to reach this speed, but there are still problems with continuous speech technology for certain people with disabilities. Continuous speech works best only within certain computer programs and relies on the user to enunciate very clearly.
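To make the "fraction of that speed" point concrete, here is a rough worked example. Only the 160 words per minute figure comes from the vendor claim quoted above; the document length and the slow manual input rate are hypothetical values chosen purely for illustration.

```python
def minutes_to_enter(word_count: int, words_per_minute: float) -> float:
    """Minutes needed to enter word_count words at words_per_minute."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


CLAIMED_WPM = 160      # dictation rate claimed by the vendor (from the text)
DOCUMENT_WORDS = 400   # hypothetical document length (assumption)
SLOW_INPUT_WPM = 5     # hypothetical rate for very slow manual entry (assumption)

rates = [
    ("claimed dictation rate", CLAIMED_WPM),
    ("one quarter of claimed rate", CLAIMED_WPM / 4),
    ("assumed slow manual input", SLOW_INPUT_WPM),
]

for label, wpm in rates:
    minutes = minutes_to_enter(DOCUMENT_WORDS, wpm)
    print(f"{label}: {minutes:.1f} minutes")
```

Under these assumed numbers, dictating at only a quarter of the claimed rate would take 10 minutes, against 80 minutes for the slow manual input. Time spent correcting recognition errors is not modeled here.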

Continuous speech systems do not work well for complete hands-free control of a computer or within the many programs available for a computer. New speech recognition programs are being developed that combine command and control of the operating system with continuous dictation into certain applications, but no recognition system has yet provided an integrated solution where a user does not have to click or use the keyboard in any way.

The only option for a user with a disability who needs a complete hands-free solution is to use discrete speech recognition along with continuous speech recognition. For example, Dragon Systems recommends using the Dragon NaturallySpeaking (TM) continuous speech software in conjunction with DragonDictate (TM) discrete speech software for a complete hands-free solution. The two products are not well integrated and have separate commands that can be confusing when you switch from one to the other.

Besides evaluating the need for hands-free access, there are other considerations when working with a person with a disability. There needs to be an evaluation of how well the person enunciates words and how well they can handle all the commands cognitively. Slurring words together is a problem with continuous speech systems. Some people with cerebral palsy and other disabilities may benefit from having to pause between each word with a discrete speech recognition system. For those who have problems enunciating, a continuous speech system may be too frustrating to be of any value. Likewise, for those who have problems memorizing the many speech recognition commands and correcting the mistakes that all speech recognition systems make, a speech recognition system may be too hard and frustrating to be of any value.

Even with all the advancements in speech recognition, the reality is that these continuous speech systems make a lot of mistakes and take a lot of concentration to work with. Other solutions, such as a head pointer, a mouth stick, a trackball, and/or an on-screen keyboard, could be more beneficial to the user. For users with disabilities who have some access to the keyboard, can enunciate clearly, and have the cognitive capabilities to work with the many speech recognition commands, a continuous speech system will be of great value in making them productive.
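The selection guidance in the preceding paragraphs can be summarized as a small decision sketch. This is an illustrative simplification, not a substitute for an individual evaluation; the class name, field names, and the order of the checks are assumptions introduced here for the example.

```python
from dataclasses import dataclass

# Possible recommendations, worded after the options discussed in the text.
ALTERNATIVE = "alternative input (head pointer, mouth stick, trackball, on-screen keyboard)"
DISCRETE = "discrete speech recognition"
COMBINED = "continuous plus discrete speech recognition"
CONTINUOUS = "continuous speech recognition"


@dataclass
class UserProfile:
    """Hypothetical summary of an evaluation; field names are illustrative."""
    needs_full_hands_free: bool  # cannot click or use the keyboard at all
    enunciates_clearly: bool     # does not slur words together
    manages_commands: bool       # can learn commands and correct misrecognitions


def recommend(profile: UserProfile) -> str:
    """Return a starting-point recommendation for further evaluation."""
    # Users who cannot manage the commands and corrections are likely to
    # find any speech system too frustrating to be of value.
    if not profile.manages_commands:
        return ALTERNATIVE
    # Users who slur words may do better pausing between each word.
    if not profile.enunciates_clearly:
        return DISCRETE
    # Complete hands-free control currently requires both kinds of software.
    if profile.needs_full_hands_free:
        return COMBINED
    return CONTINUOUS


print(recommend(UserProfile(needs_full_hands_free=True,
                            enunciates_clearly=True,
                            manages_commands=True)))
```

The checks are ordered so that the most limiting factor is considered first; a real evaluation would weigh these factors together rather than as strict yes-or-no answers.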

In order for speech recognition to be available for people with disabilities, the technology has to be affordable. In 1990, a discrete speech recognition system cost ten thousand dollars. Along with a computer, monitor, printer, software, etc., a system would cost around twenty thousand dollars. The price of speech recognition has now dropped to the point that a user can purchase a complete system for around two thousand dollars. Even though prices have come down, there are a few areas where it is better not to scrimp when purchasing a system.

The amount of memory (RAM, or random access memory) that a computer system has is critical for speech recognition. Ample memory allows the speech recognition program to run quickly without having to access the computer's hard drive. The new continuous speech systems are demanding more memory to increase their capabilities. For example, Dragon Systems has developed a new technology called Best Match (TM), which is designed to increase the accuracy of dictation. Dragon Systems added 16 megabytes of memory to its system requirements when using this new feature. The recommendation of many people who are familiar with the demands of speech recognition is to get a computer system with 128 megabytes of memory and room to expand the amount of memory in the future. As speech recognition improves, even more memory will be needed.

Besides memory, a very good sound card is recommended for speech recognition. Without a good sound card, the speed and quality of the dictation will be reduced, making it more frustrating than necessary for a user with a disability. The best way to find out whether a sound card will work well is either to contact the company that makes the speech recognition technology or to talk to a user who has used the sound card with speech recognition successfully. Once you have settled on a computer system with plenty of memory and a good sound card, your next consideration should be the speed and capabilities of the computer's processor.

A fast computer processor can help speed up speech recognition. Currently a 300-MHz processor is recommended for speech recognition systems. The actual speed of the speech recognition system is also affected by whether the computer processor has "MMX" capabilities and a built-in "L2 Cache". MMX and a built-in L2 Cache allow the processor to process speech files more quickly, which helps avoid slow and possibly frustrating recognition performance.
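The memory and processor recommendations above can be written down as a simple checklist. The sketch below only encodes the figures given in this paper (128 megabytes of memory, a 300-MHz processor, MMX, and a built-in L2 cache); the class and function names are hypothetical, and the sound card is left out because the paper gives no measurable criterion for it.

```python
from dataclasses import dataclass

RECOMMENDED_RAM_MB = 128   # from the text
RECOMMENDED_CPU_MHZ = 300  # from the text


@dataclass
class ComputerSpec:
    """Hypothetical description of a candidate computer system."""
    ram_mb: int
    cpu_mhz: int
    has_mmx: bool
    has_builtin_l2_cache: bool


def shortfalls(spec: ComputerSpec) -> list[str]:
    """List the ways a system falls short of the recommendations."""
    issues = []
    if spec.ram_mb < RECOMMENDED_RAM_MB:
        issues.append(f"memory: {spec.ram_mb} MB is below {RECOMMENDED_RAM_MB} MB")
    if spec.cpu_mhz < RECOMMENDED_CPU_MHZ:
        issues.append(f"processor: {spec.cpu_mhz} MHz is below {RECOMMENDED_CPU_MHZ} MHz")
    if not spec.has_mmx:
        issues.append("processor lacks MMX capabilities")
    if not spec.has_builtin_l2_cache:
        issues.append("processor lacks a built-in L2 cache")
    return issues


candidate = ComputerSpec(ram_mb=64, cpu_mhz=233, has_mmx=True, has_builtin_l2_cache=False)
for issue in shortfalls(candidate):
    print(issue)
```

An empty list means the system meets the recommendations given here; it says nothing about the sound card or microphone, which still need to be checked separately.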

The other critical item not to scrimp on is a good microphone that meets the user's needs. Many speech recognition systems include a microphone, but the best microphones have noise-canceling capabilities. Noise-canceling microphones filter out extraneous background noise so that the user can dictate without other sounds in the environment disrupting the speech recognition process. Besides the need for noise canceling, a user with a disability may have problems with putting on a headset microphone or with being "tethered" to the computer.

There are two choices that one can make in this situation. One choice is to purchase a desktop or gooseneck microphone that would mount on a user's desk in front of the computer. This would eliminate the need to have someone place the headset on the user. Another choice is to purchase a wireless microphone that would allow the user to dictate untethered to the computer.

For anyone purchasing a computer system today, Microsoft Windows (TM) based computers have the most speech recognition software available. No continuous speech software is available for Macintosh computers at the time of this paper. Recently, Dragon Systems discontinued development of its PowerSecretary (TM) discrete speech recognition software for the Macintosh. For users who need access to mainframe, UNIX, and other operating systems, there are very few choices.

The only product that currently gives access to these operating systems with speech recognition is the Synapse TAP Workstation by Synapse. This product connects a Windows-based machine to a UNIX, mainframe, or other computing system through a specialized connecting device. The dictated speech recognition information is transferred from one computer to the other. A disadvantage of this solution is the cost of two computer systems and the specialized connector. For many who need speech recognition in those computing environments, it is currently the only solution available.

Once a user has been carefully evaluated to determine whether speech recognition is the correct accommodation, and the proper hardware and software have been purchased, the training and setup of the system are critical to the successful implementation of the optimal speech recognition solution. The more hands-free options a user needs to work with the computer, the more speech recognition commands have to be learned. The more specialized the needs of the user, the more important it is to set up the machine to that person's individual needs.

When first setting up the speech recognition system, the environment in which the user will be dictating needs to be evaluated. The more background noise there is where the system is set up, the more misrecognitions will occur. The environment does not have to be completely silent, but frequent loud or sudden noises will affect the system's recognition. Once the proper environment has been chosen, the computer must be set up so the desktop or headset microphone is easy to access. Users may need an adjustable table to allow their wheelchair to get close to the system.

A user may need a special switch to turn on the computer, because many computers have power switches that are hard to access. A surge protector that is connected to another surge protector is a possible solution. One surge protector could be used to turn the computer system on and off, and the other could be used to protect the system from electrical surges. Once the user has access to the system, it may be set up so that it automatically loads the speech recognition software. This is easy to do by placing the speech recognition program in the Startup group when using Microsoft Windows. This setup accommodation should be implemented after the user has had some training with the speech recognition system and is comfortable with it.

Having a trainer to help the user begin using the speech recognition system is an essential component for the optimal solution. Most users will need the training to be able to work through the initial frustrations of using the system. The trainer should be able to help the user relax and take frequent breaks to drink water when using the system. Especially when a user needs complete hands-free use of the computer, there will be a large amount of learning that will need to take place. A trainer can give the user tips and handouts that will benefit the user in the training process.

This paper has discussed the current optimal system recommendation; now we turn to the future. What is the future of speech recognition going to be like for people with disabilities? For one thing, speech recognition is not going to affect only computers. Speech will be integrated into many household appliances as the technology improves and expands. Some videocassette recorders have already incorporated speech recognition technology. Most likely the technology will have to be very easy to use and will have to recognize a wide range of voices with minimal training before it is widely used in household appliances.

A very interesting trend that will continue is speech output being used in combination with speech recognition. Most speech recognition systems are adding speech output so that one can hear back what was dictated. Expect this trend to continue until both the monitor and the keyboard are not as necessary. For people with learning disabilities, speech output is helpful because it gives the user extra feedback on the information that has been dictated. For those who are blind, it will provide information and feedback on what was entered without the need for a monitor. Speech recognition currently does not work very well for those who are visually impaired and need speech output.

In the future, there will be a reduction of training time and frustration for speech recognition systems, so a person will be able to go to any computer and be able to dictate quickly. As computers become faster and faster with plenty of memory to work with, training will only become easier. Many of the speech recognition programs have increasing system requirements, which may add expense but will allow for better processing of speech and reduced training time. Users will still need a very high-end computer to take advantage of all the new capabilities of speech recognition when they are developed.

A tighter integration of the discrete and continuous speech systems is a beneficial trend. The user with a disability will be able to dictate quickly and also have the ability to work in any application, menu, etc. Total hands-free access will be developed not just for people with disabilities but for the mainstream users that don't want to type at all.

A trend that may not be beneficial is an increase in the number of people who suffer voice strain from dictating. Speech recognition will reduce carpal tunnel problems, but the voice strain issues cannot be ignored. A user will have to take frequent breaks when using speech recognition, have liquids available to lubricate the throat, and train their speech to be natural and not exaggerated when dictating into a system. A user with a disability need not take on this problem if they are careful and take precautions to avoid voice strain.

The optimal speech recognition system is an evolving target. The trends in speech recognition will be exciting, but they will also hold pitfalls for users with disabilities. The critical variable will not be the technology but the needs of the individual user. Companies that manufacture speech recognition technologies need to be made aware that it is critical to develop with the needs of a person with a disability in mind. Being aware of the issues surrounding the current state of speech recognition, and using that information to customize recommendations for a particular user, will help the user with a disability to be productive and more independent.

References on Speech Recognition and related topics discussed:

21st Century Eloquence: Speech Recognition / Voice Recognition Specialists, http://www.voicerecognition.com Phone 800-245-2133.

Dragon Systems, Inc., http://www.dragonsys.com, Phone 617-965-5200.

NanoPac Inc., http://www.nanopac.com, Silvio Cianfrone, Phone 918-665-0329.

Olive Tree Software, EMAIL: info@olivetreesoftware, David Arnold, Phone 314-209-7717.

Synapse, http://www.synapseadaptive.com, Phone 888-285-9988.

Typing Injury FAQ: Speech Recognition, http://www.tifaq.com/speech.html
