
POTENTIAL PROBLEMS ASSOCIATED WITH USE OF SPEECH RECOGNITION PRODUCTS

Dona Kambeyanda, M.S., Stan Cronk, Ph.D., and Lois Singer‡, DSPA
University of Tennessee, Memphis, Rehabilitation Engineering Program
‡ Voice Laboratory and Treatment Centre of Ontario Clinic of Injury and Disease Response, Toronto

Abstract

Commercial speech recognition dictation products are increasingly being used as alternate input devices for computers, particularly by persons with physical disabilities. These discrete speech recognition products require the user to insert brief but distinct pauses between each spoken word. The need to isolate each word while dictating text causes the vocal folds of the untrained user to slam open and shut, resulting in glottal attacks. The tendency to maintain constant pitch, volume, and inflection while dictating to the computer results in keeping the musculature in a fixed position. Maintaining this musculature in a rigid position for extended periods of time could eventually result in injury. The growing use of speech recognition products by persons with and without disabilities indicates an urgent need to determine potential problems associated with the use of these products. Preliminary studies indicate that persons with Repetitive Strain Injuries (RSI) may be most vulnerable. Common-sense strategies (take frequent breaks, drink plenty of water) may postpone or minimize problems.

Background

Several powerful speech recognition systems for dictating text to computers are now commercially available. Systems for IBM-compatible computers include DragonDictate from Dragon Systems, Kurzweil Voice from Kurzweil Applied Intelligence, and VoiceType Dictation from IBM; PowerSecretary from Articulate Systems is available for the Macintosh.

Each of these systems has several features in common with the others, including:

  • Large vocabulary size - Each system has an active vocabulary of 30,000 to 60,000 words, allowing the user to enter text by speaking entire words rather than entering letters individually.
  • Speaker dependence - The user of each system must train that system to recognize how he or she produces different phonemes.
  • Isolated-word input - Each system requires users to pause between words so that the speech recognition system can determine the boundaries between utterances before processing each utterance.
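
The isolated-word requirement can be illustrated with a minimal sketch of pause-based segmentation. This is not the algorithm used by any of the products named above, whose internals are not described here; the function name, the per-frame energy values, the threshold, and the minimum pause length are all assumptions chosen for illustration. The sketch treats a run of low-energy frames as a pause and reports the frames between pauses as one utterance.

```python
def split_on_pauses(energies, threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame ranges for utterances separated by pauses.

    energies: hypothetical per-frame energy values (for example, one per 10 ms).
    A pause is min_pause_frames or more consecutive frames below threshold.
    The end index of each range is exclusive.
    """
    segments = []
    start = None        # first frame of the utterance in progress
    last_voiced = None  # most recent frame at or above the threshold
    quiet = 0           # consecutive quiet frames since last_voiced
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_voiced = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # The pause is long enough: close the current utterance.
                segments.append((start, last_voiced + 1))
                start = None
                quiet = 0
    if start is not None:
        # Input ended while an utterance was still open.
        segments.append((start, last_voiced + 1))
    return segments


# Illustrative input: two words separated by a four-frame pause.
# The single quiet frame inside the second word is too short to count as a pause.
frames = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0, 0.6, 0, 0, 0]
print(split_on_pauses(frames))
```

In this sketch a dip in energy shorter than the minimum pause does not split an utterance, so words spoken without a distinct pause between them would be treated as a single utterance.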

Speech recognition systems are popular alternate computer input methods for two principal reasons: speech is a natural form of communication, and speech recognition systems can recognize speech at a rate faster than many persons can type (typically 30 to 50 words per minute (1,2,3)). The fast input rate using speech is especially useful for persons with physical disabilities limiting their ability to operate a computer keyboard.

Sales of speech recognition systems are expected to grow at an annual rate of 26%, with the market for speech recognition applications reaching $750 million by 1997 (4). This growth is expected partly because of the rapid increase in the incidence of RSI. Occupational Safety and Health Administration (OSHA) statistics for 1992 showed that 56% of all workplace injuries during the year were due to RSI, up from 18% in 1981 (5).

Repetitive Stress or Strain Injury (RSI), also known as Cumulative Trauma Disorder (CTD) or Occupational Overuse Syndrome (OOS), can be defined as a physiological condition that develops due to long-term trauma or stress to the body. Repetitive activity of the musculoskeletal system may lead to several symptoms, particularly pain. These symptoms could be attributed to local pathology directly associated with repetitive motion or forceful movements of the musculature, or may be part of referred pain which relates to postural effects (6). Several factors give rise to RSI, including (a) rapid, repetitive movement, (b) less frequent, more forceful movements, and (c) static load (7).

Some anecdotal evidence of problems believed to be caused by improper use of the discrete speech recognition systems is available (8,9). Persons using the systems have reported severe problems with their voices, such as hoarseness, sore throats, and even a complete loss of their voice.

Research Questions

  • What, if any, are the types of problems caused by the improper use of speech recognition products? What is the extent of these problems and their repercussions? What are the initial symptoms that can lead to a clinical prognosis and appropriate intervention? What precautions can be taken by the user in order to avoid or minimize these problems?
  • Can certain risk factors be identified? Are risk factors independent of either the technology or the user? Do the studies suggest that certain populations, categorized by physical or psychological differences, are more susceptible to voice problems than others?

Method

Pre-therapy baseline assessments were performed with five clients using Kay Elemetrics Computerized Laboratory and Aerophone equipment. These tests included spectrographic, spectral, LTAS waveform, LPC pitch and energy analysis, phonatory air flow studies, and audio tapes.

A survey was posted on the Internet. Respondents were asked to provide brief answers to questions regarding their usage of speech recognition products. They were also requested to indicate if the researchers could make follow-up contacts for more detailed information.

The clients of the UT Rehabilitation Engineering Program who had received speech recognition systems over the last 3 1/2 years were contacted to determine what problems, if any, they were having with their systems.

Results

For the five clients tested extensively, the following symptoms were prevalent:

  • inappropriately low pitch
  • monotone speech
  • weak, barely audible voice
  • inability to modulate voice
  • uncontrollable coughing bouts
  • chronic hoarseness
  • frequent aphonic episodes

The otolaryngology diagnoses for these individuals are shown in Table 1.

Table 1. Otolaryngology diagnoses of five persons undergoing extensive clinical testing.

Diagnosis             Number of participants
bowed vocal cords     1
vocal fatigue         2
chronic hoarseness    1
vocal abuse           1

The participants in the preliminary study included the five individuals who underwent extensive clinical testing, fifteen persons who responded to a survey posted on the Internet, and five persons who responded to a phone survey (a total of twenty-five participants). The responses were categorized into 3 groups: Group A reported no problems with the use of their speech recognition systems; Group B reported the development of problems with their voices; and Group C reported that they had discontinued use of the speech recognition system for reasons other than health. The results of that survey are summarized in Table 2.

Table 2. Summary Results of Preliminary Study on Use of Speech Recognition Products

Number of persons         Group A   Group B   Group C
RSI                       3         10        0
non-RSI                   1         1         3
disability not reported   3         4         0
Totals                    7         15        3

Of the twenty-five participants, fifteen claim to have experienced problems with their voices. Three of these individuals reported that they have stopped using speech recognition altogether because of the severity of the problems they experienced with their voice.

A high percentage of participants reporting a disability indicated that their disability is RSI-related (13 out of 18). Three of the four persons in Group A reporting a disability indicated that their disability is RSI-related. Ten of the eleven persons in Group B reporting a disability indicated that the disability is an RSI-related injury.
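
The counts quoted above can be rechecked directly from Table 2. The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative assumptions, and the numbers are transcribed from Table 2 as printed. With a sample this small, the proportions are descriptive only.

```python
# Counts transcribed from Table 2: rows are disability status, columns are groups.
table2 = {
    "RSI": {"A": 3, "B": 10, "C": 0},
    "non-RSI": {"A": 1, "B": 1, "C": 3},
    "not reported": {"A": 3, "B": 4, "C": 0},
}


def group_total(group):
    """Number of participants in one group, summed over all rows."""
    return sum(row[group] for row in table2.values())


def rsi_share_of_reported(group):
    """Return (RSI count, count reporting any disability) for one group."""
    rsi = table2["RSI"][group]
    reported = rsi + table2["non-RSI"][group]
    return rsi, reported


totals = {group: group_total(group) for group in "ABC"}
participants = sum(totals.values())

# Group B is the group that reported voice problems.
voice_problem_rate = totals["B"] / participants

print("Group totals:", totals)
print("Participants:", participants)
print("Share reporting voice problems:", voice_problem_rate)
for group in "AB":
    rsi, reported = rsi_share_of_reported(group)
    print(f"Group {group}: {rsi} of {reported} reported disabilities are RSI-related")
```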

The persons who experienced problems such as hoarseness, sore throats, etc. reported using the speech recognition systems from 45 minutes to 3 hours at a stretch, without taking breaks.

Discussion

These preliminary results indicate that improper use of these discrete-word speech recognition systems may cause moderate to severe problems in the voices of the users. This initial study also indicates that persons with RSI may be more susceptible to vocal injury. It has been hypothesized that these persons have a tendency to work longer and harder, thereby increasing the probability of stress-related injuries.

Based on preliminary information, the following recommendations (10) may allow users of speech recognition systems to protect their voices:

  • Take frequent breaks.
  • Perform warm-up and cool-down voice exercises.
  • Limit the amount of time using the speech recognition system.
  • Drink plenty of water.
  • Avoid clearing the throat.

Also, clinical professionals who evaluate others for appropriate assistive technology should consider recommending alternate methods of access in addition to speech recognition. Clinicians may want to avoid recommending speech recognition systems for persons who have a previous history of vocal problems.

A number of questions have not yet been addressed, including the following:

  • Would initial training on a speech recognition system by a speech-language pathologist minimize problems?
  • Are there psychological or physiological factors that contribute to someone with RSI having a higher probability of developing voice problems?

Though extensive use of isolated-word recognition systems may lead to problems with one's voice, upcoming continuous-speech recognition dictation systems will hopefully ameliorate or even eliminate these problems, particularly if the systems are designed to recognize speech with natural inflection patterns.

References

1. Olsen, Florence, Government Computer News. Speech recognition software responds to users' demands. 10(13):18-20, June 24, 1991

2. Labriola, Don, Windows Sources. Straight Talk on Speech Recognition. 3(2):144-120, 1995

3. Tynan, Daniel, PC World. How to make your PC listen. 12(9):62-63, 1994

4. Caruthers, Frank, Computer Design. Speech recognition: the final frontier. 33(3):OEM7-OEM15, 1994

5. Furger, Roberta, PC World. Danger at your fingertips. 11(5):118-124, 1993

6. Littlejohn, G.E., Journal of Rheumatology. RSI - An Australian Experience. 13:1004-1006, 1986

7. Stone, W.E., Medical Journal of Australia. Repetitive Stress Injury. 2:616-618, 1983

8. Arnaut, Gordon, Globe and Mail [Toronto, Ontario]. Talking to computers has its hazards. Sept. 15:C4, 1995

9. Chao, Julie, Wall Street Journal. Talking to a PC May Be Hazard to Your Throat. Aug. 17:B1, 1995

10. Goodham, Mary, Globe and Mail [Toronto, Ontario]. High-tech RSI aid creates new problems. Sept. 14:A12, 1995

Acknowledgments

The authors gratefully acknowledge the continued support of the Tennessee Division of Rehabilitation Services.

Dona Kambeyanda
Rehabilitation Engineering Program
University of Tennessee, Memphis
682 Court St.
Memphis, TN 38163
901/448-6479
