VOICE RECOGNITION: AN ASSISTIVE TECHNOLOGY WRITING TOOL FOR STUDENTS WITH PERSISTENT PHONOLOGICAL AWARENESS DEFICITS

Joe Reid
Roosevelt High School
456 S. Mathews St.
Los Angeles, CA 90033
(213) 268-7241
FAX: (213) 269-5473

Web Posted on: December 12, 1997


As computer hardware and software become more affordable and more powerful, they increase the possibilities for assisting individuals with disabilities. Voice-activated systems can provide hands-free control of the computer and its peripherals. One strategy in processing the user's voice is to break each word of the speech signal into its phonemes. These phonemes are then used to identify the word. When used as a writing tool, such a system can provide assistance to persons with phonemic processing deficits. These individuals may be very verbal but unable to put their thoughts in writing.

The most prevalent manifestation of this deficit is in the area of processing sounds of words and knowing phoneme-grapheme correspondence rules. The individual might use "he saw" for "she was" or "ter" for a word that starts with "tre". Campbell and Butterworth (1985) conducted a case study of a very literate subject with a deficit in phonological processing. They reported that she could easily read aloud irregularly spelled words such as "placebo" and "idyll", but had great difficulty with simple nonwords like "bant". One area of weakness noted involved rhyme judgments. If the word pair looked like it rhymed and it did rhyme, she was correct every time. She also got 100% correct when the pair looked like it did not rhyme and it didn't. If the pair looked like it rhymed but it did not (lost/post), she had a score of 63%. Her score dropped to 11% when the pair rhymed but appeared not to rhyme (true/shoe).

Another area of weakness for the subject involved an auditory acronym test. The subject was asked to take the first sound of each word in a phrase to form a word. For example, using the phrase "hold aching toes" the word would be "hate". If the subject used orthographic clues and took the first letter of each word, the word formed would be "hat". On 21 such test items this particular subject responded with the orthographic option on all 21 items.
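The two response strategies in the acronym test can be contrasted in a short sketch. This is only an illustration: the phoneme labels are informal ARPAbet-style symbols, and the small PHONEMES and SPELLINGS tables are assumptions made for this example, not materials from the study.

```python
# Illustrative transcriptions (informal ARPAbet-style labels, assumed for this sketch).
PHONEMES = {
    "hold": ["HH", "OW", "L", "D"],
    "aching": ["EY", "K", "IH", "NG"],
    "toes": ["T", "OW", "Z"],
}

# Assumed spelling of the resulting sound sequence, used only to display the answer.
SPELLINGS = {("HH", "EY", "T"): "hate"}


def phonemic_answer(phrase):
    """Join the first sound of each word (the response the test expects)."""
    return tuple(PHONEMES[word][0] for word in phrase.split())


def orthographic_answer(phrase):
    """Join the first letter of each word (the response the subject gave)."""
    return "".join(word[0] for word in phrase.split())


phrase = "hold aching toes"
print(SPELLINGS[phonemic_answer(phrase)])  # hate
print(orthographic_answer(phrase))         # hat
```

The two functions differ only in whether they index into the sounds or the letters of each word, which is the distinction the test is designed to expose.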

On a spelling test this subject was able to score at an acceptable level when compared to college undergraduates. She scored higher than 20% of the controls. The test included words such as seizure, chlorophyll, questionnaire and personnel. The difference between her results and the controls' results was in the types of misspellings. Of the controls' misspellings, 93% could be pronounced like the test word. Less than 60% of the subject's misspellings could be pronounced like the test word. Among the 20% of control subjects who scored lower than the subject, 94% of the misspellings could be pronounced like the test word.

The authors postulated that the deficit in associating phonemes with letters may play a relatively more important role in writing than in reading. On the spelling test the subject's errors were more often phonemically implausible. This hypothesis is supported by Perin (1983), who found that phonological awareness is more closely tied to spelling than to reading ability. Higgins and Raskind (1995) noted evidence that more than 90% of adults with learning disabilities report significant problems with writing and/or spelling.

Phonological awareness deficits tend to stay with students into adulthood and continue even when reading levels improve. Campbell and Butterworth concluded that the subject was lacking in all aspects of phonemic representation: segmentation, manipulation, awareness and assembly. This was in spite of the fact that she was very literate. Bruck (1992) concluded that adults with childhood histories of learning disabilities never approximate levels of performance on phonological awareness tasks that are appropriate for their age or reading level. One phonological awareness task used by Bruck in testing for a weakness involved asking the subject to delete a phoneme from a word and say what is left. For example, "small" would become "mall". For "there", deleting the first phoneme (spelled "th") leaves "ere", not "here", which is what results from deleting only the first letter. Another phonemic task is counting the number of phonemes in a given word. Some words have a digraph and thus have more letters than phonemes. For example, "chin" has four letters but three phonemes.
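The deletion and counting tasks can be made concrete with a small sketch. The SEGMENTS table below is an assumption made for illustration: each entry splits a word into the letter groups that spell one phoneme each, following one plausible American English pronunciation. It is not taken from Bruck's test materials.

```python
# Illustrative segmentations (assumed): each item is the letter group for one phoneme.
SEGMENTS = {
    "small": ["s", "m", "a", "ll"],
    "there": ["th", "e", "re"],
    "chin": ["ch", "i", "n"],
}


def delete_first_phoneme(word):
    """Drop the letters that spell the first phoneme and return what is left."""
    return "".join(SEGMENTS[word][1:])


def delete_first_letter(word):
    """Drop only the first letter (the orthographic shortcut)."""
    return word[1:]


def count_phonemes(word):
    """Count phonemes rather than letters."""
    return len(SEGMENTS[word])


print(delete_first_phoneme("small"))       # mall
print(delete_first_phoneme("there"))       # ere
print(delete_first_letter("there"))        # here
print(len("chin"), count_phonemes("chin")) # 4 3
```

The phonemic and orthographic answers coincide for "small" but diverge for "there", so only words containing a digraph separate the two strategies.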

Neurological studies with phonological tasks are also showing measurement results independent of reading improvements. Flowers (1993) noted that regional cerebral blood flow in the angular gyrus area of the brain was independent of reading improvement from childhood to adulthood. Evidence was also noted showing that during phonological tasks normal subjects inhibited the angular gyrus while readers with disabilities activated it. Wood, Flowers, Buchsbaum and Tallal (1991) had subjects identify a consonant-vowel (CV) target from among six possible CV units. For normal subjects, higher task accuracy predicted lower cerebral blood flow at the left superior temporal site, while subjects with learning disabilities showed a trend toward greater left temporal flow with better task performance. This relatively stable functional difference between normal readers and readers with learning disabilities may have a genetic base. Smith, Kimberling and Pennington (1991) have shown a possible gene configuration on chromosome six which may cause dyslexia. Additional evidence pointed to a possible factor on chromosome 15.

In conclusion, phonological processing deficits continue even after improvements in reading. Furthermore, there is evidence of structural and functional neurological differences between persons with this disability and normal individuals. This permanent physical difference is potentially genetically based. Its most prevalent manifestation is in the area of processing the sounds of words and knowing phoneme-grapheme correspondence rules. What is now available is a computer system that can perform exactly this processing.

A voice recognition system takes the sounds the student makes as he or she is talking, matches these sounds with the sounds in the English language, and computes the word the student is saying. In processing what the student says, the acoustical sequence is divided by a small pause to indicate the break between words. An analysis is done using a sequence of transformations to obtain a maximum likelihood estimation. Part of this analysis is performed by the sound card, which is hard-wired for voice recognition, enabling more timely responses. The phonemes extracted from this analysis are compared to phoneme reference patterns obtained during training sessions with this user. The similarities between these reference phonemes and the input speech are calculated. A dictionary containing words represented as sequences of phonemes is searched for the word that yields the maximum similarity.
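The final dictionary search can be sketched as follows. This is a simplified illustration, not the system's actual method: it starts from phoneme labels that are assumed to be already extracted, and it substitutes a generic sequence-matching ratio from Python's standard difflib module for the acoustic similarity a real recognizer computes. The DICTIONARY entries and phoneme labels are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical pronunciation dictionary: word -> sequence of phoneme labels.
DICTIONARY = {
    "she": ["SH", "IY"],
    "was": ["W", "AH", "Z"],
    "saw": ["S", "AO"],
    "he": ["HH", "IY"],
}


def similarity(a, b):
    """Stand-in similarity score between two phoneme sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def recognize(input_phonemes, dictionary=DICTIONARY, n_best=5):
    """Return up to n_best words, ordered from most to least similar to the input."""
    ranked = sorted(
        dictionary.items(),
        key=lambda item: similarity(input_phonemes, item[1]),
        reverse=True,
    )
    return [word for word, _ in ranked[:n_best]]


print(recognize(["SH", "IY"])[0])      # she
print(recognize(["W", "AH", "S"])[0])  # was (closest entry despite one wrong phoneme)
```

Returning a ranked list rather than a single word mirrors the way such systems can offer the user a short list of alternatives when the top choice is wrong.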

The effectiveness of using a voice recognition system to assist in writing was tested by Higgins and Raskind (1995). The subjects were undergraduate students with learning disabilities who were asked to take a writing test similar to a proficiency exam required for graduation. The students' essays written using voice recognition were compared to their essays written without assistance. The scores on the essays written with the voice recognition system were significantly higher. The authors also noted that the students used bigger words and wrote longer essays. One advantage frequently mentioned by the students in the study was the freedom from the mental distraction of having to check spelling or think of an easier word to use. This mental energy could be used on content and organization.

Using a voice recognition system does not create a danger of becoming dependent on technology. Such systems will become common in everyday situations. Many commercial systems are being developed as alternatives to the keyboard. Business correspondence will be dictated into a word processing system and then faxed to its destination. Experiments are being done to develop phone systems that can translate what is said into another language. These systems could also display what was said.

The need for this type of assistive device comes from the fact that more than 90% of adults with learning disabilities report significant problems with writing and/or spelling. College students with learning disabilities use mental energy thinking of smaller words or checking spelling. This energy could be used for content or organization. Given the preponderance of evidence regarding the permanent nature of this disability, more than remediation can be justified in helping these individuals overcome their disabilities.

The recognition system we are using requires a 486/33DX computer with 16MB of RAM. Other requirements include 30MB of disk space, a sound card and a VGA monitor. We are using it on a Pentium 120 processor. Words are processed quickly and we have never been disappointed in its speed. It can rapidly process four or five words spoken at normal speed and be ready for the first word of the next sentence.

Getting started requires having the system learn the student's voice. Because of the spelling difficulties previously mentioned, easy-reading stories are used for the training. The training consists of the student reading each word of the story aloud. If the correct word is displayed, the student continues reading. If the wrong word is displayed, the student can select from a list of five alternatives. If the correct word is not among that list, the student asks the computer to correct the word. The student then begins typing the word and continues until it appears among the alternatives. The computer learns from these corrections.
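The correction step, in which typing narrows the list of alternatives, might work roughly as sketched below. The function names, the ranked candidate list, and the prefix-filtering behavior are assumptions made for illustration. The product's actual ranking and learning logic is not described here, and the sketch does not model how the system learns from corrections.

```python
def alternatives(candidates, typed_prefix="", limit=5):
    """Return up to `limit` ranked candidates that start with the letters typed so far."""
    matches = [word for word in candidates if word.startswith(typed_prefix)]
    return matches[:limit]


def keystrokes_needed(candidates, target, limit=5):
    """Count the letters to type before `target` is shown; None if it never appears."""
    for n in range(len(target) + 1):
        if target in alternatives(candidates, target[:n], limit):
            return n
    return None


# Hypothetical candidates, ordered from most to least likely by the recognizer.
ranked = ["three", "tree", "free", "the", "thee", "there", "through", "threw"]

print(alternatives(ranked))                  # ['three', 'tree', 'free', 'the', 'thee']
print(keystrokes_needed(ranked, "through"))  # 2
print(keystrokes_needed(ranked, "threw"))    # 3
```

In this sketch a student who can produce only the first two or three letters of a word can still reach the intended word, which is the point of the procedure for writers with spelling difficulties.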

Voice recognition offers the unique possibility of being able to assist a group of individuals in the area of their greatest weakness. People who have difficulties processing the phonemes of words and making the phoneme-grapheme connections can use a system to accomplish this very task. Individuals with phoneme processing deficits can use their energy on the conceptual issue of what they want to say and how they want to organize it. With the strong drive for voice I/O in many commercial applications, technological advances should be relatively quick. Faster and more accurate systems that can process homophones should be available in the near future. With proper planning and preparation, the elementary school students of today should experience much greater success in high school and beyond than do the high school students of today.


REFERENCES

Bruck, M. (1990). Word-Recognition Skills of Adults With Childhood Diagnoses of Dyslexia. Developmental Psychology, 26(3), 439-454.

Bruck, M. (1992). Persistence of Dyslexics' Phonological Awareness Deficits. Developmental Psychology, 28(5), 874-886.

Campbell, R. & Butterworth, B. (1985). Phonological Dyslexia and Dysgraphia in a Highly Literate Subject: A Developmental Case with Associated Deficits of Phonemic Processing and Awareness. The Quarterly Journal of Experimental Psychology, 37A, 435-475.

Felton, R., Naylor, C. & Wood, F. (1990). Neuropsychological Profile of Adult Dyslexics. Brain and Language, 39, 485-497.

Flowers, L. (1993). Brain Basis for Dyslexia: A Summary of Work in Progress. Journal of Learning Disabilities, 26(9), 575-582.

Higgins, E. & Raskind, M. (1995). Compensatory Effectiveness of Speech Recognition on the Written Composition Performance of Postsecondary Students with Learning Disabilities. Learning Disability Quarterly, 18, 159-174.

Keller, E. (1994). Fundamentals of Speech Synthesis and Speech Recognition. Chichester: John Wiley & Sons.

Perin, D. (1983). Phonemic Segmentation and Spelling. British Journal of Psychology, 74, 129-144.

Ricco, C. & Hynd, G. (1996). Neuroanatomical and Neurophysiological Aspects of Dyslexia. Topics in Language Disorders, 16(2), 1-13.

Saito, S. & Nakata, K. (1985). Fundamentals of Speech Signal Processing. Tokyo: Academic Press.

Smith, S., Kimberling, W. & Pennington, B. (1991). Screening for Multiple Genes Influencing Dyslexia. Reading and Writing: An Interdisciplinary Journal, 3, 285-298.

Sokoloff, L. (Ed.). (1985). Brain Imaging and Brain Function. New York: Raven Press.

Wood, F., Flowers, L., Buchsbaum, M. & Tallal, P. (1991). Investigation of Abnormal Left Temporal Functioning in Dyslexia Through rCBF, Auditory Evoked Potentials, and Positron Emission Tomography. Reading and Writing: An Interdisciplinary Journal, 3, 379-393.