
Web Posted on: August 24, 1998

TIDE-ENABL - The First Year

C. Bickley, R. Carlson, P. Cudd, S. Hunnicutt, B. Reimers

Concentra, KTH (SE), Univ. Sheffield (GB), KTH, Enter rehabilitering (SE)
Concentra Ltd., Viscount Centre II, Milburn Hill Rd., Coventry, CV4 7HS, U.K.
tel: +44 1203 692323
fax: +44 1203 419875
email: bickconctr@aol.com

1. Summary

The objective of the ENABL project is to develop a speech user interface, using speech recognition, to computer-based generative modeling software for engineering design, analysis and configuration tasks. The project contributes to the goal of TIDE by providing tools that can be used by engineers whose motoric disabilities prevent them from using their hands. The central technical problem being addressed is the implementation of an accessible user interface, operated by speech, for rule-based engineering design. Two Demonstrators will be constructed during the project in order to evaluate vocational use of the complete system. The evaluation will include measures of the use of the speech recogniser and of the effectiveness of a speech user interface to a rule-based engineering system.

Because many persons with motoric disabilities also have dysarthria, a motor speech impairment due to neurological involvement, the project personnel are also testing a number of persons with dysarthric speech using a standard Swedish test, recording their voices, and analysing the results acoustically. The speech recogniser will be trained with these voices. The project also includes a voice monitoring and care component, which will ensure that users exercise appropriate voice care. This component was included in response to reports in the literature of adverse effects on users' voices after extensive use of speech recognition technology.

This report is organised along the main themes of the ENABL project. First, the progress made during the first year on the development of the speech user interface and the speech recogniser is described. Then, the components of the project concerned with dysarthric speech and healthy use of the voice are presented. Finally, the Demonstrators and the plans for their evaluation are described.

2. The speech user interface

A major issue in ENABL is the development of a speech user interface for vocational software for engineers. For any program that is traditionally accessed by keyboard and mouse, as engineering software commonly is, the question is how to accomplish the same work by speech. That is, a method had to be developed for writing rules and for manipulating graphical objects by speech alone. Rule-based models have traditionally been defined by writing a computer program and have been used through a combination of keyboard and mouse commands. We are developing a method of manipulating rule-based models of The ICAD System by using speech to define parts and to create drawings.

The problems that we have solved so far include: how to specify the size and dimensions of a part, how to position a part, how to add any number of subparts to an assembly, how to add an attribute and specify its rule, and how to view a geometric representation of any part, using speech only. The solution to each of these problems is general and applicable to speech-driven rule-based design of any kind of engineering application; the solutions are not specific to the two Demonstrators. The speech user interface allows a user to size and dimension a part by speaking the name of the part and then the name of a size attribute and its value. A user can position a part by speech within a coordinate system. The speech command to add a part, such as a plate, to a model is "Add a plate part to the tricycle model". The speech commands for adding a new attribute and rule are "Add an attribute named cost to the handlebar part." and "The rule for cost is the product of handlebar weight times 0.05."; a sketch of how such commands might be interpreted is given at the end of this section.

Viewing a geometric representation of any part using speech only is part of the more general problem of how to control a Graphical User Interface (GUI) using speech only. To specify a command, the user can simply speak its name. Interface commands in a traditional GUI that depend on mouse clicks (such as Mouse-Right on a specific node) are either menu-specific or involve pointing. A speech-only method for menu selection is to speak the name of a menu window and then the command in its list. In order to specify the screen environment unambiguously, some commands require that a phrase be spoken, such as "Graphics Editview" or "Object Editview"; the words "Graphics" and "Object" tell the interface which Mouse-Right (Editview) command is intended. Other aspects of GUI control will be addressed in the second year, such as a speech-only method to specify how much to zoom the view of an object and where to split the screen.
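
To make this concrete, the fragment below sketches how utterances of the kind quoted above might be mapped onto model operations. It is a minimal, hypothetical sketch: the class CommandMapper, its patterns, and its output forms are inventions for illustration, not the actual ENABL or ICAD interfaces.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch: map a recognised utterance to a model operation.
    public class CommandMapper {
        // Matches e.g. "add a plate part to the tricycle model"
        private static final Pattern ADD_PART =
            Pattern.compile("add an? (\\w+) part to the (\\w+) model");
        // Matches e.g. "the rule for cost is the product of handlebar weight times 0.05"
        private static final Pattern ADD_RULE =
            Pattern.compile("the rule for (\\w+) is (.+)");

        public static String interpret(String utterance) {
            String u = utterance.toLowerCase().trim();
            Matcher m = ADD_PART.matcher(u);
            if (m.matches())
                return "add-part type=" + m.group(1) + " model=" + m.group(2);
            m = ADD_RULE.matcher(u);
            if (m.matches())
                return "define-rule attribute=" + m.group(1) + " expr=(" + m.group(2) + ")";
            return "unrecognised: " + u;
        }

        public static void main(String[] args) {
            System.out.println(interpret("Add a plate part to the tricycle model"));
            System.out.println(interpret(
                "The rule for cost is the product of handlebar weight times 0.05"));
        }
    }

A real interpreter would, in addition, check the part and attribute names against the current state of the model, much as the parser described in the next section checks its semantic tags against the current context.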

3. Developments of the speech recogniser

The first system integrating many of the modules of the speech user interface has been built. This system includes the continuous speech recogniser of KTH (Sweden), a speech detection module, and a grammatical-phrase parser. The recogniser is the part of the system that analyses the speech waveform and generates a list of strings of words that are likely to have been spoken. The speech detection module reduces the need for special keystrokes or mouse pointing by letting the recogniser detect automatically when someone is speaking; the current implementation leaves the user free to choose either method, depending on needs. The parser is particularly important for the development of a speech user interface to a rule-based design system. It analyses the string of words generated by the recogniser with respect to the syntax of the language and associates semantic tags with the strings. These tags are then checked for sensibility in the current context of the user interface, and a best-choice string of tags is used to generate the response of the ICAD system.

The software architecture has been an important part of the development of the speech user interface. The architecture integrates all modules into a Java-based system. Modules can easily be replaced in this structure, even while the system is in use.
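
The sketch below illustrates the flavour of such a pluggable design: each stage (speech detection, recognition, parsing) sits behind a common interface, so that an implementation can be exchanged while the system runs. All names here are hypothetical stand-ins, not the actual ENABL modules.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of a pluggable pipeline of processing stages.
    interface PipelineModule {
        String process(String input);
    }

    public class ModuleRegistry {
        private final Map<String, PipelineModule> modules = new ConcurrentHashMap<>();

        // Installing over an existing entry takes effect for the next utterance,
        // which is what allows a module to be replaced while the system is in use.
        public void install(String stage, PipelineModule module) {
            modules.put(stage, module);
        }

        // Run the input through the named stages in order.
        public String run(String input, String... stages) {
            String data = input;
            for (String stage : stages)
                data = modules.get(stage).process(data);
            return data;
        }

        public static void main(String[] args) {
            ModuleRegistry registry = new ModuleRegistry();
            // Stand-ins for the recogniser and the grammatical-phrase parser.
            registry.install("recognise", in -> "add a plate part to the tricycle model");
            registry.install("parse", in -> "[action=add part=plate model=tricycle]");
            System.out.println(registry.run("<waveform>", "recognise", "parse"));
        }
    }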

4. Analysis of accommodations needed for dysarthric speech

Subjects with a variety of speech disorders will be recorded to assess the viability of continuous speech recognition for potential users. It was determined that samples of mild and moderate degrees of dysarthria would be used for training the speech recogniser; samples of different types and degrees should improve the recogniser's capacity to recognise speech with a wide range of dysarthric symptoms (e.g., pitch breaks, slow rate). The design of the dysarthria testing has been planned. Plans for analysing the results of the intelligibility test have been outlined, as have plans for quantifying certain important aspects of dysarthric speech. The specific portions of the dysarthria test to be administered have been selected, and a trial administration of the test has been conducted. Four men and three women have been given the dysarthria test and recorded. The first test person has ataxic dysarthria of a moderate degree after a stroke; the second, ataxic dysarthria of a mild degree; and the third, a very mild dysarthria due to MS. Another two have dysarthria due to stroke, and one has spastic dysarthria due to cerebral palsy.

5. Determination of factors important for users of a speech recogniser

An investigation has been made among a small number of known users of voice recognition software to examine the need to hold a meeting for users of this software on a regular (annual) basis. The great majority of those who replied showed a definite interest. Their suggestions included health and safety issues, and product demonstrations and comparisons.

Two users' voice samples have been recorded as a baseline, and a voice-use questionnaire was administered. Weekly voice checkups with the users have been carried out and notes made concerning the users' voices. After the first symptoms of vocal fatigue (sore throat, coughing) were reported by users, voice care instructions were sent to them.

Researchers at the University of Sheffield have video-recorded pilot sessions of ASR users. Not only does the recording serve as a good demonstration of the ENABL project but, more importantly, it was a good way to validate and trial the complete experimental protocol. As expected, the video recording illuminated several points. Among them was a revalidation of the basic outline of the protocol, which consists, in sequence, of: recording a 10-minute natural conversation with the subject; the subject reading the Rainbow passage aloud; recording sustained vowels and consonants in a vowel-consonant-vowel context; an introduction to the speech recognition software; a 10-minute silent break; and a dictation task of two-hour duration, after which the same sequence of speech recordings is made as at the beginning. Drinking water is made available to the subject throughout the session. Subjects are told that they may stop at any time if they are distressed, but that they may, if they wish, continue beyond the two hours. The above recordings will be analysed for signs of vocal fatigue.

6. The Demonstrators

Two companies will take part in the Demonstrator work, both employing persons with a variety of disabilities (both are part of the Samhall group of companies that specialize in providing vocational opportunities for persons with disabilities). The Samhall Brahe company will be the site of Demonstrator 1 of ENABL. The company specializes in the manufacture of, among other products, heavy-duty wheeled toys for day-care centres. The first person selected as a user of an ENABL Demonstrator had contacted ENABL staff through his disability office after reading a notice about ENABL in a journal for mobility-disabled persons. He has an engineering background, cannot use his fingers, and was unemployed at the time he saw the notice.

The other company that will participate in ENABL is Samhall-SAFAC, which will be the site of Demonstrator 2. This company specializes in the design and manufacture of cardboard boxes. One of their employees has been designing boxes for a few years using a simple CAD program and a specially adapted pointing/scanning device that he operates by breath control. He cannot use his hands. With the speech user interface he will be able to be much more productive and comfortable, with less stress on his neck muscles than at present.

Demonstrator 1 is the design of wheeled toys by speech. Demonstrator 2 is the design by speech of cardboard pieces that can be automatically folded into boxes of various shapes and sizes. The progress made on Demonstrator 1 includes specification of generic parts appropriate for the task of tricycle design, and creation of generative models for the plate and tube parts.
Demonstrator 1 will include the capability to create by speech various kinds of parts that are manufactured at the Samhall Brahe site, and to select and position the catalogue parts that are needed. Demonstrator 2 will provide the functionality to create 2D patterns by speech and will incorporate rules governing the box patterns. Later versions will explore ways to add new rules to the models by speech only.

A set of phrases commonly used in engineering design was created and translated into Swedish. Alternative translations were provided in order to satisfy the demands of the speech recogniser on the one hand, and time- and voice-saving demands on the other. A lexicon was created from the design phrases and supplemented with numbers, function words, frequently used inflections, units of measurement, algebraic and trigonometric expressions, etc. A number grammar was also developed, including grammatical tagging for the numbers in the lexicon; a sketch of what such a grammar does is given below. The lexicon has been linked with the speech recognition component. Training materials for the users of the Demonstrators are being developed. The training of one user has begun and covers the use of speech recognition as well as generative design.
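
As an illustration of what such a number grammar does, the sketch below combines spoken number words into a single value that could serve as an attribute value in a design command. It is hypothetical and uses English number words for readability; the actual ENABL lexicon and grammar are Swedish.

    import java.util.Map;

    // Hypothetical sketch of a number grammar: spoken number words are
    // combined into one numeric value. (Teens and ordinals are omitted
    // to keep the sketch short.)
    public class NumberGrammar {
        private static final Map<String, Integer> UNITS = Map.of(
            "zero", 0, "one", 1, "two", 2, "three", 3, "four", 4,
            "five", 5, "six", 6, "seven", 7, "eight", 8, "nine", 9);
        private static final Map<String, Integer> TENS = Map.of(
            "twenty", 20, "thirty", 30, "forty", 40, "fifty", 50,
            "sixty", 60, "seventy", 70, "eighty", 80, "ninety", 90);

        // Accepts sequences such as "three hundred twenty five".
        public static int parse(String spoken) {
            int total = 0, current = 0;
            for (String word : spoken.toLowerCase().split("\\s+")) {
                if (UNITS.containsKey(word)) current += UNITS.get(word);
                else if (TENS.containsKey(word)) current += TENS.get(word);
                else if (word.equals("hundred")) current *= 100;
                else if (word.equals("thousand")) { total += current * 1000; current = 0; }
                else throw new IllegalArgumentException("not in lexicon: " + word);
            }
            return total + current;
        }

        public static void main(String[] args) {
            System.out.println(parse("three hundred twenty five"));  // 325
            System.out.println(parse("two thousand five hundred"));  // 2500
        }
    }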

7. Evaluation of Demonstrators

Initial plans have been made for monitoring and documenting successful use of the Demonstrators. A user who is currently employed will first be provided with a version of Demonstrator 2 that mimics the functionality of the system he now uses by mouth stick and scanning device. In this way, a baseline comparison can be made between his performance by speech and his performance by mouth-stick pointing. After these comparisons, his system will be augmented with rules for his vocational task, and increases in productivity will be measured.

8. Conclusion

Implementation of a first version of one Demonstrator system including the user interface architecture, speech recogniser, and the engineering software was accomplished during the first year of ENABL. Work is underway on analysis of users' voices and dysarthric speech characteristics.