
Web Posted on: August 4, 1998


Talkbox - A voice activated environmental controller

Thomas Davenport
Parmjit Chima
John Cotterill
Patrick Haywood

Sensory Aids Research Unit
University of Central England
Birmingham, B42 2SU, UK
Tel: 0121 331 6383
Fax: 0121 331 6315
email: tom.davenport@uce.ac.uk

 

Key words: Voice recognition, speech pattern identification, environmental control, home automation.

Acknowledgements

The authors would like to acknowledge the assistance provided by the Royal Academy of Engineering, Crabtree Electrical Industries Limited, the staff of Ermin House at Gloucester Royal Hospital and the staff of Wheatridge House in Gloucester.




1. Introduction

One of the many problems faced by disabled people is the amount of equipment needed to lead as normal a life as possible.

This project concerns a voice activated environmental controller called Talkbox. Talkbox allows the user to speak specified command words to operate the corresponding functions. For example, the user could say Television and the television would switch on.




2. SARU

The Sensory Aids Research Unit (SARU) at the University of Central England was established in 1986 by Dr Peter Molloy to carry out research into mobility for people with visual impairment. As other people joined the unit, other types of project were undertaken; the use of speech recognition for environmental control was one such project.

In 1996 SARU received a grant of £6,000 from the Royal Academy of Engineering which allowed the authors to develop an existing prototype into a controller that could be installed in someone's home. An additional grant of £2,000 from Gloucester Royal Hospital allowed the team to build a second controller to be installed in a respite care room in Ermin House, at the hospital.




3. Underlying Philosophy

The authors' target was to build a system so integrated into a user's living quarters that there would be little external evidence that any system was installed. Arthur C Clarke, in his book 2001: A Space Odyssey, describes a computer called HAL that takes care of the environment of the astronauts on the space ship. In the same spirit, the aim was to devise a system that would empower the user by giving him/her a greater degree of control over his/her environment.

The original system consisted of a black box controller with no user feedback apart from a row of light emitting diodes (LEDs) on the front of the box and the operation of the functions themselves. The first systems had eight programmable outputs and two fixed outputs, which were used to call for assistance or for emergency help.




4. Hardware

The system was based on a standard 80486 personal computer running the DOS operating system. The recognition software used a speech card called a MACPA card, and interfacing was carried out by means of a 40 channel digital input/output card. The outputs from the I/O card were used to drive a range of devices. The hardware was built into a custom instrument case provided by Crabtree Electrical Industries Limited. On the outside of the instrument case was a row of ten LEDs showing power, system ready, and which functions were operational. There was also a reset button and a button to invoke the voice training routines.

The main lights in the user's room were triggered by the command Main Lights. This command sent a pulse to a bi-stable relay to change its state: if the lights were off Talkbox would switch them on, and if the lights were on Talkbox would switch them off. The command Television triggered a solid state relay to switch the television on. In this way a system failure would not switch off the user's lights, although it would switch off consumer goods. A manual override was provided for all functions.
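The pulse-toggle behaviour of the bi-stable relay can be sketched as follows. This is a minimal Python sketch, not the original DOS implementation; the class and method names are illustrative only.

```python
class BistableRelay:
    """Latching (bi-stable) relay: each pulse flips the contact state,
    so the controller never needs to know whether the lights are
    currently on or off, and a system failure leaves the state as-is."""

    def __init__(self, state=False):
        self.state = state  # False = lights off, True = lights on

    def pulse(self):
        # A single pulse from the controller toggles the relay.
        self.state = not self.state


lights = BistableRelay()
lights.pulse()   # command "Main Lights": off -> on
lights.pulse()   # command "Main Lights" again: on -> off
```

The design choice described above falls out naturally: the controller only ever emits pulses, so losing the controller never forces the lights off.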

Infra-red output was provided in the second phase of the project to provide direct control of the user's television.




5. Software

In the late 1980s Dragon Systems Inc. launched a dictation program for DOS. The program, called DragonDictate, worked by loading a real mode program that would terminate but stay resident in computer memory; this type of program is known as a TSR. Dragon later released a software toolbox called DragTools, which was used in this development. The TSR responds to interrupts issued by the speech card by issuing a protected mode interrupt. The recognition program, which is resident in memory above 1 Mbyte, receives the interrupt and starts its operation. It suspends the currently running real mode program, carries out the recognition and then posts the results back to real mode. The results are in the form of the ten closest matches, with a confidence figure for each match.

If the confidence figure for the best match is greater than 95% and the match is one of the system's outputs then the command for that output is poked into the keyboard buffer, the protected mode program restores the registers of the real mode program and terminates. When the real mode program is restored, it reads the keyboard buffer and takes the appropriate action.
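The decision rule above can be sketched in Python. The names below are hypothetical, and the real system poked keystrokes into the DOS keyboard buffer rather than returning a value from a function.

```python
# Illustrative set of configured system outputs (assumed names).
SYSTEM_OUTPUTS = {"main lights", "television", "assistance", "emergency"}

def dispatch(matches):
    """matches: list of (word, confidence%) pairs, best match first,
    as returned by the recogniser (the ten closest matches).
    Act only on the best match, and only when it clears the threshold."""
    word, confidence = matches[0]
    if confidence > 95 and word in SYSTEM_OUTPUTS:
        return word   # command handed back to the real mode program
    return None       # below threshold or not a system output: ignore

print(dispatch([("television", 97), ("telephone", 60)]))  # television
print(dispatch([("television", 80), ("telephone", 60)]))  # None
```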

All the key variables for Talkbox are held in an external setup file which the user may change.
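As an illustration, such a setup file might hold entries like the following. This is a hypothetical example; the actual variable names used in Talkbox's setup file are not documented here.

```
; Talkbox setup file (illustrative only)
MIN_CONFIDENCE=95
OUTPUT_1=MAIN LIGHTS
OUTPUT_2=TELEVISION
CALL_OUTPUT=ASSISTANCE
EMERGENCY_OUTPUT=EMERGENCY
```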




6. Development of Installed Systems

The ideal voice for a speech recognition system is one with good enunciation and good repeatability.

The first controller was built for Bill, a young man who had received severe brain damage when he was hit by a car that had mounted the pavement. As a result of the accident Bill now needs 24 hour care.

Bill's speech is difficult to recognise and his memory can be variable, which makes the control strategy difficult. When a voice utterance is passed to DragTools, it returns the ten best matches with a confidence figure for each match.

A high confidence figure is needed to prevent the system acting on incorrect recognitions, while a low confidence figure is needed to allow for large variability in a user's speech patterns. Bill's system had to reconcile these conflicting requirements.

As Bill's system was only to be used by Bill it was possible to train the vocabulary over an extended period. The control strategy that was adopted was to have a single word available for the first period of use with a minimum confidence level set at 60%. When Bill was satisfied he could control the single command he could ask for additional words to be added.
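The staged strategy used for Bill can be sketched as follows. This is a Python sketch with hypothetical names; the original system ran under DOS and held these variables in its setup file.

```python
class StagedVocabulary:
    """Start with a single command at a relaxed threshold; add further
    words only when the user is satisfied he can control the existing ones."""

    def __init__(self, first_word, min_confidence=60):
        self.words = [first_word]           # begin with one word only
        self.min_confidence = min_confidence  # relaxed 60% threshold

    def add_word(self, word):
        # Extend the vocabulary at the user's request.
        self.words.append(word)

    def accept(self, word, confidence):
        # Act only on trained words that clear the threshold.
        return word in self.words and confidence >= self.min_confidence


vocab = StagedVocabulary("main lights")
vocab.accept("main lights", 65)   # True: trained word, above 60%
vocab.accept("television", 99)    # False: not yet in the vocabulary
```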

The controller built for Ermin House was designed to go into a respite care room. Patients would typically book into the room for a one week period. The users would usually have normal speech and some form of physical disability. The system had to be easy to use and quick to train.

The control strategy was to have a fixed menu of ten items. A new user would train the words by simply repeating the word displayed on a computer monitor. This process would take a couple of minutes and could be repeated as often as necessary. The first training session would normally take two attempts because people tend to talk like robots when they first use the system.




7. Discussion

The original concept was to build a voice activated environmental controller that would benefit any user. In practice, the software control algorithms of the two prototypes differed before they were installed, and changes were made to the hardware within weeks of installation. Once a user had understood the concept and had trained the voice files, the system worked very well when members of the SARU team were present. When users were left to themselves, however, their use of the system was limited and they stopped using it as soon as any problem was encountered.

The team felt that this reluctance was due to the lack of feedback provided by the system. Consequently, the computer monitor that was used for the voice training was left on the system permanently.

It was found that new users would adopt an abnormal voice if they carried out the voice training with someone present. This meant that if the system was used when the user was alone the recognition rate was poor. To try to get around this, users were shown how to invoke the voice training routine themselves. One user commented that he felt `daft' talking to his room.

Finally, microphones presented several significant problems. Dragon provide a high quality, head-mounted, noise limiting microphone. Mounted 15 - 20 mm from the user's mouth, it gave an excellent signal to noise ratio (SN ratio). Users, however, did not like wearing the microphone for extended periods. Wall mounted and desk mounted microphones were tried, but their SN ratio was significantly worse (by a factor of 100) than that of the head mounted microphone. The team tried a range of pre-amplifiers and band pass filters, but the SN ratio was still a factor of 10 worse than the head mounted microphone. The team visited a BBC television studio to see how television sound engineers managed; there it was found that between three and six people were employed to monitor and adjust the sound recording. One of the sound engineers has been helping the team, and a better wall mounted microphone is under development.




8. Future Work

SARU's plans, when resources become available, are to incorporate artificial intelligence and speech output into the system. This will allow the use of an interactive interface and a lower minimum confidence value. For example, with one confidence band set between 60% and 90%, the controller would be able to ask the question `did you say main light?'. There are probably only four appropriate responses to this question, so the recognition rate will be very high.
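The proposed two-band strategy might look like the following. This is a sketch of one possible interpretation: the band boundaries and function names are illustrative, and the source does not specify what happens above the 90% band, so immediate execution is assumed there.

```python
def decide(word, confidence):
    """Map a recognised word and its confidence to an action.
    >= 90%: act immediately; 60-90%: ask the user to confirm
    (a much smaller response vocabulary, so recognition is easy);
    below 60%: reject the utterance."""
    if confidence >= 90:
        return ("execute", word)
    if confidence >= 60:
        return ("confirm", f"did you say {word}?")
    return ("reject", None)

print(decide("main light", 95))  # ('execute', 'main light')
print(decide("main light", 75))  # ('confirm', 'did you say main light?')
```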

In a domestic environment the system could easily be interfaced to facilities like a videophone or intercom, burglar alarm or security system. In this type of environment microphones become a bigger issue. Either the microphone stays with the user, in which case the user will have to carry a battery and a transmitter, or a switching system would need to be devised to allow the user free movement from room to room. Both strategies need to be evaluated.


