THE KEYBOARD CHANNEL AS AN INVISIBLE COMMAND PATH FOR ALTERNATE INPUT DEVICES

Tom Nantais (1), Gerardo Kobeh (2), and Fraser Shein (3)
(1) Lyndhurst Hospital, Toronto, ON, Canada
(2) Biomedical Engineering, Universidad Iberoamericana, Mexico City, Mexico
(3) Hugh MacMillan Rehabilitation Centre, Toronto, ON, Canada

ABSTRACT

A prototype software utility is presented which interprets special "codestrings" in Microsoft Windows 3.1 to control low-level system functions through existing alternate keystroke input devices. These codestrings are invisible to other applications running on the system and to the user as well. The particular application of this concept to computer-based environmental control is highlighted.

STATEMENT OF THE PROBLEM

A common difficulty with computer-based assistive devices is getting them to work together. Consider a hypothetical computer-access/environmental-control scenario: an individual with high-level quadriplegia has purchased a voice recognition system for computer access and a computer-integrated environmental control system (ECS) for controlling appliances. The ECS is a peripheral device that receives its commands from special-purpose software running on the computer. The voice recognition system is able to learn user-specific words and phrases, so it is reasonable to expect that the user should be able to define macros whereby common environmental control actions (e.g., turning on a reading lamp) can be accomplished with a single spoken command.

Because the voice recognition system and the ECS were designed independently of each other, however, there is no direct way to control the ECS except through its software interface. To send a command to an appliance while working in a word processor, the user must switch over to the ECS control program, issue the command, and switch back to the word processor to continue working. This need to move back and forth between applications is only an inconvenience; it does not prevent the user from controlling appliances. When the person is in bed and the computer is across the room, however, the system becomes much less practical. Even though a reliable wireless link carries the user's voice signal to the computer, there is no way to verify that the ECS control program is the one that will receive the keystrokes generated by the voice commands. If the user forgets to make the ECS program active before going to bed, his commands could just as easily be going into his word processor, and without looking at the screen it is impossible to know.

There are many simple back-up systems that could make this situation workable. The fact remains, though, that the user has purchased a high-end system for communicating and controlling his surroundings, and as configured it easily becomes unusable. The problem is not a design flaw in the voice recognition system or the ECS; rather, there is no reliable way to communicate the user's commands from one system to the other.

BACKGROUND AND RATIONALE

The idea for the software utility described here came from reports of successful combination of pointing devices with voice recognition systems [1,2]. In [1], Burnett et al. combined a headpointing device with a voice recognition system in a drafting application. To accomplish direct manipulation actions (clicking, double-clicking, dragging, etc.), the user would move the cursor to an object on the screen and say the name of the desired button pattern. The head-pointer provided the cursor positioning and the voice recognition system emulated the mouse buttons. One question arising from this report is, "What if the voice recognition system's designers hadn't thought to include mouse button emulation? Could these two devices still be combined in a useful way?"

Most, if not all, voice recognition systems allow the user to define new text strings and associated spoken commands. Assuming that the low-level mouse emulation functions (e.g., double-click) are available from elsewhere, a second application could be designed to bridge the gap between the voice recognition system and the desired mouse functions. The first step would be to program the voice recognition system to inject a special codestring for each low-level mouse function. The codestring would begin with a rarely used special character such as "~", and would have all of the information necessary to call the corresponding mouse function. This "bridge" application would run in the background, monitoring the operating system's keyboard input queue. Upon detecting a "~", this program would start diverting keystrokes to itself. The program would continue blocking keystrokes until either it receives a valid, complete codestring, or it determines that the keystrokes coming in are not part of a codestring. In the first case, the target application (e.g., the word processor) would never see the keystrokes associated with the codestring. The bridge application would simply call the requested low-level function. In the second case, all of the keystrokes that had been swallowed would be re-injected into the operating system so that they arrive in the target application as normal.

Because the bridge application is monitoring keystrokes at the operating system level, there is no need for it to be the active application when a codestring is injected. The user could be working in a word processor and issue a codestring-based command without leaving the word processor. The codestring is "invisible" to the word processor, and the command is carried out in the background. Returning to the environmental control scenario, the user would be able to control appliances through his voice recognition system regardless of which application is currently the active one.

This scheme would not provide much additional benefit if all of the low-level functions needed to be designed into the bridge application once and for all; the bridge would just be providing static enhancements to the voice recognition system. (It should be just as easy to lobby the voice recognition designers for new features as the bridge application designers.) One of the major strengths of some popular operating systems is their built-in ability to link in low-level functions dynamically from a library on the disk. These "dynamic link libraries" (DLLs) permit applications to call functions that did not even exist when the application was written. As long as the application can find out the location of the DLL on disk and the name of a function within the DLL, that function can be loaded from disk and executed at run-time.
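A minimal sketch of this run-time linking step is given below, written in C against the Win32-style API for readability (the 16-bit Windows 3.1 calls are analogous). The library name ECSCTRL.DLL, the exported function SendECSCommand, and its argument are hypothetical placeholders; the point is only the LoadLibrary/GetProcAddress pattern by which the bridge application could call a function it has never seen before.

    #include <windows.h>

    /* Hypothetical signature for a DLL function named in a codestring. */
    typedef void (WINAPI *SENDPROC)(const char *);

    void call_codestring_function(const char *dllName,   /* e.g. "ECSCTRL.DLL"    */
                                  const char *funcName,  /* e.g. "SendECSCommand" */
                                  const char *argument)  /* e.g. "AT@IRR=POWER"   */
    {
        HMODULE hLib = LoadLibrary(dllName);      /* load the library at run time */
        if (hLib == NULL)
            return;                               /* library not found            */

        SENDPROC fn = (SENDPROC)GetProcAddress(hLib, funcName);
        if (fn != NULL)
            fn(argument);                         /* call the function by name    */

        FreeLibrary(hLib);
    }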
This facility enables the bridge application to control any present or future hardware or software that has an appropriate DLL.

The fact that this bridge application could detect any codestring in the keyboard channel, regardless of its source, means that any alternate keystroke input device with macro capabilities could take on new control functions dynamically. A user could, for example, use an abbreviation-expansion program to define abbreviations that expand into valid codestrings. Suppose that the abbreviation "lo" expands into a codestring whose underlying DLL function turns on the desk lamp. The user could be typing in the word processor, type the abbreviation, enter a space, and have the codestring injected. The codestring would never appear in the word processor because the bridge application would intercept it and call the DLL function to turn on the light. From the user's perspective, typing "lo" in the word processor turns on the light, and they never need to leave the word processor to do it.

This application is similar to a system proposed by Marsden and McGillis [3] which maps patterns of switch activation onto arbitrary low-level function calls within the operating system. They give an example of how the "escape" function might be triggered by closing a particular switch three times in two seconds. The system described here draws on this concept of mapping arbitrary input triggers into underlying operating system calls. We chose to implement the system through the keyboard channel because of the widespread acceptance of transparent keyboard emulation: replacing the keyboard with more accessible devices that inject keystrokes in such a way that the computer cannot tell that they did not come from the keyboard [4,5]. There is such a variety of these transparent keyboard emulators on the market that the user would be able to accomplish the same low-level function calls through any number of input devices. The ability to offer more than one way to accomplish the same action in the interface presents at least theoretical opportunities to increase the ease and efficiency of access [6]. The keyboard channel is a natural backbone for such a multimodal input system.
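Tapping this backbone requires only that the bridge application see each keystroke before the active application does. Under Windows, this can be arranged with a system-wide keyboard hook. The sketch below is illustrative only: it uses Win32-style hook calls for readability (the Windows 3.1 prototype would rely on the 16-bit equivalents, with the hook procedure residing in a DLL), and codestring_wants() is a hypothetical helper standing in for the detection logic described under DESIGN.

    #include <windows.h>

    static HHOOK g_hook;

    /* Hypothetical detector: returns nonzero while a keystroke is being
       claimed as part of a (possible) codestring.                        */
    extern int codestring_wants(UINT virtualKey);

    LRESULT CALLBACK KeyboardProc(int nCode, WPARAM wParam, LPARAM lParam)
    {
        /* wParam holds the virtual-key code; bit 31 of lParam is set on key release. */
        if (nCode == HC_ACTION && !(lParam & 0x80000000L)) {
            if (codestring_wants((UINT)wParam))
                return 1;          /* swallow: the target application never sees it */
        }
        return CallNextHookEx(g_hook, nCode, wParam, lParam);
    }

    void install_bridge(HINSTANCE hInst)
    {
        /* A thread id of 0 makes the hook system-wide, so every keystroke
           passes through KeyboardProc before reaching the active application. */
        g_hook = SetWindowsHookEx(WH_KEYBOARD, KeyboardProc, hInst, 0);
    }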

DESIGN

The codestring concept was tested with a prototype running under Microsoft Windows 3.1. An environmental control system was attached to the computer so that it could control a range of common appliances such as the telephone, television, VCR, and a variety of simple on/off devices like lamps. This ECS accepts commands through its serial port, and it understands a protocol proposed by Hensch and Adams [7], similar to the one used for controlling modems. As an example, the command for turning on the television is:

AT@IRR=POWER<CR>

Instead of building a general DLL function call utility, we focused on the DLL responsible for sending commands to the ECS. This simplified prototype still had the same basic function described above: detect a valid codestring and call the corresponding DLL function. However, the only DLL function it could call was the one for communicating through the serial port.

The codestring format consisted of three parts. The first part, a prefix, contained an unusual combination of three special characters (~!@). The next part was the string to send out the serial port, and a two-character suffix (*%) terminated the codestring. The prefix served as a marker to begin the process of diverting keystrokes away from the target application and, similarly, the suffix served to end this process.

The codestring detector was set up so that whenever a tilde (~) is received, subsequent keystrokes are immediately blocked from going to their usual destination. This presented an obvious danger that the system could become unresponsive to keyboard input, even when the user is not giving a codestring command. To reduce this risk, we decided that codestrings should only be sent as macro strings, which meant that keystrokes would have to be coming in at a high rate for the system to continue blocking them. Any pause in keyboard input greater than 0.5 seconds would send the system back to its normal state (after the blocked keystrokes had been re-sent). If the user entered a tilde by itself, they might notice a slight delay before it appeared in the target application, but the system would not confuse it with the beginning of a codestring. Also, any unexpected characters in the prefix would immediately end the blocking process, and any keystrokes that had been trapped would be re-sent.
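The sketch below illustrates this detection logic, assuming one call per incoming character. The prefix "~!@" and suffix "*%" are those of the prototype; send_serial() and reinject() are hypothetical stand-ins for the DLL function that writes to the ECS and for the routine that re-sends swallowed keystrokes, and the 0.5-second timer that calls on_timeout() is not shown.

    #include <string.h>

    #define PREFIX  "~!@"
    #define SUFFIX  "*%"
    #define MAXCODE 64

    static char buf[MAXCODE];   /* keystrokes held back from the target application */
    static int  len = 0;
    static int  blocking = 0;   /* nonzero while keystrokes are being diverted      */

    extern void send_serial(const char *cmd);      /* hypothetical DLL call         */
    extern void reinject(const char *keys, int n); /* re-send swallowed keystrokes  */

    /* Called once per keystroke; returns 1 if the keystroke should be swallowed,
       0 if it should pass through to the active application as usual.            */
    int on_key(char c)
    {
        if (!blocking) {
            if (c != PREFIX[0])
                return 0;                 /* ordinary typing: pass it through     */
            blocking = 1;                 /* possible codestring: start diverting */
            len = 0;
        }

        buf[len++] = c;

        /* Abandon the attempt if the prefix stops matching or the buffer fills. */
        if ((len <= 3 && strncmp(buf, PREFIX, len) != 0) || len >= MAXCODE) {
            reinject(buf, len);           /* not a codestring after all          */
            blocking = 0;
            return 1;
        }

        /* Complete codestring: strip prefix and suffix, call the DLL function. */
        if (len > 5 && strncmp(buf + len - 2, SUFFIX, 2) == 0) {
            buf[len - 2] = '\0';
            send_serial(buf + 3);         /* e.g. "AT@IRR=POWER"                 */
            blocking = 0;
        }
        return 1;                         /* the codestring stays invisible      */
    }

    /* Called by a timer when more than 0.5 s passes without a keystroke while
       blocking: the held-back keystrokes were not a codestring after all.      */
    void on_timeout(void)
    {
        if (blocking) {
            reinject(buf, len);
            blocking = 0;
        }
    }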

EVALUATION

The prototype was installed in an apartment at our hospital which is used for occupational therapy assessments. The apartment is equipped with a variety of devices that can be controlled with an ECS, including X10 modules and infra-red receivers. The codestring detector application was configured as described above, and it was successfully tested with a number of alternate input devices and software packages. For example, we were able to use the WiViK on-screen keyboard without any changes to the software to gain environmental control functionality. Abbreviations were simply added which would turn on and off a lamp, a fan, and the television. Similar results were obtained with a commercial voice recognition system.

DISCUSSION

This codestring interpreter prototype demonstrates a new way of using existing keyboard emulation devices to achieve greater control over functions in the computer's operating system. It was possible to design this application because the Windows operating system supports the monitoring and swallowing of keystrokes. As well, Windows' heavy use of dynamic link libraries means that a codestring facility can be applied to devices and system services that were not yet designed when the codestring application was written.

The prototype described here is only useful for sending strings through a serial port. A more general utility would be able to make arbitrary DLL calls so that any number of low-level system functions could be controlled. Environmental control is only one application. Other possibilities include mouse button emulation and mouse macros (e.g., press the closest on-screen button), application-specific control (e.g., copying files), and telecommunications (e.g., sending an e-mail message). Anything with a DLL interface could become an extension of the user's access system.

This idea is certainly not without limitations, though. The primary usability problem is configuration. Before any new DLL function can be accessed, the user must configure the access system with an unintuitive codestring representing the desired DLL function call. Furthermore, the inner section of the codestring would need to come from the DLL's documentation. We would propose a scheme to embed the required configuration information directly into the DLL. The codestring utility would access this embedded information and present the user with a graphical, icon-based representation of what the chosen DLL can do. The user might indicate the desired action by arranging the relevant icons together. For example, the DLL for controlling the ECS might contain a TV icon and a power switch icon, among others. Putting these two together on the screen would result in automatic generation of the codestring required to turn the TV on.
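As a purely hypothetical illustration, the automatically generated codestring in such a scheme might simply name the DLL, the function, and its argument between the existing prefix and suffix, for example:

    ~!@ECSCTRL.DLL!SendECSCommand!AT@IRR=POWER*%

The bridge application would split the inner section on a separator character, hand the first two fields to LoadLibrary and GetProcAddress, and pass the third to the resulting function, as sketched earlier. The library and function names shown here are placeholders, not part of the prototype.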

ACKNOWLEDGMENTS

This project was supported by The Lyndhurst Hospital Foundation. We would like to thank IBM Canada Ltd. for donation of the equipment used in the project.

REFERENCES

1. Burnett, J.K., Klabunde, C.R., & Britell, C.W. (1991). Voice and head pointer operated electronics computer assisted design workstations for individuals with severe upper extremity impairments. In Proceedings of the RESNA 14th Annual Conference (pp. 48-49). Washington, D.C.: RESNA Press.

2. Schmandt, C., Ackerman, M. S., & Hindus, D. (1990). Augmenting a window system with speech input. IEEE Computer, August, 50-56.

3. Marsden, R. J., & McGillis, G. J. (1991). An alternative approach to computer access: the "ADAM" interface for the Macintosh. In Proceedings of the RESNA 14th Annual Conference (pp. 168-170). Washington, D.C.: RESNA Press.

4. Schauer, J., Novak, J., Lee, C. C., Vanderheiden, G., & Kelso, D. P. (1990). Transparent access interface for Apple and IBM computers: the T-TAM. In Proceedings of the RESNA 13th Annual Conference (pp. 255-256). Washington, D.C.: RESNA Press.

5. Scott, N. G. (1991). The universal access system and disabled computer users. In Proceedings of the RESNA 14th Annual Conference (pp. 64-66). Washington, D.C.: RESNA Press.

6. Shein, F., Brownlow, N., Treviranus, J., & Parnes, P. (1990). Climbing out of the rut: the future of interface technology. In Visions Conference: Augmentative and Alternative Communication in the Next Decade (pp. 37-40). A.I. du Pont Institute, University of Delaware, Wilmington, Delaware.

7. Hensch, M., & Adams, K. (1995). Proposal for an ECU standard protocol. In Proceedings of the RESNA '95 Annual Conference (pp. 437-439). Washington, D.C.: RESNA Press.

Tom Nantais
Research Department, Lyndhurst Hospital
520 Sutherland Dr.
Toronto, Ontario, Canada M4G 3V9
Voice: (416) 422-5551 ext. 3049
Fax: (416) 422-5216
E-mail: 74267.2252@compuserve.com