Web Posted on: March 3, 1998
AAC USING A REDUCED KEYBOARD
Cliff Kushler
              Tegic Communications
              2001 Western Ave.
              Suite 250 Seattle
              WA 98121
              Voice/Message: (206) 343-7001 ext. 108
              Fax: (206) 343-7004
              E-mail: ckushler@tegic.com
Non-speaking individuals with severe motor impairments generally require some kind of AAC system to meet their communication needs. AAC systems usually employ some kind of selection matrix to provide access to a symbol set containing a number of elements. For spelling-based systems in English, the number of symbols is at least twenty-seven (when a space is included), and more when the system includes options to select from a number of lexical predictions. For icon-based systems, the number of symbols often reaches a hundred or more. Systems employing a "dynamic screen" approach may require the operator to navigate between different "pages" of screen displays containing many hundreds of symbols. In each of these systems, communication is achieved by selecting a sequence of one or more elements from a selection set. In response to these selections, the system retrieves or generates units of meaning (e.g., from a vocabulary of words, phrases, or sentences) which are assembled by the user into the intended message. Several fundamental trade-offs are involved in the design choices underlying these various approaches. For a given vocabulary size, there is an inverse relationship between the size of the selection set and the average number of selections required to uniquely identify a vocabulary item. Thus, spelling a word requires selecting an average of about five or six symbols (letters) from a set of twenty-seven choices, while an icon-based approach may only require selecting two or three symbols (icons), but from a set of over one hundred.
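The inverse relationship between selection-set size and selections per item can be illustrated with a simple counting argument. The sketch below assumes an idealized fixed-length code in which every sequence of selections identifies a distinct vocabulary item; the function name and the 130,000-item vocabulary are illustrative assumptions, not measurements of any real system. Natural spelling is much less compact than this bound, which is why spelled words average more selections than the minimum shown here.

```python
def min_selections(vocab_size: int, set_size: int) -> int:
    """Smallest k such that set_size ** k >= vocab_size.

    This is a lower bound for an idealized fixed-length code,
    not a description of how any particular AAC system encodes words.
    """
    if vocab_size < 1 or set_size < 2:
        raise ValueError("need vocab_size >= 1 and set_size >= 2")
    selections, capacity = 0, 1
    while capacity < vocab_size:
        capacity *= set_size
        selections += 1
    return selections


if __name__ == "__main__":
    # Hypothetical vocabulary size; selection-set sizes for a reduced
    # keyboard, an alphabet plus space, and an icon matrix.
    for set_size in (8, 27, 100):
        print(set_size, min_selections(130_000, set_size))
```

Under these assumptions a larger selection set always needs fewer selections per item, at the price of a larger matrix to search and reach.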
Many strategies have evolved for helping people who have limited motor skills cope with a large number of symbols or selection items. Large-scale keyboards, keyguards, electronic scanning devices, voice activation, headpointing sensors, and other methods can be used to access large symbol sets. While these strategies do increase the efficiency with which individual elements of a large selection set can be accessed, a larger selection set nonetheless requires correspondingly more time or effort to select a given item.
An alternative approach is to reduce the number of selection items by allowing each cell or key to contain more than one symbol. Input selections by the user are now ambiguous (since each key has several meanings), but this ambiguity can be resolved manually by the user, or automatically by the communication device itself. Research on automatic disambiguation of text input has focused on two strategies, letter-by-letter disambiguation and word-level disambiguation (see Arnott & Javed, 1992, for a review of this research). In the letter-by-letter approach, the system tries to disambiguate each key or cell as it is selected. Statistical analysis of "n-grams" (groups of n letters as they occur in sequence in words) is usually the predictive basis for these systems. One advantage of this approach is that the number of n-grams is relatively small, so storage/memory requirements are also small. A disadvantage of letter-by-letter disambiguation is that the user's attention is required as each key is selected. In word-level disambiguation, user input is interpreted as complete words. The predictive basis for a word-level system is a database of words. To be effective, this approach requires that all possible words be present in the database; so storage requirements are larger than for the letter-by-letter approach. On the other hand, with word-level disambiguation the cognitive load is reduced because the user's attention is only required at word boundaries.
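As a rough illustration of the letter-by-letter strategy, the following sketch resolves each ambiguous key as it arrives, using bigram counts gathered from a small text sample. The alphabetical seven-key layout, the start-of-word marker "^", and all function names are assumptions made for this example; published systems use larger n-grams and more careful statistics.

```python
from collections import Counter

# Hypothetical alphabetical layout: seven ambiguous keys.
KEYS = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]
KEY_OF = {ch: i for i, group in enumerate(KEYS) for ch in group}


def encode(word):
    """Return the ambiguous key presses that spell a word."""
    return [KEY_OF[ch] for ch in word.lower()]


def train_bigrams(sample):
    """Count (previous letter, letter) pairs; '^' marks the start of a word."""
    counts = Counter()
    for word in sample.lower().split():
        prev = "^"
        for ch in word:
            if ch in KEY_OF:  # ignore punctuation and digits
                counts[(prev, ch)] += 1
                prev = ch
    return counts


def decode(key_presses, counts):
    """Pick, for each key, the letter most often seen after the previous guess."""
    prev, guessed = "^", []
    for key in key_presses:
        best = max(KEYS[key], key=lambda ch: counts[(prev, ch)])
        guessed.append(best)
        prev = best
    return "".join(guessed)


if __name__ == "__main__":
    counts = train_bigrams("the cat sat on the mat")
    print(decode(encode("the"), counts))
```

Because each guess feeds the next one, a single wrong letter can mislead the rest of the word, which is one reason this approach needs the user's attention at every key.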
Tegic Communications ("Tegic") has developed a new technique for text input commercially known as T9(TM), which enables efficient generation of any desired text using a reduced keyboard having only a small number of keys. Multiple letters are assigned to each key, so that the specific letter intended by a single keystroke is ambiguous. The system is based on a process known as "word-level disambiguation," in which the system compares a sequence of keystrokes to words in a large database to determine the intended word. The proprietary T9 technology includes significant improvements over previous attempts to implement disambiguation approaches. With both domestic and international patents pending, Tegic has successfully licensed this technology in several consumer markets where there is a need for text input on small, hand-held devices. Applications include pocket organizers, "smart" cellular phones and wireless email devices, two-way pagers, and remote controls for TV-based Internet access. However, the most compelling application for this technology is in the field of augmentative and alternative communication (AAC). No other approach to text generation provides the ability to generate unrestricted text from such a small number of keys, requiring only one key selection per letter (or fewer), and without requiring the user to learn any encoding sequences other than the normal spelling of a word.
The trademark T9 stands for "typing with 9 keys." When adapted for AAC applications, the actual number of keys is reduced to eight because of the ease with which eight keys can be mapped to a variety of selection techniques. In the basic T9 arrangement, 3 or 4 letters are printed on each of seven keys, and an eighth key serves as an unambiguous "Space" key. Each keystroke designating a letter is therefore ambiguous. The letters can be assigned to the seven keys in alphabetical order, but significantly greater efficiency in disambiguating key sequences can be achieved by assigning the letters in the following groups: THKP, MEG, ISYV, CLOJ, ADFX, QUNW, and BRZ.
Each keystroke sequence is processed with a complete database containing the spelling of a huge lexicon of words. The database is large enough that it contains virtually all of the words that a user might enter, including proper names and geographical terms (cities, countries, etc.). Any words not included (such as the user's last name) are automatically added to the database when first typed by the user using an alternate unambiguous spelling method. Words that match the sequence of keystrokes are presented to the user in a list on the display. The words are presented in order of decreasing frequency of use so that the most frequently occurring word is presented first in the list. After typing a word, the user simply activates the "Space" key, or perhaps more accurately the "Select" key. This automatically selects the first word (the most frequently used word) and enters a space. The user then begins typing the next word. Occasionally (approximately once in thirty to forty words), the desired word will be the second or third most frequently used word matching the key sequence entered. In such cases, the user presses the "Select" key one or two more times to select the desired word before beginning to type the next word. On a touchscreen application, the user may also directly touch the desired word to select it. Thus for the vast majority of text entered, the user simply types, hitting the keys containing the desired letters (one keystroke per character), and hitting the "Select" key at the end of each word just as one would type a space on a standard "QWERTY" keyboard.
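The lookup described above can be sketched as follows. The seven letter groups are the ones listed in this paper; everything else (the class and function names, the tiny word list, and the frequency values) is a hypothetical stand-in for the proprietary database and is not Tegic's implementation.

```python
from collections import defaultdict

# Letter groups from the paper; key 0 carries T, H, K and P, and so on.
GROUPS = ["THKP", "MEG", "ISYV", "CLOJ", "ADFX", "QUNW", "BRZ"]
KEY_OF = {letter: key for key, group in enumerate(GROUPS) for letter in group}


def key_sequence(word):
    """Ambiguous key presses for a word (letters A-Z only)."""
    return tuple(KEY_OF[ch] for ch in word.upper())


class Lexicon:
    """Toy word database indexed by key sequence (illustrative only)."""

    def __init__(self, word_freqs):
        self._freq = {}
        self._index = defaultdict(list)
        for word, freq in word_freqs.items():
            self.add(word, freq)

    def add(self, word, freq=1):
        """Add a word, e.g. one the user has spelled unambiguously."""
        word = word.lower()
        if word not in self._freq:
            self._freq[word] = freq
            self._index[key_sequence(word)].append(word)

    def candidates(self, keys):
        """Words matching the key sequence, most frequent first."""
        matches = self._index.get(tuple(keys), [])
        return sorted(matches, key=lambda w: -self._freq[w])


if __name__ == "__main__":
    # Frequencies are invented for the example.
    lexicon = Lexicon({"the": 1000, "low": 50, "cow": 20, "con": 5})
    print(lexicon.candidates(key_sequence("cow")))
```

With this toy data, "low", "cow" and "con" share one key sequence, so typing "cow" would require one extra press of the "Select" key to move from the first candidate to the second.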
Many people using augmentative and alternative communication (AAC) would benefit from greatly reducing the number of keys necessary to directly spell out words. When adapted for an AAC application, the T9 system uses as few as 8 keys with high efficiency (generating over 1 letter per keystroke) while making minimal cognitive demands. No codes or patterns need to be memorized other than the spelling of the words to be typed. This approach does not require constant attention to the display, allowing the person to concentrate on what he or she wishes to write.
Prior research on the application of disambiguation to augmentative communication applications has focused on using the letter-by-letter approach, sometimes in combination with a word-level approach based on a small database of common words. The decision to reject a full word-level approach was based on the cost of memory at the time, and the belief that systems could not be economically built that would support a complete, or even a large, database. This in turn would limit the expression of the user to the set of words included in the database, which was considered unacceptable. Although costs of memory have decreased dramatically since the early research was performed, the word-level approach was still rejected as recently as 1992 (Arnott & Javed, 1992). Memory costs are no longer a significant barrier to implementing a system with a very complete database (130,000 to 160,000 words), and part of the technology developed for the T9 system includes a method for highly efficient compression of the database to further contain these costs. In addition, the T9 approach includes important refinements to the user interface that enable efficiencies as low as 1.0051 keystrokes per letter using a system with only eight keys. Early systems generally evolved from technology developed for the telephone keypad, and thus focused on designs requiring 10-12 keys. This was also an impediment to gaining the maximum benefit from this approach in assistive technology applications.
The reduction in the number of selection cells can have important benefits regardless of the selection technique being utilized. For individuals who already use a direct selection technique based on some kind of keyboard, using fewer cells allows either larger individual keys or a smaller array of keys that reduces the range of motion required. Either approach can result in faster and/or easier selections. For users of scanning techniques, fewer cells reduce the average time required to scan to a cell. Single-switch scanning can be used with a simple linear scan, which requires only one switch activation per key selection, to generate text with an average of only 3.39 scan steps per letter. This number can be further reduced by using word completion and predictive scanning strategies. Predictive scanning alters the scan such that keys are skipped over when they do not correspond to any letters which occur in the database at the current point in the input key sequence. This alone reduces the average to approximately 2.93 scan steps per letter.
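A minimal sketch of how predictive scanning shortens a linear scan is given below. It assumes the scan restarts at the first key after every selection, visits the keys in a fixed order with the "Select" key last, and never skips "Select"; these conventions, the function names, and the three-word database are assumptions for illustration and will not reproduce the averages reported above.

```python
GROUPS = ["THKP", "MEG", "ISYV", "CLOJ", "ADFX", "QUNW", "BRZ"]
KEY_OF = {letter: key for key, group in enumerate(GROUPS) for letter in group}
SELECT = 7
SCAN_ORDER = [0, 1, 2, 3, 4, 5, 6, SELECT]  # hypothetical scan order


def key_sequence(word):
    return tuple(KEY_OF[ch] for ch in word.upper())


def valid_prefixes(words):
    """Every key-sequence prefix that can still lead to a database word."""
    prefixes = set()
    for word in words:
        keys = key_sequence(word)
        for end in range(1, len(keys) + 1):
            prefixes.add(keys[:end])
    return prefixes


def scan_steps(word, prefixes=None):
    """Scan steps needed to enter a word and press Select.

    With prefixes=None every key is visited (plain linear scan).
    Otherwise letter keys that cannot continue any database word
    are skipped (predictive scan).
    """
    if prefixes is not None and key_sequence(word) not in prefixes:
        raise ValueError("word is not reachable with predictive scanning")
    typed, steps = (), 0
    for target in key_sequence(word) + (SELECT,):
        for key in SCAN_ORDER:
            dead_end = (prefixes is not None and key != SELECT
                        and typed + (key,) not in prefixes)
            if dead_end:
                continue  # the scan highlight never stops on this key
            steps += 1
            if key == target:
                break
        typed += (target,)
    return steps


if __name__ == "__main__":
    database = ["low", "cow", "old"]
    print(scan_steps("low"), scan_steps("low", valid_prefixes(database)))
```

The saving in this toy case is exaggerated because the database is so small; with a realistic database most keys remain valid for the first few letters and the benefit grows as the word narrows.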
In some cases a keyboard having fewer keys may allow direct selection by individuals who previously required a scanning technique. Pointing techniques can use larger targets that can be selected faster and with less effort. Similarly, voice-activated systems are more robust when fewer sounds must be recognized. A Morse code approach can be designed with fewer (and thus shorter) encoding sequences. When the size of the selection set can be reduced to as few as eight selection cells, additional significant benefits can be gained. Eight cells can be very naturally mapped to single movements in two dimensions (i.e. the eight compass points), so that simple control movements such as a single joystick motion or a simple head gesture can immediately select a cell. This is much more efficient than using such an interface to control a secondary process (such as directed scanning) through which a cell is ultimately selected. A simple, robust eight-position eyegaze sensor has been developed that can also be used.
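The mapping from a single two-dimensional movement to one of eight cells could be sketched as below. The dead-zone threshold, the cell numbering (0 for east, increasing counterclockwise), and the function name are assumptions for illustration; a real joystick or head-gesture interface would also need filtering and calibration.

```python
import math


def compass_cell(x, y, dead_zone=0.3):
    """Map a deflection (x, y) to one of eight 45-degree sectors.

    Returns None while the deflection is inside the dead zone,
    otherwise 0..7 with 0 = east, 2 = north, 4 = west, 6 = south.
    """
    if math.hypot(x, y) < dead_zone:
        return None
    angle = math.degrees(math.atan2(y, x)) % 360
    # Shift by half a sector so each compass point sits mid-sector.
    return int((angle + 22.5) // 45) % 8


if __name__ == "__main__":
    for x, y in [(1, 0), (1, 1), (0, 1), (-1, 0), (0, -1), (0.1, 0.1)]:
        print((x, y), compass_cell(x, y))
```

Each sector would then be assigned to one of the seven letter keys or to the "Select" key, so that one gesture produces one key selection.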
Another important factor to consider in selecting an appropriate AAC system is the memory load involved in learning the encoding of the vocabulary represented in the system. When symbols are used to encode words, phrases, or sentences, the system operator must learn the associations between symbols (or sequences of symbols) and the words or phrases that will be generated when the symbols are selected. This can require the system operator to go through an often steep and lengthy learning curve before a system can be used efficiently or effectively. In the case of acquired and/or progressive disorders, this can result in an unacceptably long delay before effective communication can be achieved. In the case of a system based on word-level disambiguation used by an individual with adequate literacy skills, the association between any desired word and the selection sequence required to generate it is based simply on the spelling of the word. Thus, the "encoding" for each word is already known to the system operator, and no learning or memorization of codes is required. The system can be used with great efficiency after only a very short period. This can be of particular importance in the case of degenerative disorders with a prognosis for a relatively rapid progression.
The basic spelling strategy is the same for all selection techniques. The user spells by selecting the key containing the desired letter. At the end of each word, the user activates the "Select" key to end the word. If more than one word matches the sequence of keys entered, a list of words is presented with the most common word appearing as the default choice at the top of the list. The default choice will be the desired word 97% of the time, and can be accepted simply by continuing to type, requiring no additional keystrokes. Repeatedly pressing the Select key, however, allows the user to select later words on the list or to perform other special tasks as required (punctuation, commands, etc.). Text may be output to a word processor, speech synthesizer, etc.
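The keystroke cost of this strategy can be tallied with a short sketch. It assumes one keystroke per letter, one press of "Select" to end the word (which also produces the space), and one further press for each position the desired word sits below the default choice. The function names and sample candidate lists are hypothetical, and this is only one plausible way of counting, not necessarily the measure used for the figures quoted in this paper.

```python
def keystrokes(word, ranked_candidates):
    """Keystrokes to enter a word: letters + Select + extra Select presses."""
    extra_presses = ranked_candidates.index(word)  # 0 for the default choice
    return len(word) + 1 + extra_presses


def keystrokes_per_character(entries):
    """Average keystrokes per output character, counting the trailing space.

    entries is a list of (word, ranked_candidates) pairs.
    """
    total_keys = sum(keystrokes(word, ranked) for word, ranked in entries)
    total_chars = sum(len(word) + 1 for word, _ in entries)
    return total_keys / total_chars


if __name__ == "__main__":
    # Invented candidate lists: "low" is the default, "cow" is second.
    shared = ["low", "cow", "con"]
    print(keystrokes_per_character([("low", shared), ("cow", shared)]))
```

When the default choice is correct for nearly every word, the extra presses are rare and the average stays very close to one keystroke per character.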
T9(TM) technology, through careful allocation of letters and attention to the user interface, allows the user to concentrate on communicating. While words may be added, the large built-in database (132,000 words) and fixed (but user-modifiable) word assignments mean that the system itself needs only occasional attention. Using this approach requires spelling skills, but no memorization is needed; clients with reduced capacity for memorization or other cognitive processing will nonetheless have access to highly efficient communication. The simple underlying model of T9 has a short learning curve, reducing frustration and other barriers for older individuals (e.g. those with ALS).
Tegic has received a Phase I Small Business Innovation Research Grant (1 R43 RR13191-01) from the National Center for Research Resources of the National Institutes of Health to investigate the potential of T9 as an AAC text generation technique by implementing it on an existing AAC platform (this project is solely the responsibility of the author and does not necessarily represent the official views of NCRR). The hardware platform selected for this project is the Vanguard system recently released by the Prentke Romich Company (PRC). PRC is perhaps the largest manufacturer of augmentative communication aids, and has successfully developed and marketed a variety of AAC systems. The Vanguard was chosen because it supports a variety of selection techniques (direct selection, headpointing, scanning, joystick, etc.), each of which can be effectively used with T9. The Vanguard also features a large color touchscreen display that makes it possible to develop and test a variety of visual interfaces.
A significant portion of the Phase I project is devoted to implementing and testing specific improvements that are designed to increase the effectiveness of the T9 technology when used in a communication aid. Once the new version of the firmware has been developed, testing will begin with a group of ten individuals who will be provided with Vanguard systems that have a specially modified version of the T9 firmware installed.
References
Arnott, J. L., & Javed, M. Y. (1992). Probabilistic Character Disambiguation for Reduced Keyboards Using Small Text Samples. Augmentative and Alternative Communication, 8, 215-223.
