
A PRELIMINARY STUDY INTO SCHEMA-BASED ACCESS AND ORGANIZATION OF REUSABLE TEXT IN AAC

Peter B. Vanderheyden, Patrick W. Demasco, Kathleen F. McCoy and Christopher A. Pennington
Applied Science and Engineering Laboratories, University of Delaware / A.I. duPont Institute
Wilmington, Delaware

Abstract

Augmentative and Alternative Communication (AAC) systems must, to be effective, support a rich language model that, among other things, recognizes that much of our conversation is reusable. This means that, whenever possible, previously composed messages (e.g., sentences) should be easily retrieved rather than typed from scratch. One approach to reusable conversation uses the notion of schema-based structures, which recognizes that many of our activities, including conversations, can be seen as an ordered sequence of actions. These schemata are often only partially specified, so that adaptation to specific situations is possible. This paper describes a prototype implementation based on schema theory and the preliminary results of a two-user evaluation in a role-playing situation.

Background

The field of AAC technology research and development can be broadly classified into three areas: 1) physical interfaces; 2) language models; and 3) output. While progress in all areas is essential for the development of effective systems, language is central to the operation of the system and represents the most active area of AAC research. By "language model" we mean: 1) the basic language units (e.g., letters, words, phrases); 2) how they are represented (e.g., text, graphics); 3) how they are organized; and 4) what additional processing the system does based on user selections (e.g., prediction).

In the last decade, researchers have begun to understand the need for more sophisticated language models in AAC. One striking example is the notion of "reusable conversation" introduced by the University of Dundee, which recognizes that, despite the enormous flexibility of language, much of what we say, we have said before [2]. Research into reusable conversation has emphasized two major areas:

· high-frequency and typically short utterances, such as greetings [2]; and
· lower-frequency but longer narratives, such as stories and jokes [3].
While these types of conversational material are extremely important, there is still an enormous range of contexts that has not yet been sufficiently addressed. For example, the general task of ordering take-out food on the phone will include many "canned" pieces, such as giving your address. However, it will also have many less predictable components, such as the specific food being ordered, which will depend on the specific restaurant called and that moment's preferences. These situations are characterized as being partially structured: you may say roughly the same things in roughly the same order, but there will also be numerous differences each time the situation occurs.

In our work we attempt to organize, and thus give access to, reusable text by storing the text in "schema" structures. In the field of Artificial Intelligence, Schank and his students [5] developed the notion of a schema as a typical sequence of events associated with stylized activities. For example, going to a restaurant contains entering, ordering, eating, and exiting scenes. Each scene in this schema contains sequences of actions that are typical in the restaurant domain. Schema theory can be applied to AAC text organization by associating text with the activities captured in the schema.

A prototype system called SchemaTalk has been developed. It is described below, and in more detail by Vanderheyden [6]. It differs from other recent work on schema-based organization by Alm and colleagues [1] in a number of ways. Most importantly, it allows partially specified sentences that can be filled with content-appropriate words. The purpose of this paper is to describe the first evaluation of the system.

Approach

SchemaTalk is a stand-alone application that can be accessed from a computer or AAC system. A product-oriented version would be more tightly integrated with the AAC system, so that functions such as speech synthesis would not need to be duplicated.
However, for research purposes, this configuration is most flexible.

Schemata are constructed by the user and stored in a separate text file. Motivated by Schank [5], several levels of schematic structure are available to the user. A hierarchy of MOPs (Schank's "memory organization packets") constrains the global context. Each MOP can contain a sequence of scenes and a list of slots. A scene contains a sequence of sentences (and sentence templates), while a slot contains a list of fillers. When a MOP is chosen from the MOP list, the first scene is entered and the first sentence of that scene is highlighted. The user can advance from one sentence to the next, or return to the previous sentence. The user can also move to the first sentence of the next or previous scene. Kellermann et al. [4] suggested that scenes are only weakly ordered in conversation; therefore, the user also has the option of calling up a list of scenes (or MOPs) and directly selecting the next one to go to.

Slots and fillers serve two purposes in SchemaTalk. First, they enable the user to prestore related words or phrases. For example, an avid sports fan could list the names of teams in the slot "team names." Second, a slot can be associated with a particular location in a sentence to create a sentence template. When a sentence template is selected, the list of fillers for the given slot is displayed. For example, choosing the sentence "Who wants to watch the <team names> play on tv at my house?" causes the list of team names to be displayed. Then choosing "Toronto Blue Jays" completes the sentence. Sentence templates reduce the number of selections required to produce a sentence. They also reduce the need to store and scroll through the long lists of sentences that would result had each template filler been stored separately.

The SchemaTalk interface is written in Tcl/Tk and C++ in a UNIX environment, and appears as a single window on a computer display.
Dialog boxes containing lists of MOPs and scenes appear when called, while sentences for the current scene, or fillers for the current slot, are displayed as a scrolled list in the main window.

Evaluation Method

In order to make a preliminary evaluation of the SchemaTalk approach, a series of mock interviews was carried out. Two subjects participated, one with physical and speech impairments and one without; we refer to them here as S1 and S2. The idea was to see whether the subjects could develop schemata and use them effectively in a "real world" situation. S1 and S2 were each interviewed multiple times for a sportswriter's position at ESPN, a television sports channel. Staff at the laboratories played the role of interviewer. S1 and S2 were required to produce all of their verbal communication using SchemaTalk. All interviews were recorded on videotape and then transcribed.

S1 normally uses a commercially available augmentative communication system to speak. He has had no previous experience with interview situations, although he has a strong interest in and extensive knowledge about sports. He is 39 years old and produces fairly complex sentences, but has difficulty with spelling and grammar. S1 had not previously spoken with any of the interviewers. In order to use SchemaTalk, S1 was required to learn commands for controlling a computer from his AAC system (using a mouse/keyboard emulator) in addition to learning how to control SchemaTalk itself.

S2 is a staff member at the laboratories who speaks using his natural voice and has never participated in a real job interview. S2 is 29 years old and has no disabilities or language difficulties. S2 was familiar with all of the interviewers. Although S2 could have controlled SchemaTalk directly from a computer keyboard, he was required to type using a mouse-controlled on-screen keyboard. This limited his typing speed to approximately 1 character per second.
Although no time limit was given for an interview, a guideline of one hour was suggested. S1's interviews each lasted about the entire hour, while S2's interviews were shorter. Nevertheless, the two subjects produced approximately equal numbers of words.

In the first interview, the subjects directly entered their text: S1 used his augmentative communication system and S2 used the on-screen keyboard. There were a few discrepancies between the conditions for the two subjects in the second interview. In interview #2, S1 used a simplified SchemaTalk interface that contained his sentences from interview #1 and from a practice interview, in the form and order in which they were originally produced. This was intended to give him some practice controlling the interface and interacting with it through his AAC system. In contrast, S2 was given the complete SchemaTalk interface in interview #2 and allowed to access sentences he had previously organized into schemata. In the remaining interviews, both S1 and S2 were able to access sentences they had previously organized schematically. Because S1 was not familiar with schema organization or with computers, S1's organization was developed in conjunction with one of the authors.

Results / Discussion

In analyzing the interviews, we were interested in how SchemaTalk (and access to sentences organized by schemata) would affect the users' communication patterns. Several measures are listed in the following tables. In each table we differentiate between words generated by (1) selection using SchemaTalk alone, (2) direct typing, and (3) a mixture of 1 and 2. In the turn counts, "other" refers to communication by gesture, by vocalization, or by no response.

Interview          #1     #2     #3     #4

Turn Count (turns/interview)
  schema          N/A      6      1      5
  mixed           N/A      1      2      2
  direct           23      2      5      6
  other            20      3     17     11
  TOTAL            43     12     25     24

Word Count (words/turn)
  schema          N/A   19.8   20.0   23.0
  mixed           N/A   24.0   32.0   32.0
  direct         10.0   10.5   16.2   19.2
  TOTAL          10.0   18.2   20.6   22.6

Speech Rate (words/min)
  schema          N/A    4.6   14.6   13.8
  mixed           N/A    2.8    6.6    6.4
  direct          4.5    1.7    2.3    3.1
  TOTAL           4.5    3.5    3.6    5.3

Table 1: Summary of Results for S1

Interview          #1     #2     #3

Turn Count (turns/interview)
  schema          N/A      7     11
  mixed           N/A      2      3
  direct           22     17      5
  other             0      3      0
  TOTAL            22     29     19

Word Count (words/turn)
  schema          N/A   11.0    5.4
  mixed           N/A   14.5   29.0
  direct         10.7    3.9   17.0
  TOTAL          10.7    6.6   12.2

Speech Rate (words/min)
  schema          N/A   37.0   28.3
  mixed           N/A   18.5   11.1
  direct          6.3    3.6    8.4
  TOTAL           6.3    7.8   11.5

Table 2: Summary of Results for S2

As can be seen in the results summaries, this preliminary study shows that the users benefited from using SchemaTalk. Both the number of words per turn and the overall speech rate increased definitely over the course of the study. This indicates that the quality of turns, and the amount of information included in each turn, went up as the users became more familiar with the system and took advantage of its capabilities for accessing and organizing reusable text. Although the number of turns per interview fluctuated widely, this was probably due to each interviewer's individual style of questioning. Perhaps a better measure would be the proportion of turns involving prestored text, which would reflect how successfully the users predicted the types of questions and responses needed in the interview setting. At the very least, the results of this study encourage further testing and development of this method of accessing prestored text.

References

[1] Norman Alm, Arthur Morrison, and John L. Arnott. A communication system based on scripts, plans and goals for enabling non-speaking people to conduct telephone conversations. In Proceedings of the 1995 IEEE International Conference on Systems, Man and Cybernetics, pages 2408-2412, 1995.

[2] Norman Arthur Alm. Towards a Conversation Aid for Severely Physically Disabled Non-Speaking People. PhD thesis, University of Dundee, 1988.

[3] Liz Broumley, John L. Arnott, Alistair Y. Cairns, and Alan F. Newell. TalksBack: An application of AI techniques to a communication prosthesis for the non-speaking. In L. C. Aeillo, editor, Proceedings of the 9th European AI Conference, pages 117-119, Stockholm, Sweden, 1990. Pitmans, London.

[4] Kathy Kellermann, Scott Broetzmann, Tae-Seop Lim, and Kenji Kitao. The conversation MOP: Scenes in the stream of discourse. Discourse Processes, 12:27-61, 1989.

[5] Roger Schank. Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press, 1982.
[6] Peter Bryan Vanderheyden. An augmentative communication interface based on conversational schemata. In Vibhu Mittal and John Aronis, editors, Developing AI Applications for People with Disabilities, Workshop Notes, IJCAI-95 Workshop Program, pages 203-212, Montreal, Quebec, Canada, 1995.

Acknowledgments

This work has been supported by a Rehabilitation Engineering Research Center Grant from the National Institute on Disability and Rehabilitation Research of the U.S. Department of Education (#H133E30010). Additional support has been provided by the Nemours Research Programs.

Patrick Demasco
Applied Science and Engineering Laboratories
A.I. duPont Institute
1600 Rockland Road, P.O. Box 269
Wilmington, Delaware 19899 USA
Internet: demasco@asel.udel.edu