
LESSONS FROM APPLYING CONVERSATION MODELLING TO AUGMENTATIVE AND ALTERNATIVE COMMUNICATION

Norman Alm
Applied Computer Studies Division
University of Dundee
Dundee, Scotland, UK DD1 4HN
Voice : +44 1382 345596
Fax : +44 1382 345509
Website: http://alpha.mic.dundee.ac.uk

Alan F. Newell
Applied Computer Studies Division
University of Dundee
Dundee, Scotland, UK DD1 4HN
Voice : +44 1382 344145
Fax : +44 1382 345509
Website: http://alpha.mic.dundee.ac.uk

John L. Arnott
Applied Computer Studies Division
University of Dundee
Dundee, Scotland, UK DD1 4HN
Voice : +44 1382 344148
Fax : +44 1382 345509
Website: http://alpha.mic.dundee.ac.uk

Web Posted on: November 22, 1997


Introduction

One approach to increasing the speaking rate and communicative impact of augmentative and alternative communication (AAC) for non-speaking people is to use conversational modelling to direct predictive systems. Research prototypes built to experiment with this approach have yielded a number of findings, not all of them obvious.


Incorporating pre-stored texts in aided conversations

One such system was developed to test whether making prestored texts readily available in computer-aided conversation would improve the quantity and quality of the system user's conversational output. The user interface allowed for speaking a quick comment, a small paragraph, or a sequence of paragraphs making up a coherent narrative. The layout encouraged the user to be flexible in mixing and matching among these possibilities.

Trials of this prototype were conducted using a single-case experimental design. The intention was to assess the effectiveness of the prototype system, in terms of its ability to help the user take a fuller part in a dialog, and to have more control over the direction of the conversation. Analysis of the transcripts showed that, when the prototype was added to the user's communication modes, the user was able to increase significantly the total number of words employed in each conversation (by a factor of three). The output of the other speakers was unaffected, if anything being slightly higher, which indicates that the AAC user having the ability to introduce text did not create more passive behaviour on the part of the other speaker. Conversational control by the AAC user was also significantly increased, as measured by his increased use of initiating remarks and decrease in responding remarks. Again, the natural speakers retained their level of initiators even when the AAC user's was increased, indicating a dialog which was in general more lively. Before the trials it was considered a possibility that a system which allowed the user to introduce texts as lengthy as they liked into a conversation might make the interaction too one-way, and for the other participant, too much like listening to a lecture. This proved not to be the case in the trials of this prototype. [1]
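The measures used in these trials can be illustrated with a small sketch. It assumes a transcript in which each remark has already been coded by hand as initiating or responding. The coding labels, the `summarise` function, and the sample remarks are hypothetical and are not taken from the trial data.

```python
from collections import Counter

# Hypothetical coded transcript: (speaker, remark type, words spoken).
# The remarks are invented for illustration; they are not trial data.
TRANSCRIPT = [
    ("user", "initiate", "Did you see the match last night"),
    ("partner", "respond", "Yes I did"),
    ("user", "respond", "Me too"),
    ("partner", "initiate", "What did you think of it"),
]


def summarise(transcript, speaker):
    """Return word count and initiating/responding tallies for one speaker."""
    words = 0
    kinds = Counter()
    for who, kind, text in transcript:
        if who != speaker:
            continue
        words += len(text.split())
        kinds[kind] += 1
    total = kinds["initiate"] + kinds["respond"]
    return {
        "words": words,
        "initiators": kinds["initiate"],
        "responders": kinds["respond"],
        # Share of this speaker's remarks that open a new exchange.
        "initiator_share": kinds["initiate"] / total if total else 0.0,
    }


if __name__ == "__main__":
    for speaker in ("user", "partner"):
        print(speaker, summarise(TRANSCRIPT, speaker))
```

Comparing these figures for the same user with and without the prototype, and checking that the partner's figures do not fall, corresponds to the kind of comparison described above.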

With another similar prototype, tests were performed on the content of the conversations produced, compared with conversations between two unaided speakers. The unaided conversations were between pairs of volunteers who were asked to converse together on the same topic as was used in the prototype tests. Transcripts of randomly sampled sections of the conversations and audio recordings of re-enactments of the samples with pauses removed were rated for social competence on a six-item scale by 24 judges. The content of the computer-aided conversations was rated significantly higher than that of the unaided samples. [2]

This finding came as something of a surprise to the researchers, since the purpose was to establish whether conversations using pre-stored material would simply be able to equal naturally occurring conversations in quality of content. Of course, the pauses in actual computer-aided conversation do have an effect on listeners' impressions of the quality of the communication, but this finding is still of interest, since it suggests that in some ways augmented communication could have an edge over naturally occurring talk. A plausible explanation for this finding is that naturally occurring talk is full of high-speed dysfluencies, mistakes, substitutions, and other 'messy' features which listeners tend to discount with their ability to infer what the speaker is intending to say. Pre-stored material is by its nature selected because it may be of particular interest, and it is expressed more carefully than quick-flowing talk, and thus may appear more orderly and dense with meaning than natural talk.


Using models of conversational interaction

Having demonstrated that incorporating prestored texts into aided conversations could enhance their quality and effectiveness, the next step has been to explore possible ways to structure this material so that it can easily be called up by the user at appropriate points in the conversation. This is a challenging task. Efficient large-scale storage and retrieval of texts is in itself a research challenge. The fact that what is wanted is, ideally, instant results, with minimal attention to the search task by a user who wants to concentrate on the interaction itself, makes the task even more demanding.

Using models of conversational interaction as structures to hold the stored text seems a sensible way to proceed. The fact that our knowledge of conversational structuring is incomplete does not mean that there are no helpful possibilities, and a number of prototypes have been developed based on models of conversational interaction.

Conversational features such as opening/closing sequences, backchannelling, and step-wise topic shifts have all been investigated and have produced lessons for system designers.

Conversation can be infinitely variable. However, set against this dynamic aspect to conversational encounters, which relates to the unpredictability of everyday life in general, there is also a static aspect to interactions. That is, there are features of our personality and social position which we portray to others by providing them with a relatively static picture. One way we accomplish this is by the way in which we handle social etiquette routines. These routines are usually well-structured, commonly understood, and are performed in more or less the same manner by all of us. They are particularly in evidence in opening and closing sequences in a conversation.

In a demonstration system called CHAT, a number of important opening and closing routines were provided in a semi-automatic way for the AAC user. The words actually spoken were drawn from a store created beforehand by the user, so they always reflected that person's individual method of expression, but the retrieval process allowed the user to operate at the level of the speech act, rather than search out specific phrases. The system also included a range of feedback speech acts, which could be said with the equivalent of one keystroke. Another feature of the prototype was the ability to alter the mood of the utterances produced. The user could say the same things in four different moods: polite, informal, humorous, and angry.
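The idea of selecting a speech act and a mood, rather than a specific phrase, might be sketched as follows. This is only an illustration under stated assumptions: the `ChatSketch` class, the speech-act names, and most of the stored phrases are hypothetical, and the actual CHAT implementation is not described here.

```python
import random

MOODS = ("polite", "informal", "humorous", "angry")

# Hypothetical store, written in advance by the user:
# speech act -> mood -> phrases in the user's own words.
PHRASE_STORE = {
    "greeting": {
        "polite": ["Good morning, how are you?"],
        "informal": ["Hi there!"],
        "humorous": ["Well, look who it is!"],
        "angry": ["Oh. It's you."],
    },
    "feedback_more": {
        "polite": ["Tell me more about that, if you can."],
        "informal": ["Go on."],
        "humorous": ["Don't stop now, it was just getting good."],
        "angry": ["And? Get to the point."],
    },
    "farewell": {
        "polite": ["It was very nice talking to you."],
        "informal": ["See you later."],
        "humorous": ["Don't do anything I wouldn't do."],
        "angry": ["I have to go."],
    },
}


class ChatSketch:
    """Retrieve a stored phrase from a speech act plus the current mood."""

    def __init__(self, store, mood="polite", rng=None):
        self.store = store
        self.rng = rng or random.Random()
        self.set_mood(mood)

    def set_mood(self, mood):
        if mood not in MOODS:
            raise ValueError(f"unknown mood: {mood!r}")
        self.mood = mood

    def speak(self, speech_act):
        # One selection (the speech act) is enough; the mood is already set.
        phrases = self.store[speech_act][self.mood]
        return self.rng.choice(phrases)


if __name__ == "__main__":
    chat = ChatSketch(PHRASE_STORE)
    print(chat.speak("greeting"))
    chat.set_mood("angry")
    print(chat.speak("greeting"))
```

Because the mood is held as a setting rather than chosen per utterance, the user's effort per remark stays at a single selection.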

Trials of this system demonstrated that reasonable sounding interactions of particular types were possible to achieve at greatly increased speaking rates. Users and conversation partners both reported that the system improved the quality of their interaction. The only negative comments were that the users felt frustrated that the system could not help them in the same way when getting on with the central portion of a conversation, and talking about a particular topic [3]. One of the people testing the system used it in unexpected ways, which showed the extent to which imaginative use of a given facility can often extend its capabilities. She used a feedback remark designed to elicit more information from the other speaker (e.g. 'Tell me more about that, if you can.') as a means of humorously prolonging a conversation with someone who was clearly trying to leave. The same person used the mood setting playfully to introduce humour into her interactions (e.g. by staying in angry mood throughout, just to tease the other person). This showed that, given a sufficiently rich structure for prestored material, it was possible to give it more individuality by imaginative uses of it.

Conversation tends to consist of alternating turns, with alternating perspectives on the topic being expressed. A prototype called TALK has been developed which allows the user to change perspectives and thereby select a new set of candidate texts for speaking by choosing three aspects of the topic: person (YOU, ME), time (PAST, PRESENT, FUTURE), and orientation (WHERE, WHAT, HOW, WHEN, WHO, WHY). The system also incorporated CHAT-like features, which allowed rapid production of openers, closers, and feedback remarks. An additional commenting facility was also included, which allowed for a selection to be made from a small set of comments which could be used in many contexts, but which needed to be specified precisely. Evaluations of this system showed it to be helpful in increasing the amount and quality of the conversational material produced [4]. It is not clear whether the perspectives used thus far are the optimal ones for a wide range of conversations, and further work is planned to investigate this.
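The perspective-based selection can be pictured as a lookup keyed on the three aspects, where changing any one aspect yields a new candidate set. The sketch below assumes a simple dictionary store; the `Perspective` and `TalkSketch` names and the sample texts are hypothetical and do not reproduce the TALK prototype itself.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Person(Enum):
    YOU = "YOU"
    ME = "ME"


class Time(Enum):
    PAST = "PAST"
    PRESENT = "PRESENT"
    FUTURE = "FUTURE"


class Orientation(Enum):
    WHERE = "WHERE"
    WHAT = "WHAT"
    HOW = "HOW"
    WHEN = "WHEN"
    WHO = "WHO"
    WHY = "WHY"


@dataclass(frozen=True)
class Perspective:
    person: Person
    time: Time
    orientation: Orientation


# Hypothetical prestored texts for one topic, keyed by perspective.
TEXTS = {
    Perspective(Person.ME, Time.PAST, Orientation.WHERE): [
        "I went to school in the town where I was born.",
    ],
    Perspective(Person.YOU, Time.PAST, Orientation.WHERE): [
        "Where did you go to school?",
    ],
    Perspective(Person.YOU, Time.FUTURE, Orientation.WHAT): [
        "What are you planning to do next?",
    ],
}


class TalkSketch:
    """Hold the current perspective and return candidate texts for it."""

    def __init__(self, texts, start):
        self.texts = texts
        self.current = start

    def candidates(self):
        return list(self.texts.get(self.current, []))

    def shift(self, **changes):
        # Change one or more aspects, e.g. shift(person=Person.YOU).
        self.current = replace(self.current, **changes)
        return self.candidates()


if __name__ == "__main__":
    talk = TalkSketch(TEXTS, Perspective(Person.ME, Time.PAST, Orientation.WHERE))
    print(talk.candidates())
    print(talk.shift(person=Person.YOU))
```

Switching only the person aspect, as in the last line, mirrors the alternation between talking about oneself and asking about the other speaker.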

Another prototype has been developed to try out another method of modelling step-wise topic shifts in conversation. This prototype used a method called fuzzy information retrieval to achieve this simulation. With conventional text storage methods, each item is tagged with a set of descriptors, and a search for related items involves comparing these descriptors. Although we have experimented with using such a tagging method [5], this is not an entirely successful approach to use in a conversational database, because of its inflexibility. Fuzzy methods have been applied successfully in control systems of various sorts. If applied to an information retrieval system, the theory allows for more flexible storage and retrieval methods [6]. The similarities between items in the database can be captured without the need for similar items to share a number of descriptors from a given set. From the point of view of a conversational database, a fuzzy set retrieval system has the advantage that, given one item, it will always produce a set of the most similar items in the database. It will never return from a search with no items found.
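The contrast between the two retrieval styles can be sketched in a few lines. The example below is an assumption-laden illustration, not the prototype's method: it represents each item as a fuzzy set of topic membership grades, and it uses the min/max ratio (a fuzzy analogue of the Jaccard index) as the similarity measure. The item names, grades, and function names are all hypothetical.

```python
# Hypothetical items: each maps topic descriptors to membership grades in [0, 1].
ITEMS = {
    "holiday story": {"travel": 0.9, "family": 0.4},
    "train delay": {"travel": 0.7, "work": 0.3},
    "garden": {"home": 0.8},
    "office joke": {"work": 0.9},
}


def conventional_match(items, current, min_shared=2):
    """Crisp retrieval: keep items sharing at least min_shared descriptors."""
    wanted = set(items[current])
    return [
        name
        for name, grades in items.items()
        if name != current and len(wanted & set(grades)) >= min_shared
    ]


def fuzzy_similarity(a, b):
    """Ratio of summed minima to summed maxima of the membership grades."""
    keys = set(a) | set(b)
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return inter / union if union else 0.0


def most_similar(items, current, k=3):
    """Graded retrieval: rank every other item and return the top k."""
    others = [name for name in items if name != current]
    others.sort(
        key=lambda name: fuzzy_similarity(items[current], items[name]),
        reverse=True,
    )
    return others[:k]


if __name__ == "__main__":
    print(conventional_match(ITEMS, "holiday story"))  # may be empty
    print(most_similar(ITEMS, "holiday story", k=2))   # always k items if available
```

Because the graded version ranks rather than filters, it returns candidates even when nothing shares a descriptor with the current item, which is the property the text identifies as useful for keeping a conversation moving.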

To evaluate the prototype against an equivalent system based on ordinary database retrieval methods, a version was created which used the same stored text items but depended on conventional database searching to compare them. As expected, the conventional system often produced no texts which matched a given text, whereas the fuzzy set system always produced a full set of candidate texts [7].


Summary

A number of lessons have been learned from building models of conversational interaction and using them as a way to structure prestored conversational texts for use in a communication system for non-speaking people. Even though our knowledge of how conversation is structured is incomplete, such modelling does seem to have the potential to offer improvements in conversational quantity and quality. Naturally, users would always have the ability to speak new material, but the ability to incorporate reusable material efficiently and 'invisibly' is a goal worth pursuing.


References

[1] N.Alm, J.L.Arnott, A.F.Newell (1992) Evaluation of a text-based communication system for increasing conversational participation and control, Proceedings RESNA International '92, pp. 366-368.

[2] J.Todman, L.Elder, N.Alm (1995) An evaluation of the content of computer-aided conversations, Augmentative and Alternative Communication, Vol. 11, pp. 229-234.

[3] N.Alm, J.L.Arnott, A.F.Newell (1992) Prediction and conversational momentum in an augmentative communication system, Communications of the ACM, Vol. 35, No. 5, pp. 46-57.

[4] N.Alm, J.Todman, L.Elder, A.F.Newell (1993) Computer aided conversation for severely physically impaired non-speaking people. Proceedings of INTERCHI '93, pp. 236-241.

[5] N.Alm, J.L.Arnott, A.F.Newell (1989) Database design for storing and accessing personal conversational material, Proceedings of RESNA '89, pp. 147-148.

[6] C.V.Negoita, P.Flonder (1976) On fuzziness in information retrieval, International Journal of Man Machine Studies, Vol. 8, pp. 711-716.

[7] N.Alm, M.Nicol, J.L.Arnott (1993) The application of fuzzy set theory to the storage and retrieval of conversational texts in an augmentative communication system. Proceedings of RESNA '93, pp. 127-129.