Web Posted on: February 16, 1998

A FRIENDLY DOCUMENT READER BY USE OF MULTIMODALITY

Philippe Truillet, Nadine Vigouroux & Bernard Oriola
IRIT UMR CNRS 5505
118, Route de Narbonne
31062 TOULOUSE - France
Voice/TDD/Message: (33) 561 556 314
FAX: (33) 561 556 258
Internet: {truillet, vigourou, oriola}@irit.fr

1. INTRODUCTION

This paper describes a new generation of World Wide Web (W3) document reader based on multimodal presentation. Although commercial accessibility tools allow blind people to access and read W3 documents, they mainly "display" the ASCII text after filtering out graphic objects and/or adapting the document structure into a form that visually impaired persons can read more easily, as in the WAB system (Kennel 1996). Most of these approaches suppress the document layout so that commercial screen reader software can read the text aloud or send it to an on-line Braille display. However, several studies in cognitive research (Thon 1993, Vivier 1996) have pointed out that the layout of a document carries meaning and seems to be important for comprehension during a reading task.

To reduce these limitations, we are developing Smart-Net, a multimodal user interface that also takes the text attributes and the document layout into account to present information in a non-visual form. This interface is based on:

1) an interpretation process of the HTML (Hyper Text Markup Language) tags, which represent an explicit and descriptive encoding of each component of the document;
2) a cooperation model of tactile and aural modalities to present W3 documents in a more accessible way, according to the user's sensory and cognitive capabilities.

First, multimodal and sound concepts are discussed as solutions to reduce the problems of accessing HTML documents on the Web. Second, some examples of this modality cooperation are described as illustrations.

2. ACCESSIBILITY OF THE WORLD-WIDE-WEB FOR BLIND PEOPLE

2.1. The accessibility aims

Offering access to the W3 gives blind people a challenging opportunity to get information. W3 browsers give access to huge databases (e.g. electronic newspapers, remote lectures, etc.) related to professional activities as well as cultural ones. This is a great opportunity for blind people, who cannot access printed documents. Since 1994, several user interface tools usable by blind people have been developed, such as Lynx, pwWebSpeak (TM), W3 Access For The Blind, or Microsoft Internet Explorer (C) and Netscape Communicator (C) coupled with, for example, the screen reader JAWS (for an overview, see Truillet 1997).

To get an overview of the problems encountered on the W3 (access, interaction, presentation), evaluation protocols are needed to identify the interaction needs of the users. Therefore, a protocol on the usability of browsers (pwWebSpeak (TM), Lynx and Microsoft (C) Internet Explorer associated with the screen reader JAWS) was worked out at IRIT and run with six blind people (Oriola 1998). The analysis of this evaluation showed the following needs:

1) Quickly obtain an overview of the document in terms of its semantic content but also its size, its number of hyperlinks, etc.;
2) Navigate from link to link, or from section to section through the document (here the section refers to the level of the structure, for instance, header, chapter, paragraph, etc.);
3) Be notified of both the typo-dimensional information ("Where am I? Is it a title?") and the typographic attributes ("Is the sentence I have just heard in bold?");
4) Be notified of non-textual elements present in the document (e.g. graphs, diagrams, pictures, video, etc.), so that the visually impaired person is aware of them;
5) Be informed of text elements typed with non-standard attributes, for example, a list of words in italic or in bold.

HTML, the markup language used by the W3, allows the publisher to encode and label both the text and the structure of the document (i.e. morpho-dispositional characteristics and typographic attributes such as styles, etc.). This is why HTML tags offer labeling that is useful for displaying electronic documents on screen. The challenge is to use them to "display" the HTML document in a non-visual form.
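As an illustration of how HTML tags can supply the overview requested in need 1) and the notification of non-textual elements in need 4), the following minimal sketch counts words, hyperlinks, headings and images in a page. It is not the Smart-Net implementation: the class name OverviewParser, the function overview and the choice of counted elements are assumptions made for this example, which relies only on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class OverviewParser(HTMLParser):
    """Collects summary counts while scanning an HTML document (illustrative only)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.links = 0      # <a href=...> anchors
        self.images = 0     # non-textual elements (only <img> in this sketch)
        self.headings = 0   # section titles
        self.words = 0      # rough size of the textual content

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag in self.HEADINGS:
            self.headings += 1

    def handle_data(self, data):
        # Whitespace-separated tokens are used as an approximate word count.
        self.words += len(data.split())


def overview(html):
    """Return a dictionary of counts that could be spoken before reading starts."""
    parser = OverviewParser()
    parser.feed(html)
    parser.close()
    return {
        "words": parser.words,
        "links": parser.links,
        "headings": parser.headings,
        "images": parser.images,
    }


page = "<h1>News</h1><p>Read the <a href='a.html'>first</a> story <img src='x.gif'></p>"
summary = overview(page)
```

Such a summary could be announced before the reading starts, so that the user knows the size of the document and whether it contains links or pictures.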

2.2. The Smart-Net goals

The Smart-Net system is based on two main interaction techniques used in the new generation of interfaces for disabled persons:

1) Multimodality: the use of several modalities to communicate;
2) A better integration of the sound modality in the interface.

The Smart-Net system developed at IRIT interprets the tags of HTML (Version 2.0) according to the modality cooperation model. Here, the interpretation process tries to present each HTML component to the user in the best possible way according to the user's sensory capabilities, i.e. with minimum cognitive effort. "Providing a non-visual interface for blind means on the one hand replacing the preponderant visual modality by several audio and tactile modalities and on the other hand working out solutions to materialize a document layout" (Vigouroux 1994).

Interactive systems can benefit from sound features. We have identified three main functions:

1) Speech restitution by means of Text-To-Speech systems or digital restitution tools for speech recordings;
2) Sound feedback (helpful to report on the behavior of the system, for example, the current link cannot be reached);
3) Event notification (for instance, the end of a document transfer).

On the other hand, other research has studied the auralization of documents (Portigal 1994) and shown that a single audio modality cannot completely substitute for vision.

The cooperation model of aural and tactile modalities, described below, tries to reduce this weakness. The cooperation between these two modalities is based on the results of Bernsen's studies (Bernsen 1994) in the field of output modalities for the representation of information.

3. DOCUMENT STRUCTURE AND MULTIMODAL STRATEGY TO PROVIDE LAYOUT INFORMATION

The Smart-Net user interface interprets the HTML DTD (Document Type Definition). This language is mainly used to encode information to be presented in a visual form. Few systems developed for blind people take the HTML tag structure into account, except for the hypertext tag (Lynx, Web Access For Blind, etc.).

The Smart-Net kernel interface seems to be more efficient and complete: it is able to interpret the typographic tags as well as the hyperlink and structural ones. For instance, an efficient interpretation of typographic attributes makes it possible to clearly distinguish the text from its presentation features. The results of the interpretation process activate the appropriate multimodal presentation strategies. Possible interpretations of the sentence "It's a very important thing" (text written in bold) are listed below.

Either:

1) The tags are pronounced (e.g. "in bold") together with their associated texts, either synthesized by means of TTS devices or sent to the Braille display;

2) The values of the three prosodic parameters (speed, intensity and pitch) can be specified according to the relevant tags. For example, if a part of a text is in bold characters, the user interface translates this attribute by synthesizing this piece of text at a slower speed for individual word emphasis. Default values of each prosodic parameter are linked to the various tags;

3) The content and the structure are sent to different devices. For example, the tags associated with the text are sent to the Braille display while the content (here the text) is synthesized by the TTS devices;

4) A mixed strategy combining 1) and 2) or 2) and 3).
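The first three strategies can be sketched as follows. This is a minimal illustration, not the Smart-Net code: the strategy names ("announce", "prosody", "split"), the labels and the relative speech rates are assumptions made for the example, and strategy 1) is shown with speech output only. Each output item is a tuple (device, text, rate), where the rate is relative to a default speed of 1.0.

```python
from html.parser import HTMLParser

# Hypothetical labels and relative speech rates for two typographic tags.
TAG_LABELS = {"b": "in bold", "i": "in italic"}
TAG_RATE = {"b": 0.8, "i": 0.9}  # below 1.0 means slower than the default rate


class Presenter(HTMLParser):
    """Turns HTML into a list of (device, text, rate) output items."""

    def __init__(self, strategy):
        super().__init__()
        self.strategy = strategy
        self.open_tags = []  # stack of currently open typographic tags
        self.output = []

    def handle_starttag(self, tag, attrs):
        if tag not in TAG_LABELS:
            return
        self.open_tags.append(tag)
        if self.strategy == "announce":    # strategy 1: pronounce the tag
            self.output.append(("speech", TAG_LABELS[tag], 1.0))
        elif self.strategy == "split":     # strategy 3: tag goes to Braille
            self.output.append(("braille", TAG_LABELS[tag], None))

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        rate = 1.0
        if self.strategy == "prosody" and self.open_tags:  # strategy 2
            rate = TAG_RATE[self.open_tags[-1]]
        self.output.append(("speech", text, rate))


def present(html, strategy):
    parser = Presenter(strategy)
    parser.feed(html)
    parser.close()
    return parser.output


sentence = "<b>It's a very important thing</b>"
announced = present(sentence, "announce")
slowed = present(sentence, "prosody")
split = present(sentence, "split")
```

With the "prosody" strategy the sentence is emitted once, for the speech device, at the slower rate; with the "split" strategy the label goes to the Braille device while the text is spoken at the default rate. A mixed strategy would combine these branches.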

These presentation strategies can be chosen by visually impaired persons according to their preferences, their cognitive load and the interaction context of the reading. This is an advantage over the previous systems described above.

Moreover, making use of the structure can be a solution to the handling of multilingual documents. In fact, blind people are often confused when the language in which the links are expressed is different from the language in which the document is written.

Two solutions can be provided. Either a tag (e.g.) is inserted into the structure to identify the language in which the document or a piece of the document is written (this feature is specified in the new HTML version 4.0), or a lexicon is used to identify the language by means of morpho-syntactic information (this feature is beginning to be implemented in e-mail reader software such as Dial & Play from Elan Informatique). The first solution requires the author to mark up the document; the second does not.

The second solution will be implemented on the Smart-Net platform. A multilingual TTS synthesis device is then necessary to switch quickly to the language of the newly linked document.
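A much simplified sketch of the lexicon-based solution is given here. It only counts function words found in small per-language word lists, whereas the approach described above relies on morpho-syntactic information; the lexicons, the function name guess_language and the default language are assumptions made for this example.

```python
import re

# Tiny illustrative lexicons of function words; a real system would need far larger ones.
LEXICONS = {
    "en": {"the", "and", "of", "to", "is", "in", "a", "that"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "un", "une", "dans"},
}


def guess_language(text, default="en"):
    """Return the language whose lexicon matches the most words of the text."""
    # Extract runs of letters (accented letters included), ignoring digits.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in lexicon for word in words)
        for lang, lexicon in LEXICONS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no word of any lexicon was found.
    return best if scores[best] > 0 else default


french = guess_language("Le texte et les images dans le document")
english = guess_language("The layout of the document is important")
```

The result could then be used to select the voice of a multilingual TTS device before the newly linked document is read.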

4. THE SMART-NET USER INTERFACE

4.1. The Smart-Net Platform

For output, our interface uses the multilingual TTS system Elan ProVerbe Speech Engine and an ALVA Braille display.

We have chosen to develop a stand-alone application (JAWS is not used) in order to control the outputs presented to blind users. Our interface currently runs on a Windows 95/NT platform and will be tested in a few weeks to study modality cooperation in a reading task.

4.2. Other Functions

As described above, the main characteristic of the Smart-Net user interface is the multimodal presentation.

As mentioned by users, the main problem when reading a document with existing tools is the lack of audio, vocal and/or tactile feedback about the environment (e.g. transfer time from the network, connection reset by peer, etc.). When blind users work with the Smart-Net interface, they are notified or receive feedback whenever an event occurs, whether it comes from the environment or follows a user action, such as the end of the transfer of a document, the end of the reading of a text, the possible activation of a hypertext link anchor, etc.
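The notification mechanism can be pictured as a small dispatcher that maps events to messages and forwards them to every registered output channel. The sketch below is an assumption made for illustration: the event names, the messages and the Notifier class are hypothetical, and simple Python lists stand in for the TTS and Braille devices.

```python
# Hypothetical event names and messages; the events come from the text above,
# their encoding is an assumption of this sketch.
NOTIFICATIONS = {
    "transfer_complete": "Document loaded",
    "reading_complete": "End of text",
    "link_available": "Link",
    "connection_reset": "Connection reset by peer",
}


class Notifier:
    """Forwards event messages to every registered output channel."""

    def __init__(self):
        self.channels = []  # callables that accept a message string

    def add_channel(self, channel):
        self.channels.append(channel)

    def notify(self, event):
        """Send the message for a known event; return False for unknown events."""
        message = NOTIFICATIONS.get(event)
        if message is None:
            return False
        for channel in self.channels:
            channel(message)
        return True


spoken, braille = [], []          # stand-ins for the TTS and Braille devices
notifier = Notifier()
notifier.add_channel(spoken.append)
notifier.add_channel(braille.append)
delivered = notifier.notify("transfer_complete")
```

Registering channels separately keeps the choice of modality (speech, sound or Braille) independent of the events themselves, in line with the user-selectable strategies of Section 3.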

5. CONCLUSION

The Web can currently be accessed by blind people. Nevertheless, significant and effective methods of non-visual interaction must be worked out to access the World-Wide Web in the best way. Smart-Net is an example of this new challenge in the area of new-generation user interfaces for disabled people, based on document structure interpretation. In our opinion, based on several studies in cognitive research and Man-Machine Communication, these effective methods have to be based on a multimodal presentation but also on appropriate notification and feedback functions.

Although not all problems encountered on the W3 are solved, such as the accessibility of frames or applets, an evaluation protocol is currently being worked out to assess the relevance of the multimodal presentation model. The aim of this protocol is to evaluate whether a multimodal presentation facilitates understanding during a reading task.

6. REFERENCES

ALVA BV, http://www.alva-bv.nl

Bernsen N.O. (1994), "A Revised Generation of the Taxonomy of Output Modalities", The AMODEUS Project, ESPRIT Basic Research Action 7040, TM/WP11.

Elan Informatique, http://www.elan.fr

Kennel A., Perrochon L., Darvishi A. (1996), "WAB: World Wide Web Access for Blind and Visually Impaired Computer Users", SIGCAPH Newsletter, June 1996, 10-15.

JAWS - Job Access for Windows, http://www.hj.com

Lynx, http://lynx.browser.org

Microsoft Internet Explorer (C), http://www.microsoft.com

Netscape Communicator (C), http://home.netscape.com

Oriola B., Vigouroux N., Truillet Ph., Lacan F. (1998), "Interaction sur le World-Wide-Web par les non-voyants", to be published.

Portigal S. (1994), "Auralization of Document Structure", Thesis, The Faculty of Graduate Studies, Guelph University.

pwWebSpeak (TM) , The Productivity Works Inc., http://www.prodworks.com

Thon, B., Marque, J.C., & Maury, P. (1993). "Le texte, l'image et leurs traitements cognitifs". Colloque Interdisciplinaire du CNRS, "Images et Langages", Multimodalité et Modélisation Cognitive. Paris, 29-39.

Truillet, Ph., Oriola, B., & Vigouroux, N. (1997). "Multimodal Presentation As a Solution To Access A Structured Document". 6th World-Wide-Web Conference. Santa-Clara, 07-11 April 1997.

Vigouroux, N., & Oriola B. (1994). "Multimodal Concept for a New Generation of Screen Reader", 4th International Conference. Computers for Handicapped Persons. Springer Verlag. Vienna, 154-161.

Vivier, J., & Mojahid, M. (1997). "Adaptation de la mise en forme matérielle au lecteur : collaboration pluridisciplinaire Eune modélisation différenciée selon les objectifs de lecture". Le Texte procedural : langage, action et cognition. PRESCOT. Toulouse, 49-69.

Web Access For the Blind, http://www.inf.ethz.ch/department/IS/ea/blinds/
