
CounterVision: A Screen Reader with Multi-Access Interface for GUI

Yoshihiko Okada, Katsuhiro Yamanaka, Akio Kaneyoshi and Osamu Iseki
Kansai C&C Research Laboratories, NEC Corporation
1-4-24 Shiromi, Chuo-ku, Osaka 540 JAPAN
TEL: +81-6-945-3213; FAX: +81-6-945-3096
Internet: okada@obp.cl.nec.co.jp

Web Posted on: December 12, 1997


1. Introduction

Many screen readers that enable the blind to access computer systems with a graphical user interface (GUI) have been developed. We have studied ways of providing blind users with an effective interface to multimedia information and are developing a new type of screen reader, CounterVision, for this purpose. CounterVision incorporates a multimodal user interface concept that we call the Multi-Access Interface. In this interface, two structural aspects of a GUI, its logical structure and its visual structure, and two operational methods, indirect and direct pointing, can be flexibly combined according to the needs of the user. The Multi-Access Interface is realized by combining conventional interface devices, e.g., a numeric keypad, a voice/sound synthesizer, a touch screen, and a Braille pin display, with a special pointing device equipped with an image pin display. In this paper, the basic concept and configuration of CounterVision and the prototype of each subsystem are described. The CounterVision system is available for Microsoft(R) Windows(R) environments.


2. Access methods of GUI screen readers

Interactive objects such as windows, title bars, icons, buttons, and menu bars, which together comprise a GUI display, can be presented through two structural aspects [Mynatt]: a logical tree hierarchy and a visual layout in a 2-D space. Conventional screen readers allow users to deal with only one or the other of these structures. For operating the interactive objects of a GUI, there are also two methods of pointing at an object: indirect pointing, which uses command input devices such as a keyboard, and direct pointing, which uses graphical pointing devices such as a mouse. These two methods correspond, respectively, to the approach of text-oriented (i.e., DOS-based) screen readers and to the ordinary GUI operation of sighted users. Combining these structural aspects and pointing methods, GUI access methods can be categorized into four types.

  • (A) Indirect-Visual Structure (IVS) search method: Arrow keys or a numeric keypad are used to move along a matrix of small areas into which the screen is divided, and users obtain information about the GUI interactive objects.
  • (B) Indirect-Logical Structure (ILS) search method: Arrow keys or a numeric keypad are used to traverse the logical structure hierarchy of GUI interactive objects, and users obtain information about the objects.
  • (C) Direct-Visual Structure (DVS) search method: Users directly point at GUI interactive objects displayed on a screen by using direct pointing devices and obtain information about the objects.
  • (D) Direct-Logical Structure (DLS) search method: Users directly point at the logical structure of GUI interactive objects, represented in a 2-D space, by using direct pointing devices and obtain information about the objects.

Generally, with the logical structure search methods it is easy for users to understand the relation between tasks and interactive objects and to find desired objects with certainty, but it is difficult to obtain information about the layout of the interactive objects. With the visual structure search methods, on the other hand, users can obtain the locations of interactive objects, but it is difficult to discover the rules governing their placement or to follow accurate search paths. Sighted users can intuitively grasp these two structures at the same time while operating the GUI interactive objects, but blind users can grasp only one at a time. Each access method thus has both merits and demerits for blind users. We believe that dynamically adapting the access method to the user's situation, based on the characteristics of each method, will help users manipulate the varied information in a GUI environment easily and effectively.
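
To make the categorization concrete, the following minimal Python sketch expresses the two axes and the four resulting methods as data. The enum and function names are ours, chosen for illustration; they are not part of CounterVision.

```python
from enum import Enum

class Structure(Enum):
    VISUAL = "visual"    # 2-D layout of objects on the screen
    LOGICAL = "logical"  # tree hierarchy of interactive objects

class Pointing(Enum):
    INDIRECT = "indirect"  # command input: keyboard, numeric keypad
    DIRECT = "direct"      # graphical pointing: mouse, touch screen

# The four access methods (A)-(D), keyed by (pointing method, structural aspect).
ACCESS_METHODS = {
    (Pointing.INDIRECT, Structure.VISUAL):  "IVS search",
    (Pointing.INDIRECT, Structure.LOGICAL): "ILS search",
    (Pointing.DIRECT,   Structure.VISUAL):  "DVS search",
    (Pointing.DIRECT,   Structure.LOGICAL): "DLS search",
}

def classify(pointing: Pointing, structure: Structure) -> str:
    """Return the access-method name for a given pointing style and structural aspect."""
    return ACCESS_METHODS[(pointing, structure)]

print(classify(Pointing.DIRECT, Structure.VISUAL))  # -> "DVS search"
```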


3. Multi-Access Interface

The Multi-Access Interface is designed to present blind users with both structural aspects of a GUI and to allow them to use the four access methods selectively, choosing whichever is advantageous in a given situation. We examined users' situations based on their tasks and tried to adapt the operational method, output media, and operational devices to each situation. The situations were categorized into the four types listed below (a small data sketch of these profiles follows the list); the adaptations are discussed in the following subsections.

  • (a) Task: accessing textual contents
    Method: IVS search
    Media: Braille, synthesized voices, sounds
    Devices: arrow keys, Braille pin display keys
  • (b) Task: general GUI operations
    Method: ILS search
    Media: synthesized voices, sounds
    Device: numeric keypad
  • (c) Tasks: obtaining information about layout and image, learning GUI environments
    Method: DVS search
    Media: tactile image pins, synthesized voices, sounds
    Devices: dedicated pointing device with an image pin display, touch screen
  • (d) Task: routine work operations
    Method: DLS search
    Media: synthesized voices, sounds
    Device: touch screen
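
Read as a table, the four situations above each map to a search method, a set of output media, and a set of devices. The following Python sketch simply records that mapping as data; the key names are ours and purely illustrative.

```python
# Illustrative record of situations (a)-(d); the identifiers are ours, not CounterVision's.
SITUATION_PROFILES = {
    "textual_contents": {"method": "IVS", "media": ["Braille", "voice", "sound"],
                         "devices": ["arrow keys", "Braille display keys"]},
    "general_gui":      {"method": "ILS", "media": ["voice", "sound"],
                         "devices": ["numeric keypad"]},
    "layout_and_image": {"method": "DVS", "media": ["image pins", "voice", "sound"],
                         "devices": ["image-pin pointing device", "touch screen"]},
    "routine_work":     {"method": "DLS", "media": ["voice", "sound"],
                         "devices": ["touch screen"]},
}

def profile_for(task: str) -> dict:
    """Look up the adapted method, media, and devices for a task category."""
    return SITUATION_PROFILES[task]

print(profile_for("general_gui")["method"])  # -> "ILS"
```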

3.1 Adaptation of the search methods

In a GUI environment, the information presented to users is categorized into two types: operational information and contents of interest.

(1) Access to operational information

When users want to obtain general information or create documents, the GUI system should be operable without concern for the layout of its interactive objects. The logical structure search methods are therefore suitable for the basic GUI operations of starting, operating, and switching application programs. For efficiency in routine work, direct pointing can be used effectively as a shortcut. However, when users want to learn or customize a GUI system or an application, understanding the structure among interactive objects and their layout is more important than efficiency. In that case the visual structure search methods, especially the DVS search method, are more effective because users can obtain the absolute locations of interactive objects.

(2) Access to contents or documents

When users want information from textual contents (documents), they can access sentences, tables, etc. by moving the cursor, just as with conventional DOS screen readers. This method is equivalent to the IVS search method and is already familiar to users. The IVS method is especially important for grasping spatially laid-out text such as tables. On the other hand, when users want to do desktop publishing or other work involving images, it is important to understand the absolute location of information. Therefore, we adopted the IVS search method for accessing textual contents and the DVS search method for accessing graphical contents.

3.2 Adaptation of output media

The following media are available as outputs with CounterVision.

  • synthesized voices (male, female)
  • sounds
  • Braille
  • image pin display

Auditory output is reasonable for searching interactive objects or for being informed of an operational result, because in such situations the information does not need to persist. Synthesized voices are suitable where object captions must be identified. Sounds can be used for other attributes and operational results, because they need only be presented briefly for efficient operation; moreover, their variations can be limited in number, allowing users to remember them easily. Synthesized voice output is also available for displaying textual contents. Since users may want to read texts carefully or repeatedly, a Braille pin display, which can retain information and reduce the burden on auditory cognition, must also be provided. Grasping a 2-D layout and/or images is very difficult for the blind, but a dedicated image pin display will help users obtain such information when it is used in combination with auditory feedback.

3.3 Adaptation of operational devices

The following devices are available for operation with CounterVision.

  • keyboard (including numeric keypad)
  • Braille pin display keys
  • touch screen
  • pointing device with a small image pin display

With the Multi-Access Interface, users can employ multiple access methods by switching among them appropriately. However, selecting a device according to operational semantics may cause confusion and increase the cognitive burden. Operational devices must therefore be assigned naturally according to the information the user is targeting, and the access methods must follow that assignment so that users need not be concerned with mode switching.

As described so far, the IVS search method is used to obtain textual contents or documents. The arrow keys and Braille display keys are customized for cursor-movement functions and adapted to the IVS method according to the application. Users can therefore choose devices according to the type of job. For example, if the work mainly involves creating documents, they will use the keyboard; if it mainly involves obtaining information, such as reading mail, they will operate the Braille display. By operating a numeric keypad customized for logical structure traversal, the ILS search method can be used to obtain operational information. This helps users distinguish between two similar devices, the arrow keys and the numeric keypad, by associating the former with contents and the latter with the logical structure.

The two direct search methods require pointing devices with a direct-manipulation interface. A touch screen customized to display logical structure tables is available for the DLS search method, and a pointing device with a small image pin display is available for the DVS search method. The latter device was developed especially for this project. It has switches and a small tactile pin display that presents a part of the screen area. Users move it just like a mouse over the touch screen, and information about the area being pointed at is provided.

In this way, users select operational devices according to their needs and can use the multiple access methods appropriately without being concerned about access-mode switching.
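
The point that the access method is implied by the device in use, rather than chosen through an explicit mode switch, can be pictured as a small input dispatcher. The following Python sketch is a hypothetical illustration under our own assumptions; the device names and handler are not CounterVision code.

```python
# Hypothetical sketch: the search method is implied by the device that produced an
# input event, so the user never performs an explicit mode switch.
DEVICE_TO_METHOD = {
    "arrow_keys": "IVS",             # cursor movement over textual contents
    "braille_display_keys": "IVS",
    "numeric_keypad": "ILS",         # traversal of the logical structure
    "touch_screen": "DLS",           # direct pointing on the unfolded search table
    "image_pin_pointer": "DVS",      # direct pointing on the visual layout
}

def handle_event(device: str, event: dict) -> str:
    """Route an input event to the search method implied by its source device."""
    method = DEVICE_TO_METHOD[device]
    # A real system would dispatch to the corresponding search module here.
    return f"{method} search handles {event.get('type', 'input')} from {device}"

print(handle_event("numeric_keypad", {"type": "key", "code": "down"}))
```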


4. CounterVision and its subsystems

The CounterVision system consists of a personal computer, input/output devices, and support software that realizes and controls the Multi-Access Interface. One of the main software components is the Off Screen Model (OSM) [Kochanek], a database that logically holds information about the interactive objects on screen. Our OSM has some special features, e.g., properly linking multiple objects that are semantically related but systematically separated in a dialog window, and registering hidden objects, inferred from other objects on screen, as well as displayed objects. CounterVision translates the OSM data structures into the structure required by each access method. Because all methods share the OSM, consistency across the entire GUI environment is ensured even when the access methods are switched.
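
The OSM can be pictured as a tree of object records that supports both a logical view and a visual (hit-testing) view. The following Python sketch is a minimal illustration under our own assumptions; the field names and the two view functions are ours, not the actual CounterVision data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class OSMObject:
    """One interactive object registered in the Off Screen Model (illustrative)."""
    name: str
    role: str                                   # e.g. "window", "button", "menu item"
    rect: Optional[Tuple[int, int, int, int]]   # screen position (x, y, w, h); None if not drawn
    children: List["OSMObject"] = field(default_factory=list)
    linked: List["OSMObject"] = field(default_factory=list)  # semantically related objects
    hidden: bool = False                        # registered but not currently displayed

def logical_view(node: OSMObject, depth: int = 0) -> List[str]:
    """Flatten the logical hierarchy, as needed for ILS/DLS-style access."""
    lines = ["  " * depth + f"{node.role}: {node.name}"]
    for child in node.children:
        lines.extend(logical_view(child, depth + 1))
    return lines

def object_at(node: OSMObject, x: int, y: int) -> Optional[OSMObject]:
    """Hit-test the visual layout, as needed for IVS/DVS-style access (deepest match wins)."""
    hit = None
    if node.rect and not node.hidden:
        rx, ry, rw, rh = node.rect
        if rx <= x < rx + rw and ry <= y < ry + rh:
            hit = node
    for child in node.children:
        hit = object_at(child, x, y) or hit
    return hit
```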

To examine the adaptation of the four methods, we have developed and evaluated prototypes of three CounterVision subsystems: CounterVision/Screen Reader (CV/SR), Touch Sound Display (TSD), and Direct-pointing CV/SR.

4.1 CounterVision/Screen Reader (CV/SR)

CV/SR implements functions equivalent to those of existing GUI screen readers and supports the two indirect search methods. It runs on a standard PC (NEC PC-9821 series) with the Japanese editions of Windows 3.1 and Windows 95. Users can switch between the two methods via keyboard operations and obtain information about objects mainly through auditory feedback.

In particular, we prepared a special-purpose search table for the ILS search method. The table's columns correspond to the task processes of Windows, and its rows hold the objects that become selectable as a task proceeds. It provides blind users with unified operations and allows them to easily grasp the relation between tasks and objects. In addition, because the search table is also shown in a window, sighted users can watch and assist a blind user.
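
A minimal Python sketch of such a table and its keypad-driven traversal follows. The column and item names are hypothetical and only illustrate the idea of columns as task processes and rows as selectable objects.

```python
# Hypothetical sketch of the ILS search table: each column is a task process and
# holds the objects that become selectable at that stage.
search_table = {
    "select application": ["Word processor", "Mail reader", "File manager"],
    "operate menus":      ["File", "Edit", "View", "Help"],
    "choose command":     ["Open...", "Save", "Print..."],
}

columns = list(search_table.keys())
col, row = 0, 0  # current cursor position in the table

def move(d_col: int, d_row: int) -> str:
    """Move the table cursor with the numeric keypad and return the text to be spoken."""
    global col, row
    col = max(0, min(len(columns) - 1, col + d_col))
    items = search_table[columns[col]]
    row = max(0, min(len(items) - 1, row + d_row))
    return f"{columns[col]}: {items[row]}"

print(move(0, 0))   # -> "select application: Word processor"
print(move(+1, 0))  # -> "operate menus: File"
```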

According to an informal evaluation of CV/SR by blind users, the search table contributed to an easy understanding of the object structure, and the ILS search method worked efficiently for the assumed general operations. However, some problems were also discovered. The search table structure is inappropriate when users need to understand situations in which parallel operational options are presented; for example, Program Manager offers two parallel kinds of options, application-starting operations and menu operations. In addition, as the hierarchy deepens, it becomes harder to grasp the current position because of the increasing cognitive burden. Although CV/SR has an information filtering function through which users can limit the displayed items according to their needs, a more efficient search method may be required when too many items are presented. The DLS search method may be a candidate because it does not restrict the search path.

4.2 Touch Sound Display (TSD)

As a first step toward the DVS search method, we have developed TSD, an interactive auditory display. Users can touch interactive objects on the touch screen and obtain auditory feedback for the object being pointed at. Each object has a variety of information, such as its name, caption, and other attributes. Furthermore, many hierarchical levels of objects are displayed at the same time, which can overwhelm blind users. Users must therefore be able to select these attributes and levels dynamically according to their needs; Object-View Control is the function provided for this purpose.
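
Object-View Control can be understood as filtering the spoken feedback for a touched object by hierarchy level and by the attributes the user has enabled. The Python sketch below is a hypothetical illustration under our own assumptions, not the TSD implementation.

```python
# Hypothetical sketch of Object-View Control: the feedback for a touched object is
# filtered by the hierarchy level and attributes currently selected by the user.
objects = [
    {"name": "Program Manager", "role": "window", "level": 0, "state": "active"},
    {"name": "Accessories",     "role": "group",  "level": 1, "state": "open"},
    {"name": "Notepad",         "role": "icon",   "level": 2, "state": "selected"},
]

view = {"max_level": 1, "attributes": ["name", "role"]}  # user-selected view settings

def feedback(touched: dict) -> str:
    """Compose the spoken feedback for a touched object, or stay silent if it is filtered out."""
    if touched["level"] > view["max_level"]:
        return ""  # below the currently selected hierarchy level
    return ", ".join(str(touched[a]) for a in view["attributes"])

print(feedback(objects[1]))  # -> "Accessories, group"
print(feedback(objects[2]))  # -> "" (filtered: level 2 exceeds max_level 1)
```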

When we observed blind users operating TSD, they could freely search the interactive objects and grasp the locations of objects and the relations between them, even though their operation was rather inefficient. This suggests that TSD and its DVS search method can be used to learn the concepts of a Windows environment. However, as expected, the method is not suitable for general operations where efficiency is needed. Furthermore, users could not find needed objects located near the one being pointed at, because feedback was given only for the object directly under the pointer.

4.3 Direct pointing CV/SR

The DLS search method is provided by the Direct-pointing CV/SR prototype system, in which users can directly touch the CV/SR search table unfolded onto a touch screen.

In our pilot study, blind users could point approximately at objects on the search table without following a fixed search path. The system can be regarded as similar to TSD, except that the presented objects are arranged in a tabular format according to a rule. The rule gives each object an absolute, semantically meaningful location in 2-D space, and the item filtering function ensures that an object can be pointed at directly. The user may therefore be able to grasp the locations of objects aligned by the rule of the search table. However, tactile guide lines will be necessary to make direct pointing more accurate and easier. Locating divided tables on opposite sides of the screen according to the type of task should also contribute to an easier understanding of parallel operation structures.


5. Conclusion and future work

The appropriateness of adapting the four methods within the Multi-Access Interface has been largely confirmed. As a next step, we must integrate these subsystems, together with the pointing device with the image pin display, into the CounterVision system. We then need to evaluate whether the four search methods can be switched easily and naturally. We expect that most of the problems with the direct-pointing methods will be solved by the pointing device with the image pin display. Furthermore, the four methods need not always be used independently; for example, a layout may itself carry logical or semantic information. We are therefore investigating the use of the methods in various combinations.

This work was performed as part of the National Research & Development Programs for Medical and Welfare Apparatus under entrustment by the New Energy and Industrial Technology Development Organization (NEDO).


References

[Mynatt] E. D. Mynatt and G. Weber: Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches, CHI '94 Conference Proceedings, pp. 34-42, 1994.

[Kochanek] D. Kochanek: Designing an Off Screen Model for a GUI, Lecture Notes in Computer Science, Vol. 860, pp. 89-95, 1994.