AN ACCESSIBLE WEB BROWSER: THE APPLICATION OF FIRST ORDER DESIGN PRINCIPLES IN PwWebSpeak (TM)
Markku T. Hakkinen
The Productivity Works, Inc., Trenton, New Jersey
hakkinen@dev.prodworks.com
John C. De Witt
De Witt & Associates, Inc., Glen Rock, New Jersey
dewitt@village.ios.com
Web Posted on: November 30, 1997
ABSTRACT
Information access for persons with visual disabilities has generally been achieved through multi-layered systems that apply screen-reading technologies to interpret visually oriented presentations. Using classic control theory, this accessibility model may be described as "second order." By definition, it is significantly harder to learn and use than a first order, direct manipulation interface. In approaching the design of pwWebSpeak, the authors developed a model of a first order, non-visual interface to the World Wide Web. This approach directly reads the HTML document's contents and uses a rule base to control the non-visual, auditory rendering of the information. The interface allows direct navigation and querying of the document structure, eliminating the need to infer structure from a visual presentation. The authors describe the key principles of first order design as applied to web browsing, and their implications for the design of other accessible technologies.
INTRODUCTION
As computer-based information and the devices used to access and interact with it have proliferated into our global society, the definition of a computer user has grown to encompass an ever broadening set of individuals. Because of the visual/motor interface foundation of most computer-based systems, users with certain disabilities, such as blindness or low vision, are frequently denied access to computer-based systems without the aid of "assistive technology." In the traditional sense, assistive technology makes adaptations or extensions to existing, visually oriented systems that facilitate translation of one display modality into another, such as text to speech synthesis. Though assistive technology improves access, it is only a small step toward the concept of equal access and universal design embodied in government regulations such as the Americans with Disabilities Act and the Telecommunications Act of 1996.
The original intent of the World Wide Web was to provide universal access to information using a standard content definition language (HTML), with platform independent visual presentation. Tim Berners-Lee, creator of the web, has used the term 'universal readership' to define the environment in which information can be accessed by any user with any computer.
As the Web has evolved, the original concept, which in principle held great promise for universal access to information, has been extended to compete more directly with the interface capabilities and styles of personal computer applications. Whereas HTML initially defined structured, textual documents, it has now become an application development framework, with the inclusion of ActiveX and Java, and multimedia presentation engines such as Macromedia Shockwave. The notion of a universal client to the web has been supplanted by competition among major vendors, each trying to dominate the browser marketplace.
Within this rapid evolution, accessibility has effectively taken a back seat. It may well be said that just as screen reading technology has begun to have more success in adapting itself to the Graphical User Interface, the technology access barrier has once again been raised by the internet.
Non-visual access to the web using one of the existing mainstream browsers, such as Netscape Navigator or Microsoft Internet Explorer, involves the use of a traditional screen reader application. Screen readers translate the on-screen presentation, including standard control mechanisms and textual information, into speech or braille. In general, screen readers have no application knowledge, meaning that each new dialog or visual layout must be investigated by the user to determine its meaning and operation. With the richness of the web, every new web site encountered thus presents a new challenge for non-visual understanding.
In approaching the issue of providing effective and efficient access to the web for non-visual users, the authors applied the principles of first order design to the creation of a new, non-visual web browser. The philosophy of first order design is based upon the fundamental rules of control system design.
In traditional Human Factors Engineering, examples of first order systems, such as the automobile steering wheel, are contrasted against second (or higher) order systems, such as the steering control of a supertanker. In driving an auto, turning the wheel in the desired direction generally results in an immediate turn of the auto. Feedback is immediate, and the system is, in general, easy to learn and master. A supertanker, on the other hand, has a tremendous lag time between steering input and the visible result of the action. The resulting system is difficult to learn and master.
Using this model, direct manipulation of a GUI by a sighted user can be considered first order, in that response to actions is immediate and, in general, visually intuitive. The well designed GUI is thus an optimal interface for the sighted user in that it is easy to learn and master.
For the non-visual user, the combination of the screen reader and the underlying visual GUI results in a system that is, at best, second order for many interactions. Because screen readers cannot interpret or present the intent of the application, the non-visual user must explore the screen presentation to understand its usage. Graphics, in the form of icons or visuals, offer no clue, and must be labelled through trial and error or by a sighted user.
The combination of screen reader and web browser thus results in a system that by design is significantly harder to learn and use. Even with planned improvements such as Microsoft's Active Accessibility, the visual metaphor remains the dominant style of user interaction. Thus, for the majority of blind and low vision users, who have no computer experience, the internet will remain a challenge.
Visual applications also exhibit inherent limitations that need not restrict the possible interactions for a non-visual user. Information such as web documents is presented in windows, and as such, screen readers only access the same amount of information that is available visually. If the word 'Configuration' trails off the right edge of the window so that only 'Configurat' is visible, the screen reader will announce 'Configurat'. The underlying application has the full word in memory, and the visual user can scroll to see it. In addition, visual users may have the option of zooming out on a document to see its context, thereby losing the visual clarity of the text and providing little information to the screen reader.
Control functions may also be required for visual display concepts that have no necessity or meaning for the non-visual user. In particular, the WYSIWYG model of a printed document so commonly used in software applications to present digital information is less meaningful when that digital content can be directly presented as a spoken text stream. Information presentation and navigation based upon document structure can be a more direct and efficient interface for the visually impaired user than reliance upon a printed page metaphor. Non-visual users functioning in non-collaborative environments need not be constrained by the limitations of the physical display screen.
Navigation of displayed web pages is controlled in two ways: via hyper-links within the displayed pages, and by browser controls, such as commands to move backwards through a list of recently accessed pages. Browser usability is enhanced through the addition of extended features like bookmarks, save to a local file, or mail functions. Browsers such as Netscape or Internet Explorer help web page authors achieve market differentiation through the application of animation and visual display attributes, inevitably leading to more "creative" and complex presentations of graphical images and text colors, fonts, or styles.
However, the underlying basis of a browser is the recognition and processing of document structure. By nature of the information contained in HTML, the browser can choose appropriate display attributes, display text, trigger the display of multimedia elements, and then act upon user selected navigation or control requests. In designing pwWebSpeak, the authors sought to provide an easy to master interface for web browsing by creating a non-visual, first order browser.
In considering accessible browser design, the present authors began by defining a modular architecture that would support a variety of access methods for both input and output. With the assumption that sighted users would remain with the traditional browsers such as Netscape, Internet Explorer, Mosaic (or even Lynx), the fundamental design focused on alternative display formats such as text to speech, large print, and Braille, with control mechanisms including keyboard, speech recognition, and sip/puff switches.
The pwWebSpeak project began with no assumptions or existing code base for the browser. In fact, early prototypes of the browser had no visual interface, with all web presentation coming directly from the text to speech synthesizer. A fundamental concern in the development process was to create a browser that would run on an economical hardware platform, and support a user's existing investment in assistive devices such as external text to speech synthesizers.
The core of the pwWebSpeak accessible browser is an HTML processor, which parses an accessed web page into internal structures used to control navigation and display. A major part of the processor is a rule base that determines how output should be formatted for auditory presentation. With text to speech output devices, it is possible to alter speech parameters so that different voices can be assigned to different structural elements in the documents. For example, when used with a synthesizer such as DECtalk, pwWebSpeak presents HTML heading text in a voice different from that used for the body text.
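The tag-to-voice mapping described above can be sketched as a simple lookup from HTML tags to synthesizer voice settings. This is an illustrative reconstruction, not pwWebSpeak's actual rule base; the parameter names and values are assumptions:

```python
# Hypothetical sketch of a tag-to-voice rule base: each HTML tag maps to
# speech parameters for the synthesizer. Names and values are illustrative.
DEFAULT_VOICE = {"voice": "body", "rate": 180, "pitch": 100}

RULE_BASE = {
    "h1": {"voice": "heading", "rate": 160, "pitch": 120},
    "h2": {"voice": "heading", "rate": 170, "pitch": 110},
    "a":  {"voice": "link",    "rate": 180, "pitch": 105},
}

def voice_for(tag):
    """Return the speech parameters to use when rendering text inside `tag`.

    Unrecognized tags fall back to the body-text voice, mirroring the idea
    that only structurally significant elements get distinct voices."""
    return RULE_BASE.get(tag, DEFAULT_VOICE)
```

Because the rules live in a data table rather than in code, a user-defined rule is just another entry in the table, which is what makes per-user customization straightforward.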
The rule base can be compared to the proposed HTML cascading style sheets (CSS) for audio. In the case of pwWebSpeak, the rules for rendering the HTML to speech are stored and processed within the browser.
pwWebSpeak's rule base development has been based on interpretation of HTML 3.2 tags and can be customized to include non-standard extensions, as well as user defined translation rules. In general, the rule base is used to map HTML tags into auditory attributes appropriate to the synthesizer device being used. Basic rules define handling of in-line graphic and media objects, with alternate text automatically presented if available. Tables are presented in appropriate row and column order. Research is underway on additional table navigation rules.
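The alternate-text rule mentioned above, by which an in-line graphic is spoken via its alternate text when available, could be sketched as follows. The function name and the spoken phrasing are illustrative assumptions:

```python
def render_image(attrs):
    """Render an in-line image for speech: speak the author-supplied alt
    text when present, otherwise announce a generic placeholder.
    (Illustrative rule; not pwWebSpeak's actual wording.)"""
    alt = attrs.get("alt")
    if alt:
        return "image: " + alt
    return "unlabelled image"
```

A rule of this shape makes the cost of missing alternate text audible: the listener hears only "unlabelled image," which is exactly the situation accessible-authoring guidelines aim to prevent.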
The output of the HTML processor is placed into several internal structures. One structure is used to contain a list of all document elements while another contains a list of all links in the current document. Additional structures contain processed text ready for direct output to the selected output device.
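The element list and link list described above might be populated along these lines. This is a minimal sketch using Python's standard `html.parser`; pwWebSpeak's own processor is not described in the paper and this structure is an assumption:

```python
from html.parser import HTMLParser

class DocumentIndexer(HTMLParser):
    """Parse HTML into two of the structures described above: a flat list
    of document elements in reading order, and a list of hyperlinks."""

    def __init__(self):
        super().__init__()
        self.elements = []    # (tag, text) pairs in document order
        self.links = []       # href targets of <a> tags
        self._tag_stack = []  # open tags, so text can be attributed

    def handle_starttag(self, tag, attrs):
        self._tag_stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if self._tag_stack and self._tag_stack[-1] == tag:
            self._tag_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            tag = self._tag_stack[-1] if self._tag_stack else "body"
            self.elements.append((tag, text))

indexer = DocumentIndexer()
indexer.feed('<h1>News</h1><p>See <a href="story.html">the story</a>.</p>')
```

With the document reduced to these flat lists, "read the next element," "list all links," and "search the page" all become simple list operations rather than screen-geometry problems.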
In supporting the non-visual presentation and navigation of web documents, pwWebSpeak allows the user to choose to listen to an entire document, individual elements, or browse the document structure. Links are available from a selection list, and also indicated in context when the document is read. pwWebSpeak's first order approach enables single keystroke movement into data forms where information headers are heard, fields are easily filled and the completed form quickly submitted. An entire web page can be scanned for a desired text string with an internal search engine and then spoken in context. Ongoing speech can be interrupted or skipped with a single keystroke.
To provide an overall framework for understanding the page structure and content, pwWebSpeak provides summarization of the page, identifying with a single keystroke the number of links, images, data entry forms and major elements of the document structure. We continue to explore additional ways of quantifying information content and layout. In addition to summarization, users can 'drill down' into individual document elements for added information, such as URLs or position within the document.
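The single-keystroke summary described above amounts to counting structural features during the parse. A sketch of such a summarizer, again using Python's standard `html.parser` with illustrative wording for the spoken summary:

```python
from html.parser import HTMLParser

class PageSummarizer(HTMLParser):
    """Count the structural features reported in a page summary:
    links, images, data entry forms, and headings (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.counts = {"links": 0, "images": 0, "forms": 0, "headings": 0}

    def handle_starttag(self, tag, attrs):
        if tag == "a" and "href" in dict(attrs):
            self.counts["links"] += 1
        elif tag == "img":
            self.counts["images"] += 1
        elif tag == "form":
            self.counts["forms"] += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.counts["headings"] += 1

def summarize(html):
    """Return a one-sentence spoken summary of the page structure."""
    s = PageSummarizer()
    s.feed(html)
    return ("Page contains {links} links, {images} images, "
            "{forms} forms, and {headings} headings.".format(**s.counts))
```

A summary like this gives the non-visual user in one utterance what a sighted user gets from a visual glance at the page: a rough sense of its size and character before committing to reading it.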
CONCLUSIONS
The brief history of accessible design has shown us that considering the needs of those with disabilities can often have broader, societal benefits. For example, closed captioning, now available with all new televisions, is being used by more than just the hearing impaired, including those learning to read. By making the web accessible to those who could not otherwise use it with existing technologies, we are in turn making the web usable in ways not originally envisioned. In our own early testing with pwWebSpeak, eyes were opened for some users, hearing web pages spoken to them as they performed tasks some distance from their desktop PC. The idea that one must be tethered to a computer display to access the web goes away, with mobile workers accessing web-based information via telephone and maintenance workers reviewing web-based procedure guides via headset and voice commands as they repair complex machinery. The concept of non-visual, first order information access is one that can find broad application.
In order to ensure that the web remains accessible to browsers such as pwWebSpeak, web content developers must become aware of and adapt their sites to conform to the emerging design recommendations. Further, web authoring tools need to be designed so that they promote accessibility, making the generation of accessible content easy and requiring no significant extra effort. By supporting accessibility at all levels of the web, we can ensure that the original philosophy of client independent, universal access to web-based information lives on.