
Web Posted on: May 10, 1999

Dueling Scanners

Peter M. Scialli, Ph.D.

The second annual Dueling Scanners Event was held on Tuesday, March 16, 1999, in conjunction with the CSUN Conference, Technology and Persons with Disabilities. It is the world's only venue in which vendors of computer-based reading products for the blind can present their products for side-by-side comparison before a live international audience. This report is based on what was heard and seen during that session. It was written entirely by the judges, whose participation was approved in advance by the vendors.

Following the text of the report are written responses from the vendors who participated in the session. Here is the report:

As Jim Fruchterman, President of Arkenstone, put it, "Thanks, Peter, for providing a deadline for adaptive OCR products to announce new innovation." It is true, too: Dueling Scanners has become the event of the year for putting competitors face to face, or in this case, page to page.

Since this event receives wide coverage, and since this report will be read by many in the blindness field, it is vital that it present as much information as possible, so that individuals can make a more informed decision about which OCR software might best suit their needs or those of their clients.

This year the vendors and their representatives who participated were: Jim Bliss from Jbliss Imaging, Jim Fruchterman and Mike May from Arkenstone, and David Bradburn and Steven Baum from Kurzweil Educational Systems Group (KESG).

Mr. Bliss showed VIP Info System running on a Pentium/2 333 MHz machine with a Microtek scanner connected via USB.

Arkenstone premiered Open Book: Ruby Edition, version 4.0, using a Pentium/2 400 MHz machine with an HP 5200C scanner connected via USB.

Kurzweil Educational Systems Group showed its K1000 version 4.0 on a Pentium Celeron 333 MHz machine with an HP 4C scanner using a SCSI connection.

The session was moderated by Peter Scialli of Shrink Wrap Computer Products, and judged by Rich Ring, Supervisor, International Braille and Technology Center at the National Federation of the Blind, and Larry Skutchan, Director of Technology Projects at the American Printing House for the Blind.

The participants were provided with a list of tasks and questions to perform and answer ahead of time, so all three came to the event with the same information. This list of questions and tasks was also available to the audience upon arrival at the session.

High audience participation and overall interest meant that answers to specific questions took longer than time permitted, so some of the questions were combined in the hope that all the material could be covered in the three-hour time slot. The list of questions was also shortened because, with three vendors, it took considerable time for each representative to discuss the system he had brought. While three hours may seem a long time to expect an audience to maintain interest, there was no sleeping in this session.

The judges began by providing two notoriously difficult pages for each vendor to scan. The first was a page from "Workforce Diversity" magazine. As each vendor scanned this page, he demonstrated the basic operation of his system and explained how his product handled the various issues the document presented. Although the page contained no graphics, it did contain some unusual fonts, which each OCR system handled differently. VIP attempted to render this material and did an extremely poor job with the portion of the page set in the unusual font. Ruby did about the same. Kurzweil, on the other hand, skipped this portion of the page entirely! In its default configuration, Kurzweil 1000 ignores "degraded" text. This setting can be changed, but a blind user would not know that part of the page was being skipped, and so would not realize that the setting needed to be changed in order to hear the entire page, for better or for worse. Although in this instance the page made sense even with this portion left out, we believe this does the blind user something of a disservice, since he/she would have no way of knowing that there was text on the page that was not being recognized. It would be interesting to learn how the Kurzweil 1000 program defines degraded text, since, like so many other things, this is to some extent subjective.

The second difficult page provided by the judges was a page from the Damark catalog that contained many graphics scattered throughout, multiple columns, and many random sidebars of information in different fonts. One of the most dramatic moments of the show came when, after VIP performed in the mediocre manner most of the audience expected from experience with current OCR systems, Ruby rendered the page nearly flawlessly. It should be noted that Kurzweil 1000 did almost as well with this page. Interestingly, Kurzweil and Ruby presented the material in a different order. Each vendor explained why his system had made the correct decision about how to present the text, and the arguments for each were persuasive. In fact, a sighted person could also read the page in more than one order, which illustrates just one of the subjective decisions these systems have to make. Effective optical character recognition is far more than simply turning scanned images into readable text; retaining a page layout that still allows the user to follow the contents of a page in some rational order is essential.

Vendors then went on to describe what made their products unique. Bliss demonstrated some impressive features designed for low vision users. His system scanned the page twice: once to capture a black-and-white image of the page for OCR, and a second time to acquire a color image of the page. If the page contained more than one image, the user could save each color image to a separate file. This process looked simple, but it would clearly require some vision to choose which images to save.

Given that Dueling Scanners 99 was billed as a comparison of systems especially designed for the totally blind user, the judges did not find this feature particularly valuable, but it should be noted that it could indeed be a useful tool for a person with usable vision.

Bliss also demonstrated a feature unique among scanning software: a hand-held camera that could be used to enlarge paper documents on the user's PC screen. This camera did not have the resolution to provide an image suitable for optical character recognition; it was used strictly for enlargement. Again, while this was a nice feature for those who can benefit from it, many members of the audience were specifically looking for features geared toward the totally blind user.

Using Xerox's TextBridge recognition engine, the Bliss system did not achieve the accuracy demonstrated by either Kurzweil 1000 or Open Book: Ruby Edition.

One obvious problem resulted from not using a scanner with AccuPage: many times during the session, the brightness level had to be adjusted manually before VIP could read any text at all.

Like the other vendors, Jbliss Imaging has wrapped a clean user interface around its component of the system. VIP's simple interface uses the four corners of the numeric keypad to control the application. The interface seemed simple and consistent, but the judges believe that a standard Windows interface would be a useful addition to the product. VIP uses AT&T's FlexTalk for its speaking voice and, unlike Ruby and K1000, does not support other dedicated speech synthesizers. This would not be an issue for someone purchasing a first computer, but it is surely a drawback for those considering adding optical character recognition software to an existing system, especially if they have a speech synthesizer they have grown accustomed to. Note that all three systems come with a software text-to-speech engine that allows the user to obtain speech through a sound card.

Featuring a revamped interface, Open Book: Ruby Edition is a totally redesigned program that retains support for the familiar "classic" Open Book menus for those who want to use them. Ruby is a far more powerful and flexible package than its predecessor, and it was clearly the most improved program being shown. Ruby's new features include editing functions such as cutting, pasting, moving, and inserting pages; a spell checker and thesaurus; and dual-recognition-engine processing. The new user interface follows standard Windows user interface guidelines, so a user who is already familiar with the menus and dialog boxes in Windows 95, 98 or NT will find Ruby simple to understand and work with. Because the "classic" Open Book menus are retained as an option in this version, those who feel comfortable with them need not venture into unknown territory.

Arkenstone's Ruby also incorporates many features that would be especially useful to those with low vision. Those features include, but are not limited to: support for numerous fonts and font sizes, contrast adjustments, the ability to adjust the spacing of words, lines and sentences, and the ability to highlight text as it is spoken. Ruby also has an "exact view" mode where an image of the original page, including graphical elements, may be displayed and magnified during reading.

Ruby's dual-engine recognition process impressed the judges: it submits the scanned image to the first engine for deskewing and decolumnization, then uses the second engine to recognize the individual components identified by the first. This innovation represents a departure from the traditional way a commercially available OCR product is used in a specialty product for the blind, and it suggests some interesting technical possibilities for the future. Ruby also supports the recognition of languages other than US English. It does not support the recognition of multiple languages on the same page, nor does it automatically switch text-to-speech engines to accommodate those languages. However, it does come with ViaVoice Outloud, which has six text-to-speech engines for several other languages. A total of thirteen recognition languages are furnished on the CD. During the installation process, however, one is not given the opportunity to install those engines automatically.

Both Ruby and Kurzweil support direct grade II Braille translation, and, according to Jim Fruchterman, the screen can be read in interactively translated grade II if one has a screen reader that supports this feature. The judges assume this is true of all three systems. Arkenstone uses TurboBraille for its grade II translation, and Kurzweil uses NFB Trans. The judges do not know which Braille translation program VIP uses, nor whether the translators involved have been recompiled for Windows or are the DOS versions. In any case, the judges would not consider the use of a DOS version of a Braille translation program a disadvantage.

The programs do a reasonably good job of producing quick-and-dirty Braille translations. This feature would be useful in settings where Braille documents are needed immediately, for example, in a classroom where a blind student needs Braille copies of handouts being distributed to sighted classmates. Keep in mind, however, that no matter which of the three programs one uses, scanning and then embossing will produce readable Braille, but in most cases not properly formatted Braille. Creating correctly formatted Braille documents requires a working knowledge of a Braille translation software package and a word processor.

K1000's host of new features highlighted the introduction of version 4.0. This system comes with both FlexTalk and Lernout & Hauspie text-to-speech engines, a feature that impressed the audience later in the session when David Bradburn scanned a page from a pamphlet he had found in his hotel room. The page was an AT&T instruction sheet on how to make a long-distance call, with the same information presented in several languages. Amazingly, the K1000 not only changed its recognition language but also switched text-to-speech engines to read each language immediately as it appeared on the page. Neither of the judges is fluent in German, French, or Spanish, but the system sounded as though it was recognizing and speaking the text properly.

Ever since version 3.0, the K1000 has had some impressive editing features that make document management a snap, including re-scanning a page, moving and deleting pages, and direct editing. While both Ruby and K1000 now support spell checking, Kurzweil sports a simple-to-use feature that lets you enter common scanning mistakes and their correct spellings into a list, so that the software always replaces the misrecognized word with its correct version during the recognition process. The K1000 can also optionally remove hyphens at the ends of lines when it determines that they belong to a word hyphenated at the line break. If you scan a lot of books, you probably already perform this function in your word processor with search and replace or with macros, but bringing it directly into the user interface exemplifies the kinds of new conveniences introduced by this release.

As stated earlier, one feature that both impressed and disturbed the judges was the K1000's tendency to throw out poorly recognized text entirely. David Bradburn, however, made a convincing case for it, complete with a dramatic demonstration in which the K1000 produced a better-sounding version of a page than the other two systems did. The page contained a sidebar set in both a different size and a completely different font from the rest of the page, and the page read more coherently when that text was rejected entirely. Bradburn also assured the judges that the user could retain this partially recognized text if interested in its placement on the page or some other aspect of the less-than-perfect recognition.

We have already pointed out what we consider the drawbacks of this approach. Bradburn explained that K1000's distinguishing qualities were its high degree of accuracy, its host of features, including international recognition and speech, and its simplicity. He noted that most users could be up and running within ten or fifteen minutes, and that the system came with a quick-start videotape and a Braille manual. According to Steven Baum, the setup was also the first self-voicing installation program in the industry. He made his point clear when, asked whether a totally blind person could feasibly install the system, he replied that if you could insert the CD into the drive and come back in fifteen minutes, you would have it installed.

The judges asked each vendor to bring and scan a page that would highlight the unique capabilities of his particular system. It was at this point that David Bradburn scanned the AT&T pamphlet page with the multiple languages.

One of the vendors brought a page that had straight text at the top of the page and skewed text toward the bottom of the page. K1000 did a fine job on this page, but Arkenstone decided the whole page was skewed and only got the skewed part of the page right. VIP didn't do well, for some reason, on either part of this page.

One of the judges' questions concerned a common problem with self-voicing applications: user interfaces that don't always provide adequate review capabilities. This becomes an issue when the user hears a prompt that he/she did not understand the first time. How does one repeat that information? When you use a screen reader with off-the-shelf applications, you use your screen reader's review commands to examine the material you want to hear again. Since these self-voicing applications provide their own interface and you don't run your screen reader while using them, they must provide their own way to repeat relevant material.

VIP's answer to this problem was to press Escape and repeat the command sequence that produced the prompt in the first place. The problem with this approach is that in some situations you won't want to press Escape to cancel the current operation, because you don't necessarily know or remember what the current operation is, and you may cancel something you did not want to cancel. Ruby and K1000 both provide a way to repeat prompts, but we did see cases where the interface was not as cleanly implemented as it could be.

Each of the vendors demonstrated the product's ability to store documents in a variety of word processor formats. Both Ruby and K1000 could save directly to several word processor formats. In K1000's case these included PeachTree Write, an obscure format whose inclusion nevertheless delighted the audience by showing the sheer number of word processors supported. VIP relied on standard Windows cut-and-paste, which has both advantages and disadvantages. All three programs permit permanent storage of scanned documents in a variety of formats, a feature that also makes it easier to emboss accurately translated and formatted Braille.

Each of the three products lets the user change the foreground and background colors and the font size. All three programs also support a highly visible marker that moves word by word as the program reads the text. This is an extremely useful feature for low vision users and users with learning disabilities.

When it came to showing other features not already covered in the session, each vendor took a few minutes to elaborate. Jim Bliss discussed how the hand-held camera that comes with VIP allows the low vision user to examine handwritten notes and packaged goods, items that would not lend themselves to optical character recognition. This is, again, a feature that requires some vision to use. Bliss also showed how VIP could be used for e-mail and Web browsing. The VIP system has its own self-voicing Web browser, similar to IBM's Home Page Reader or The Productivity Works' PW Webspeak. Though such features are fine in their place, we do not feel that they are important in an optical character recognition package.

David Bradburn showed how the K1000 could be combined with voice recognition software to provide a solution for people who can't use the keyboard. While the judges do not feel that voice recognition technology is advanced enough to allow the novice user complete control over the system, the capability has possibilities for individuals with limited motor skills. It should also be noted that adding voice recognition capabilities should be possible for all three systems.

In closing, Jim Fruchterman emphasized Ruby's retention of the Classic Open Book user interface as an option for traditional users of the product. At the same time, Ruby offers the full power of a true 32-bit Windows application. He also described the detailed context-sensitive help now present in the product, which makes Ruby a tool that almost anyone should be able to learn quickly and easily.

What conclusions might be drawn from Dueling Scanners 99? Unfortunately, it is not a simple matter of one program being vastly superior to another. Each has strengths and weaknesses. But perhaps we can offer some general statements that might help one determine which of these packages would be the right choice. For a totally blind user, the choice is difficult. Both Arkenstone's Open Book: Ruby Edition and Kurzweil Educational Systems Group's Kurzweil 1000 offer accurate recognition and a wealth of features. Both programs can be easily learned by novice and advanced users alike. Both are fairly easy to install. Both allow the user to work with either a standard Windows interface or one that uses the keys on a PC's numeric keypad. As for recognition accuracy, the differences in performance between these two programs were not great enough to declare a clear winner. Since, in these judges' opinion, accuracy is the most important factor, one could not go wrong with either of these packages.

Since overall recognition accuracy is far too close to call, one must choose between these two programs based on their respective feature sets, and to some extent this is not an easy thing to do. Both programs let one edit and use a spell checker, dictionary, and thesaurus, and both let one set up a list of launchable applications that can be run without closing the program. One feature that we particularly like, which is present in Kurzweil 1000 but not in Open Book: Ruby Edition, is the ability to create and edit a list of corrections that can be applied during the recognition process either automatically or on the user's request. But would this feature alone be enough to make a user decide to purchase Kurzweil 1000 over Open Book: Ruby Edition? We think not. We believe that users and rehabilitation professionals alike owe it to themselves to examine each of these programs carefully, keeping in mind their own particular needs as well as those of their clients.

It will be noted that so far in these conclusions, no mention has been made of VIP from Jbliss Imaging Systems. This is because, in the opinion of these judges, this program is simply not appropriate for use by a totally blind individual; there is not enough verbal feedback built into the package. In fact, even if an individual has some usable vision and his/her primary purpose for purchasing the program is optical character recognition, we believe that the greater accuracy of both Open Book: Ruby Edition and Kurzweil 1000, along with the low vision features built into those programs, would still make either of them a better choice. However, if other capabilities that VIP provides, such as the ability to process pictures and the use of a digital camera, are important to the low vision user, VIP is clearly the program to buy.

Though these conclusions do not provide the easy answers that all of us long for, we feel that many of the features and functions of these programs have been discussed in this report, and that it is now up to you to decide which of these programs is right for you or your client.

Here is the response of Kurzweil Educational Systems Group to the above report:

Before we begin with our response to some of the observations and conclusions made by the judges at 'Dueling Scanners,' we would like to extend our thanks to Peter Scialli for organizing this event. It was an enjoyable start to CSUN 1999.

Here are our comments:

  1. Price and Performance: No one addressed performance in the report. Halfway through the session, an audience member commented that Kurzweil 1000 (K1000) appeared to be the fastest system, and timings confirmed this. KESI's response was that we were glad to hear it, because we were using the slowest PC of all the vendors, a 333MHz Celeron computer. In addition, we announced that K1000 pricing was now $995 ($1,195 with DECtalk Access-32 speech). We also think it is worth noting that Arkenstone used a hardware DECtalk synthesizer in their system. We are not sure why, given that their product now includes software speech, but a hardware synthesizer does, of course, have a significant impact on the final price to the customer.
  2. Degraded Text: The K1000 is the only product to offer the choice of recognizing everything on a page, poor-quality print and coffee stains alike, or ignoring them for a clearer understanding of a document. We find that most people don't want to hear long sequences of punctuation marks. None of the systems was capable of recognizing the text printed in a highly ornamental font. Only the K1000 was capable of ignoring it entirely, if the operator so chose.
  3. AccuPage: One passage in the report implied that the high recognition accuracy exhibited by K1000 [and Ruby] resulted from using a scanner with AccuPage. K1000 did not make use of AccuPage at any time during the session, nor did we adjust the brightness setting.
  4. Editing: K1000 has included the Editing feature since v2.0, which we launched in the summer of 1997.
  5. Test Documents: While no mention was made of the documents brought by the other vendors, it was KESI who provided the skewed document. The AT&T pamphlet mentioned was actually used to address the question of international languages. This document demonstrated K1000's unique ability to correctly OCR and speak each paragraph [in the relevant language]. As we recall, the audience applauded at this point.
  6. Closing Statements: Not all of what was said by KESI was captured in this report. In particular, we mentioned our industry-leading 12 months of FREE updates and our superior customer support, available 12 hours per day, a claim the other vendors chose not to contest when invited to. We also demonstrated another unique feature, 'Text Summarization,' which summarizes the open document.
  7. We agree with the judges' conclusion that it is best for users to examine each of these programs for themselves, keeping in mind their own particular needs. We were the only vendor who distributed free demonstration CDs at this session and throughout CSUN, making such an examination possible.

Following is the response from Arkenstone, Inc.

I was personally delighted by the opportunity to demonstrate the newest version of Open Book at Dueling Scanners. We appreciate the effort put forth by the judges in crafting a well-written report that highlights the state of the art in reading systems for people with visual impairments, and of course Peter Scialli for organizing it.

The most important point I can make is to encourage prospective users to test out the products they are interested in on their own books and documents. Seeing your own documents read is the fundamental test of a reading system, and allows you to assess that most important factor: accuracy. We feel that our new dual-OCR technology will provide the best possible results, but please judge this for yourself! Testing the program also gives valuable feedback about the design and support built into the product. At Arkenstone, we pride ourselves on the care and attention we place on design issues and incorporating feedback from our many thousands of users around the world.

We would like to expand on a few of the issues mentioned in the report. Our thirteen OCR recognition languages are all installed automatically, so that our users can scan in documents in many different languages. The French, German, Italian and Spanish ViaVoice Outloud speech synthesizers do require a separate install to be run from our Ruby CD. Our self-voicing install is based on the industry-standard InstallShield technology, and offers the user the choice between a typical automatic install and a custom installation. The custom installation is important so that our users can make their own decisions about major installation issues, such as which speech synthesizer drivers they want to be available inside Open Book.

Our Windows-standard user interface is designed to be a very well-behaved example of the Windows user interface. Ruby not only repeats prompts, but also makes it easy to spell out difficult strings, such as file names. Open Book has long served as the first product for people just getting started with PCs, and we think that our careful attention to design has extended Open Book's strengths in this area. Users have always appreciated our ability to read documents smoothly and naturally, taking care of issues like removing hyphens from scanned text. At the same time, we've built in many of the powerful capabilities requested by our users.

In conclusion, Arkenstone values our relationship with each of our more than 20,000 users, and our goal is to provide them with the best possible reading tools. Our commitment to this shows in our products, our service and our attitude. We hope that everyone interested in reading tools will take the time to test our commitment and our products!

Jim Fruchterman

The following response was provided by Jim Bliss of Jbliss Imaging:

Dueling Scanners 99 provided an excellent and informative comparison among three products. Even though this session was billed as a comparison of systems especially designed for the totally blind user, many in the audience were interested in the needs of users with low vision. In evaluating the conclusions, it is important to keep in mind that VIP is intended for low vision as well as blind users, a different target audience from that of the other two products. For this audience, it is important that the visual displays, as well as the speech, make reading as fast and easy as possible. VIP's wide range of text attribute adjustments and its choice of four different viewing modes mean that it can be optimized for most visual impairments. We also believe that the combination of features in VIP (e.g., picture viewing, e-mail, memo writing, Internet, address database, etc., as well as scanned document reading) meets the needs of our target audience in an easy-to-learn, efficient, and cost-effective manner.

With respect to ease of installation and learning, VIP has made both straightforward and simple without vision. The installation CD is fully voiced, and there is built-in contextual help in both speech and large print, as well as a full manual online. In addition, there is a six-cassette audio tutorial as well as an audio CD tutorial.

Since the CSUN Conference, VIP's scanning and OCR accuracy have been improved.