
Web Posted on: August 4, 1998


Automated production of a daily electronic newspaper: exploitation results

Geert Bormans
Jan Engelen

Katholieke Universiteit Leuven
Res. Group TEO - Document Architectures
Kardinaal Mercierlaan 94
B-3001 Leuven - Heverlee, Belgium
tel: + 32 16 32 18 66
fax: + 32 16 32 19 86
email: Geert.Bormans@esat.kuleuven.ac.be

1. Summary

Since January 1997, DiGiKrant has offered reading impaired persons a complete, daily electronic version of the Flemish newspaper "De Standaard". The newspaper is sent to subscribers every day on disk or by e-mail. The disk edition reaches the subscriber each morning before 8 AM. The format chosen is the SGML EIF, a result of the TIDE-CAPS project. The newspaper can be read with the CAPS Reading Station, another CAPS result, which interfaces to a Braille reading device, a voice synthesiser or a screen magnifier.

This paper describes the full process of creating an SGML newspaper every day, starting from articles referenced in a relational database. It also describes the distribution process and briefly presents the parties involved.




2. Preliminary Issues

The main focus of the consortia in the TIDE-CAPS projects has always been to increase the accessibility of digital information for print disabled persons [Reference 1]. The work of CAPS started at the end of 1991 as a Pilot Phase project that investigated access to digital newspapers. The main results of this first phase were a structural description of a newspaper in the Standard Generalized Markup Language (SGML) [Reference 2], a special workstation for reading newspapers in this format, and an extensive user evaluation.

Documents formatted in SGML consist of several parts, of which the Document Type Definition (DTD) and the Document Instance (the actual content of the document) are the most important. The advantage of SGML is that the structure of the document is separated from its presentation. An application can therefore be built that accesses the information through its structure and presents it through a defined user interface.
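The benefit of structured markup can be illustrated with a minimal sketch. The element names below (`newspaper`, `section`, `article`, `headline`, `p`) are hypothetical and are not those of the CAPS EIF DTD. Because the Python standard library has no SGML parser, the sketch assumes an instance that also happens to be well-formed XML and reads it with `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document instance; element names are illustrative only,
# not taken from the CAPS EIF DTD.
INSTANCE = """
<newspaper date="1998-08-04">
  <section name="Politics">
    <article><headline>Budget approved</headline><p>Parliament voted on Tuesday.</p></article>
    <article><headline>Talks resume</headline><p>Delegations met in Brussels.</p></article>
  </section>
  <section name="Sports">
    <article><headline>Cup final set</headline><p>Two teams remain.</p></article>
  </section>
</newspaper>
"""


def outline(root):
    """Return a navigation outline: section names followed by their headlines."""
    lines = []
    for section in root.findall("section"):
        lines.append(section.get("name"))
        for article in section.findall("article"):
            lines.append("  " + article.findtext("headline"))
    return lines


def article_text(root, section_name, index):
    """Return the paragraphs of one article, addressed by section and position."""
    for section in root.findall("section"):
        if section.get("name") == section_name:
            article = section.findall("article")[index]
            return [p.text for p in article.findall("p")]
    raise KeyError(section_name)


root = ET.fromstring(INSTANCE.strip())
print("\n".join(outline(root)))
```

The same instance serves both functions: one produces a short outline suitable for quick navigation on a Braille line or by speech, the other returns the full text of a selected article. Neither needs to know how the text was laid out on paper.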

For the Flemish partners involved in the CAPS [Reference 3] projects, DiGiKrant was a logical extension of the research results. A new project was set up, and the further research was funded by the Flemish Government (IWT, Flemish Institute for the Promotion of Scientific-Technological Research in Industry). The consortium consisted of four groups with complementary roles. The information for the electronic newspaper is provided by Vlaamse Uitgevers Maatschappij (VUM), the most important publisher of Flemish newspapers. BrailleKrant v.z.w. initiated the research. This group is a service provider for print disabled people and was already publishing a Braille extract of "De Standaard" in co-operation with VUM. The Research Group on Document Architectures of Leuven University was responsible for the extraction and SGML conversion. Sensotec is a company that develops and distributes technical aids for visually impaired persons. Sensotec was involved in CAPS for the development of the Reading Station and had the same role in the DiGiKrant project.

Two research topics had to be explored more thoroughly before a commercial end product could be reached. The first concerned the information provider's side: the consortium had to explore whether, and how, the texts could be extracted from the publisher's database and converted into SGML. The second concerned the service provider's side. What modifications to the existing reading software would users like within this specific case study? What would be the most attractive way to bring the service to the public?

This new project finally resulted in a commercial end product.




3. Production Process

This chapter explains the production process, from the creation of the articles up to the interface presented to the reader.

3.1 Database Extraction

The articles for the newspaper are stored individually in separate files, distributed over several UNIX disks. References to the files and all the metadata are stored in a relational database. The articles are edited and stored in a proprietary format. In a first step, the newspaper structure is extracted from the relational database via SQL scripting. The newspaper articles are selected on the basis of this structure information and prepared for transfer to a DOS environment. This way all information related to section, structure and publication date is maintained. Both the structure and date information and the collection of text files are then transferred from the UNIX machines to a dedicated PC to be converted into SGML.
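The extraction step can be sketched as follows. The publisher's actual schema is not described in this paper, so the table `articles` and its columns (`pub_date`, `section`, `position`, `file_path`) are assumptions made purely for illustration, and an in-memory SQLite database stands in for the production database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical article index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE articles (
               id INTEGER PRIMARY KEY,
               pub_date TEXT,
               section TEXT,
               position INTEGER,
               file_path TEXT)"""
    )
    conn.executemany(
        "INSERT INTO articles (pub_date, section, position, file_path) "
        "VALUES (?, ?, ?, ?)",
        [
            ("1998-08-04", "Politics", 2, "/disk2/pol/b.txt"),
            ("1998-08-04", "Politics", 1, "/disk1/pol/a.txt"),
            ("1998-08-04", "Sports", 1, "/disk3/spo/c.txt"),
            ("1998-08-03", "Politics", 1, "/disk1/pol/old.txt"),
        ],
    )
    return conn


def extract_structure(conn, pub_date):
    """Return {section: [file paths in reading order]} for one edition."""
    rows = conn.execute(
        "SELECT section, file_path FROM articles "
        "WHERE pub_date = ? ORDER BY section, position",
        (pub_date,),
    )
    structure = {}
    for section, file_path in rows:
        structure.setdefault(section, []).append(file_path)
    return structure


conn = build_demo_db()
structure = extract_structure(conn, "1998-08-04")
print(structure)
```

The result keeps the section and ordering information together with the file references, so the files can be copied to the conversion machine without losing the structure of the edition.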

3.2 SGML Conversion

In a second step, the articles themselves are converted into SGML and collated into the newspaper’s structure. The conversion software was built in a highly generic way so that it can cope with continual and poorly documented changes in the database structure or text format.
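A minimal sketch of the collation step is given below. It assumes that the article texts have already been read into memory and uses hypothetical element names rather than those of the EIF DTD. Markup characters in the text are escaped with `html.escape`, which assumes that the entities `amp`, `lt` and `gt` are declared in the target DTD.

```python
from html import escape


def article_to_sgml(headline, paragraphs):
    """Wrap one article in (hypothetical) SGML elements, escaping markup characters."""
    parts = ["<article>", f"<headline>{escape(headline, quote=False)}</headline>"]
    parts += [f"<p>{escape(p, quote=False)}</p>" for p in paragraphs]
    parts.append("</article>")
    return "\n".join(parts)


def collate(date, sections):
    """Collate articles into the newspaper structure.

    `sections` maps a section name to a list of (headline, paragraphs) pairs,
    in reading order.
    """
    out = [f'<newspaper date="{escape(date, quote=True)}">']
    for name, articles in sections.items():
        out.append(f'<section name="{escape(name, quote=True)}">')
        for headline, paragraphs in articles:
            out.append(article_to_sgml(headline, paragraphs))
        out.append("</section>")
    out.append("</newspaper>")
    return "\n".join(out)


doc = collate(
    "1998-08-04",
    {"Economy": [("Profits & losses", ["Costs stayed < 10 million."])]},
)
print(doc)
```

Keeping the per-article conversion separate from the collation means that a change in the article text format only affects `article_to_sgml`, while a change in the database structure only affects how `sections` is built.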

The DiGiKrant process runs on the premises of the printed newspaper. Safety precautions therefore had to be taken to make sure that the creation of the electronic newspaper cannot disturb the printing of the "normal" newspaper in any way.

Of course, the highest priority for the publisher lies with the daily production of the paper edition. Errors in the database are often corrected only in the final page layout. Layout options change almost daily, for example when they are customised to writers’ preferences. The conversion software therefore needed some intelligence to discover and handle freshly invented tagging, and a second-run module was necessary to cope with errors in the database. Severe errors are corrected in the first run and, if necessary, the total conversion process is restarted.
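The idea of tolerating unknown tagging and repairing failures in a second run can be sketched as follows. The proprietary article format is not documented here, so the line syntax `@CODE:text`, the codes `HL` and `TX`, and the repair heuristic (promoting the first line to a headline) are all assumptions for illustration and do not describe the actual conversion software.

```python
import re

# Hypothetical mapping from proprietary layout codes to SGML element names.
KNOWN_CODES = {"HL": "headline", "TX": "p"}
CODE_LINE = re.compile(r"^@([A-Z0-9]+):(.*)$")


def convert_line(line, unknown):
    """Map one source line to (element, text); unknown codes fall back to 'p'."""
    match = CODE_LINE.match(line)
    if not match:
        return ("p", line.strip())
    code, text = match.group(1), match.group(2).strip()
    element = KNOWN_CODES.get(code)
    if element is None:
        unknown.add(code)  # record the new code instead of failing
        element = "p"
    return (element, text)


def convert_article(raw):
    """Convert one article; return its elements and any unknown codes seen."""
    unknown = set()
    elements = [convert_line(l, unknown) for l in raw.splitlines() if l.strip()]
    return elements, unknown


def convert_all(articles):
    """First run converts every article; a second run repairs the failures."""
    converted, failed, unknown = {}, [], set()
    for name, raw in articles.items():
        elements, new_codes = convert_article(raw)
        unknown |= new_codes
        if any(tag == "headline" for tag, _ in elements):
            converted[name] = elements
        else:
            failed.append(name)
    # Second run: assumed repair rule, treat the first line as the headline.
    for name in failed:
        elements, _ = convert_article(articles[name])
        if elements:
            elements[0] = ("headline", elements[0][1])
            converted[name] = elements
    return converted, sorted(unknown)


articles = {
    "a": "@HL:Title\n@TX:Body\n@Q7:Quote",
    "b": "No codes here\n@TX:More",
}
converted, unknown_codes = convert_all(articles)
print(unknown_codes)
```

Reporting the unknown codes rather than aborting lets the daily run complete, while the list of new codes tells the maintainers which mappings to add.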

3.3 Distribution

On the users’ side, some modifications were made to the existing reading software. Of all the possible ways of getting the newspaper to the subscriber, a simple floppy disk was chosen. For the still limited number of subscribers, this was found to be the cheapest and most secure distribution medium. The publisher's distribution network for printed newspapers is used to transport the disks, which ensures on-time delivery.

Other media, such as cable TV or radio broadcasting, were investigated. Since no magazines or other newspapers are currently available electronically to users in Belgium, the start-up cost of these media is too high. BrailleKrant v.z.w., the service provider, aims to open up access to as much information as possible as soon as possible. As the number of subscribers and the number of information sources grow, other media can be expected to become important in the near future.

Subscribers who have an e-mail account can already receive the newspaper as an e-mail attachment.

3.4 Reading Station

The newspaper can be read with the CAPS-Reader, another CAPS result, which interfaces to a Braille reading device, a voice synthesiser or a screen magnifier. After an in-depth user assessment, several new tools (extended search facilities, copying of text fragments, archiving possibilities, ...) were added to the existing software [Reference 4].




4. Development and future plans

The daily electronic version of "De Standaard" is a direct commercial result of two TIDE projects. "Het Nieuwsblad", a second newspaper from the same group, has been available to reading impaired people since March 1998. This newspaper has 13 regional editions, all of which are now available. Negotiations with other newspaper groups are under way in order to offer reading impaired people a broad spectrum of Flemish newspapers within a couple of years.

Future work will aim to improve access to magazines. Research has also started on using an editor version of the Reading Station to enable visually impaired students to take notes while working with course material.




Acknowledgement

This work has been partly funded by the Flemish Government via the Flemish Institute for the Promotion of Scientific-Technological Research in Industry ("Instituut voor Wetenschap en Technologie", IWT, DiGiKrant-project).




References


  1. Jan J. Engelen, Filip M. Evenepoel and Frank Allemeersch, Standardisation of Accessibility: the Tide CAPS and HARMONY Projects, 2nd TIDE Congress, Paris, 26-28 April 1995, The European Context for Assistive Technology, pp. 59-65, IOS Press, Amsterdam, ISBN 90-5199-220-3.
  2. B. Bauwens, F. Evenepoel, J. Engelen, C. Tobin and T. Wesley, Structuring Documents: The Key to Increasing Access to Information for the Print Disabled, 4th International Conference on Computers and Handicapped People, Vienna, September 1994, Lecture Notes in Computer Science 860, pp. 198-206, Springer Verlag, Berlin, ISBN 3-540-58476-5.
  3. The CAPS Pilot and Extension Phase Projects (TIDE project numbers 136 and 218), for more information on the results of these projects, contact the main contractor, Prof. Jan Engelen, K.U.Leuven, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium, fax: +32 16 32 19 86.
  4. CAPS-Reader, for more information on this software, contact Frank Allemeersch, Sensotec n.v., Nik. Gombertstraat 21, 8000 Brugge, Belgium, tel.: +32 50 33 76 75, e-mail: frank.allemeersch@sensotec.be.


