
Web Posted on: February 24, 1998


SIGNING AVATARS

Carol J. Wideman
President
email: cw_ssi@bellsouth.net
Edward M. Sims, Ph.D.,
Chief Technical Officer
email: es_ssi@bellsouth.net
Seamless Solutions, Inc.
3504 Lake Lynda Drive, Suite 390
Orlando, FL 32817

This paper includes material based upon work supported by the U.S. Department of Education under purchase order number RW-97-076003. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views or policies of the Department of Education.

This paper also includes material based upon work supported by the National Science Foundation under Grant No. DMI-9760230.

Abstract

Over 500,000 hearing impaired Americans are currently excluded from communicating in their most expressive idiom over long-distance communication systems. Although the widespread availability of Teletype (TTY), electronic mail (e-mail), and electronic facsimile (fax) provides the means to communicate in written English, both the speed of communication and the level of personal interaction are significantly diminished relative to the depth of expression available with facial expressions, speech, or sign language. Video communication among homes, schools, and workplaces, although feasible, is too expensive for everyday use and does not support any means of translating among the written, spoken, and signed word.

This paper describes research that provides Internet-connected virtual reality representations of the signer's face, arms, and hands, called "avatars", which enable sign language communication via low-bit-rate standard phone lines and personal computers.

This technology enables lower cost, long-distance sign language communication, and supplies a more natural and more rapid interaction between humans and computer applications. This research is funded in part by the U. S. Department of Education Small Business Innovation Research Program (SBIR) and a grant from the National Science Foundation SBIR program.

Introduction

Provision for universal access to the National Information Infrastructure (NII), especially in its application to Distance Learning, has been identified as one of the leading issues driving policy formulation regarding the NII. Equity of service can only be achieved to the extent that the hearing impaired, and other disabled Americans, are provided with a means to create and receive information content over the new information networks.

The growing availability of multimedia computers at work, at school, and at home presents an opportunity to enrich our lives with new and improved interactions with both deaf and hearing acquaintances and colleagues. Already, low-cost fax and e-mail services are being employed for personal and business communications. However, these written media provide neither the speed nor the full emotional content of the deaf person's native sign language. While audio capabilities are now enhancing the hearing person's computer interactions, providing cues and feedback for education and entertainment software applications, as well as communications, the deaf are not benefiting from similar enhancements.

Although video playback and video teleconferencing are available to the deaf, they address only part of the problem. Playback from disk, which provides sufficient quality for sign visualization, is limited to prerecorded media and does not support significant user interaction. Video teleconferencing, unless provided by prohibitively expensive, high speed links, does not provide sufficient quality to communicate signs effectively. Even if these limitations were overcome, they would not improve the ability of those not trained in sign language to communicate with the deaf, and they would not enrich the interactions of the deaf with non-video based entertainment and education software.

The emergence of low-cost (<$1500) personal computers which can render images of 3-D models at real-time rates makes plausible the development and exploitation of a new means for the deaf to interact with the computer, to communicate with each other over long distances, and to communicate with those who have no knowledge of sign language. It can also aid in the education of both the hearing and the hearing impaired in the use of sign language. The integration of PC technology into consumer products such as the telephone and television results in a compelling case for 3D avatars communicating in sign language.

Our concept is to develop an avatar and a software library of 3D computer animations, scripted representations of words and concepts. These appear as 3D animated characters on the computer screen, with the full capability to form the signs of American Sign Language (ASL) and Pidgin Sign English (PSE), which includes more "Englishisms" such as auxiliary verbs that are omitted in ASL. By coupling these animations with voice processing, scripted behaviors, or text readers, the characters will be made to simulate realistic signing in response to voice, actions, or text.
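As a rough illustration of the text-reader coupling described above, the following sketch maps English words to stored word-sign animations and falls back to finger spelling for words not found in the dictionary. This is not our implementation; the class name and animation clip identifiers are hypothetical, and a real library would hold roughly 3500 word signs rather than two.

    import java.util.*;

    public class TextToSignMapper {
        // Maps a lower-cased English word to the identifier of a stored word-sign animation.
        private final Map<String, String> signDictionary = new HashMap<>();

        public TextToSignMapper() {
            // Two entries for illustration only.
            signDictionary.put("hello", "SIGN_HELLO");
            signDictionary.put("school", "SIGN_SCHOOL");
        }

        // Converts an English sentence into an ordered list of animation clip identifiers.
        public List<String> toAnimationScript(String sentence) {
            List<String> script = new ArrayList<>();
            for (String word : sentence.toLowerCase().split("\\s+")) {
                String clean = word.replaceAll("[^a-z]", "");
                if (clean.isEmpty()) continue;
                String clip = signDictionary.get(clean);
                if (clip != null) {
                    script.add(clip);                      // known word sign
                } else {
                    for (char c : clean.toCharArray()) {   // finger-spell unknown words
                        script.add("LETTER_" + Character.toUpperCase(c));
                    }
                }
            }
            return script;
        }

        public static void main(String[] args) {
            System.out.println(new TextToSignMapper().toAnimationScript("Hello Orlando"));
            // Prints [SIGN_HELLO, LETTER_O, LETTER_R, LETTER_L, LETTER_A, LETTER_N, LETTER_D, LETTER_O]
        }
    }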

Statement of the Problem

Natural signing requires 100 degrees of freedom of body motions to be exercised at a rate of 2 to 3 signs per second. Signs include 1) twenty-six Finger Spelled Letters - used for proper names and uncommon words, 2) approximately 3500 word signs, formed by arm and hand motions - used for common concepts, and 3) head motions and facial expressions that form part of the grammar and provide emphasis. An animated, signing human character must have articulated shoulders, elbows, wrists, knuckles, and neck, as well as controllable eyes, eyebrows, and lips. These must be updated at a rate of at least 10 frames per second and presented to the viewer in a CRT "window" of at least 100,000 picture elements (pixels). This level of dynamics and resolution goes well beyond what is available in today's video games, but it is within the scope that is achievable using newly introduced PC processor and video accelerator technologies. The technological problem to be solved is how to control this level of dynamics within the bandwidth limitations of pervasive Internet access.
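To make the bandwidth argument concrete, a back-of-the-envelope sketch follows. It assumes each joint angle is quantized to a single byte, a figure chosen for illustration rather than taken from our encoding; the point is only that a joint-angle stream is far smaller than video of comparable usefulness.

    public class BandwidthEstimate {
        public static void main(String[] args) {
            int degreesOfFreedom = 100;  // arms, hands, head, and face
            int updatesPerSecond = 10;   // minimum acceptable update rate
            int bitsPerAngle = 8;        // one byte per quantized joint angle (assumed)

            int bitsPerSecond = degreesOfFreedom * updatesPerSecond * bitsPerAngle;
            System.out.printf("Raw joint-angle stream: %d bit/s (%.0f%% of a 28.8 kbit/s modem)%n",
                    bitsPerSecond, 100.0 * bitsPerSecond / 28_800);
            // 8000 bit/s, roughly a quarter of the modem's capacity, versus megabits per
            // second for uncompressed video of a 100,000-pixel window at 10 frames/second.
        }
    }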

Approach

Seamless Solutions has developed a working proof-of-concept demonstration of real-time, interactive sign language communication over the Internet, on low-cost PCs, using human character animation. This demonstration consists of the following components: 1) a fully articulated human character model; 2) a library of "basic" animations: letters, words, and facial expressions; 3) a means for encoding a stream of animations for transmission over the Internet; and 4) an "Animation Engine", implemented in VRML and Java, for converting the transmitted data into avatar gestures. Seamless Solutions selected the Virtual Reality Modeling Language (VRML 2.0) to model and animate 3-D Virtual Humans for sign language communication. VRML defines a standard language for representing three-dimensional scenes that may be communicated over the Internet and animated and viewed interactively in real time. VRML viewers are available not only as plug-ins to Internet browsers, but also as interactive objects that may be embedded into standard office documents (multimedia documents, presentations, and spreadsheets). VRML viewers for Netscape and Microsoft Web browsers are available for free download from both Cosmo (a subsidiary of Silicon Graphics, Inc.) and Microsoft.
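As an illustration of components 3 and 4 above, the following minimal receive-side sketch (simplified well beyond the actual Animation Engine) assumes a newline-delimited stream of animation clip identifiers arriving over a TCP socket; the port number, protocol, and class names are placeholders, and the "renderer" is a stub that only reports which clip would be played.

    import java.io.*;
    import java.net.*;

    public class AnimationEngineStub {
        // Stand-in for the code that actually drives the avatar's joint interpolators.
        static void playClip(String clipId) {
            System.out.println("Playing animation clip: " + clipId);
        }

        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(4000)) {          // placeholder port
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(peer.getInputStream()))) {
                    String clipId;
                    while ((clipId = in.readLine()) != null) {
                        playClip(clipId);  // e.g. "SIGN_HELLO" or "LETTER_A"
                    }
                }
            }
        }
    }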

New human character models were developed with an appropriate level of complexity for signing. These models were built using standard, commercial software such as Kinetix' 3D Studio Max(TM) and Ligos' V-Realm Builder(TM). Each character is modeled using approximately 5000 polygon facets grouped into 50 segments controlled by 100 degrees of freedom in motion. A project to add deformable facial features (eyes, eyebrows, and mouth) is underway. This level of character complexity was found to be just within the capacity of available PCs to render at real-time (10 to 15 frames/second) rates. We were able to achieve frame rates of 15 frames per second on a Pentium II, 300 MHz desktop computer, and 6 frames per second on a 166 MHz Pentium MMX laptop computer. We anticipate that new laptops introduced this year will support at least the minimally desired 10 frames per second update rate.
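One plausible way to reconcile the stored animations with the 10 to 15 frames per second display rate is to interpolate between keyframe poses at render time. The sketch below assumes each sign is stored as keyframes of joint angles, one angle per degree of freedom; the data layout and timing are illustrative assumptions, not figures taken from our animation files.

    public class KeyframeInterpolator {
        // Linearly interpolates between two poses; each pose holds one angle per degree of freedom.
        static double[] interpolate(double[] poseA, double[] poseB, double t) {
            double[] pose = new double[poseA.length];
            for (int i = 0; i < pose.length; i++) {
                pose[i] = poseA[i] + t * (poseB[i] - poseA[i]);
            }
            return pose;
        }

        public static void main(String[] args) {
            double[] rest = new double[100];   // 100 degrees of freedom, all at 0 degrees
            double[] sign = new double[100];
            sign[0] = 45.0;                    // illustrative shoulder rotation of 45 degrees

            // One pose-to-pose transition of about 0.4 s (2.5 signs/second) rendered at 10 frames/s.
            int frames = 4;
            for (int f = 0; f <= frames; f++) {
                double t = (double) f / frames;
                System.out.printf("frame %d: shoulder = %.1f degrees%n",
                        f, interpolate(rest, sign, t)[0]);
            }
        }
    }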

Several methods for "capturing" signs and facial expressions (reducing them to digital data) were explored. These included:

1) directly manipulating and storing joint angles using the modeling/animation program, 2) using "inverse kinematics" to generate joint angles from "end effector" positions (i.e., automatically computing elbow, wrist, and knuckle angles from fingertip positions; a minimal example is sketched below), and 3) motion capture using an instrumented glove and magnetic tracking devices. For a limited vocabulary, a combination of the first two methods has proved to be fully satisfactory. For a full vocabulary, either magnetic or optical motion capture will be desirable.
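The following planar, two-link sketch illustrates what "inverse kinematics" means in method 2: given a desired hand (end-effector) position, the shoulder and elbow angles are solved in closed form. The link lengths, the restriction to a plane, and the class name are simplifications for illustration; the avatar itself requires a 3-D solution over many more joints.

    public class TwoLinkIK {
        public static void main(String[] args) {
            double upperArm = 0.30, forearm = 0.25;   // link lengths in metres (assumed)
            double x = 0.35, y = 0.20;                // desired hand position in the plane

            double d2 = x * x + y * y;
            double cosElbow = (d2 - upperArm * upperArm - forearm * forearm)
                    / (2 * upperArm * forearm);
            cosElbow = Math.max(-1.0, Math.min(1.0, cosElbow));   // clamp if target is unreachable
            double elbow = Math.acos(cosElbow);
            double shoulder = Math.atan2(y, x)
                    - Math.atan2(forearm * Math.sin(elbow), upperArm + forearm * Math.cos(elbow));

            System.out.printf("shoulder = %.1f deg, elbow = %.1f deg%n",
                    Math.toDegrees(shoulder), Math.toDegrees(elbow));
        }
    }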

Results

Our initial results indicate that real-time, interactive communication in sign language over the Internet is in fact feasible within the performance bounds of standard, consumer PCs, commonplace (28.8K) modems, and pervasive, low-cost software (Windows(TM)95 and Netscape or Microsoft Internet browsers). All 100 degrees of freedom of the arm, hands, head, and face can in fact be controlled and displayed with a resolution and update rate that fully supports comprehension. The essentials of this technology can be demonstrated as a proof of concept today.

However, our research also indicates that simple transliteration of English text into signs, without the coordinated facial expressions of ASL grammar, is of limited application - primarily as a teaching aid for those learning to sign. In order to gain full acceptance by the deaf community, it is vitally important to enhance the text-to-sign translation with facial animation. Our "Facial Animation for Communication Enhancement" (FACE) project, funded in part by a grant from the NSF, is addressing this need.

Conclusion

Sign language communication via the Internet is achievable using low-cost processor and modem technologies that are becoming available at this time. A proof of concept has shown that the necessary arm, hand, and facial gestures can be achieved at the necessary resolutions and update rates. New research and development is required to address the need to fully coordinate the facial animations with the arm and hand gestures. We have initiated this research and invite comment and collaboration in achieving our goal of fully expressive, long-distance, multi-user communication for the hearing impaired.

Acknowledgments

Sponsors:

U.S. Department of Education SBIR contract, Dr. Richard Johnson, COTR

National Science Foundation SBIR Grant, Dr. C. Denver Lovett, Program Manager

Consultants:

Dr. Michael Tuccelli, Coordinator, Community Education, St. Augustine School for the Deaf and Blind

Dr. Pat Kricos, Professor, University of Florida, Department of Communication Processes and Disorders

References

C. Dede, "The Technologies Driving the National Information Infrastructure: Policy Implications for Distance Learning", Southwest Regional Laboratory, George Mason University, Fairfax, VA, 1994.

"Universal Access Project, NII Infrastructure, Initial Draft", University of Wisconsin, 1995.

J. Harkins et al., "Project TFA: Telecommunications for All", 1996.

N. Badler, C. Phillips, and B. Webber, "Simulating Humans: Computer Graphics Animation and Control", Oxford University Press, New York, NY 1993.

S. Reese, "Character Animation with 3D Studio Max(TM)", Scottsdale, AZ, 1996.

G. Newby, "Gesture Recognition Using Statistical Similarity", Virtual Reality and Persons with Disabilities Conference Proceedings, 1993.

P. Doenges, T. Capin, F. Lavagetto, J. Ostermann, I. Pandzic, and E. Petajan, "MPEG-4: Audio/Video & Synthetic Graphics/Audio for Mixed Media", Image Communication Journal, Special Issue on MPEG-4, 1996.

N. Badler, B.L. Webber, W. Becket, C. Geib, M. Moore, C. Pelachaud, B. Reich, and M. Stone, "Planning for Animation" in Computer Animation (N. Magnenat-Thalmann and D. Thalmann, eds.), Prentice Hall, 1995.

B. Roehl, "Specification for a Standard VRML Humanoid, Version 1.0", 1997.

K. Perlin and A. Goldberg, "Improv: Techniques for Scripting Interactive Actors in Virtual Worlds", SIGGRAPH, 1997.