The American Sign Language Linguistic Research Project (ASLLRP) has involved collaborations between linguists studying the structural properties of American Sign Language (ASL) and computer scientists interested in the challenging problems of sign language recognition from video and generation via signing avatars.
A centerpiece of several related projects has been the creation of expanding, Web-accessible, searchable, linguistically annotated, computationally analyzed video corpora. These include high-quality synchronized video files showing linguistic productions by native users of the language, captured from multiple angles along with a close-up view of the face. The data sets currently available include a collection of utterances and short narratives, as well as almost 10,000 examples of individual, citation-form ASL sign productions (corresponding to about 3,000 ASL signs, each produced by up to 6 different signers). The Web interface is undergoing enhancements to expand the search capabilities and to allow ready access to visualizations of the computational analyses. The annotation software (SignStream®3) is under development and will soon be released. The publicly available corpora will also soon include a large amount of recently annotated video data.
Domain knowledge about the linguistic organization of ASL, derived in large part from these annotated corpora, has been incorporated into computational learning methods for detecting the linguistically significant elements of the videos: in particular, for recognizing specific non-manual expressions (i.e., movements of the head and upper body, and facial expressions) that convey essential grammatical information over phrasal domains, and for segmenting and identifying manual signs in continuous signing. The computer-generated analyses of the video also offer great potential for use in linguistic research. Furthermore, the linguistic and computer-based models of the non-manual components of ASL are being leveraged to improve the quality of ASL generation via avatars.
This talk will present an overview of the collaborative research, discuss some of the challenges posed by the linguistic properties of language in the visual-gestural modality, and describe the shared data sets.
*The work presented here has resulted from collaborations including Rutgers (Dimitris Metaxas et al.), Gallaudet (Ben Bahan, Christian Vogler), and Boston (Stan Sclaroff, Ashwin Thangali) Universities, as well as the Rochester Institute of Technology (Matt Huenerfauth). This work has been partially funded by grants from the National Science Foundation.