TR#392: Unsupervised Cross-Modal Analysis of Professional Monologue Discourse

Michael A. Casey and Joshua S. Wachman

Appeared at the Workshop on the Integration of Gesture in Language and Speech (WIGLS)
Wilmington and Newark, Delaware, October 1996

This paper describes research in which evidence from audio and visual-kinesic data is combined to obtain an automatic, unsupervised characterization of discourse in the monologues of the comedians Jay Leno and David Letterman. We describe how feature vectors are obtained from the audio and video data, and we present the results of partitioning the resulting feature space into statistically significant clusters.
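The abstract does not name the specific features or the clustering procedure. The sketch below is a minimal illustration under stated assumptions. It assumes time-aligned per-frame audio and video feature vectors, and it uses ordinary k-means as a stand-in clustering method, which is not necessarily the authors' method. It shows the general shape of cross-modal fusion followed by unsupervised clustering. The names `zscore_columns`, `fuse_features`, and `kmeans` are hypothetical. The toy frames are invented for illustration and are not data from the paper. Significance testing of the resulting clusters is outside the scope of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Standardize each feature dimension to zero mean and unit variance."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant features
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def fuse_features(audio: Sequence[Sequence[float]],
                  video: Sequence[Sequence[float]]) -> List[Vector]:
    """Hypothetical early fusion: standardize each modality, then concatenate per frame."""
    if len(audio) != len(video):
        raise ValueError("audio and video must have the same number of frames")
    a, v = zscore_columns(audio), zscore_columns(video)
    return [fa + fv for fa, fv in zip(a, v)]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(p: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct frames as seeds
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy, made-up frames: two "regimes" in both modalities (not data from the paper).
audio = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [0.0, 0.2],
         [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2], [5.0, 5.0]]
video = [[1.0], [1.1], [0.9], [1.0], [1.2],
         [9.0], [9.1], [8.9], [9.2], [9.0]]

joint = fuse_features(audio, video)
labels, centroids = kmeans(joint, k=2)
print(labels)  # frames 0-4 share one label, frames 5-9 the other
```

Each modality is standardized before concatenation. Otherwise, whichever modality has the larger numeric range would dominate the Euclidean distances. The abstract does not say whether the original work weighted the modalities this way.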