TR #404: Understanding Manipulation in Video

Matthew Brand, March 1996

Appeared in:

Proceedings, 2nd International Conference on Automatic Face and Gesture Recognition (FG96)
Killington, Vermont


Manipulations are a significant subset of human gestures, distinguished by the fact that their logic and meaning are particularly clear, being heavily constrained by physical causality. We present techniques and causal semantics for interpreting video of manipulation tasks such as disassembly. Psychologically-based causal constraints are used to detect meaningful changes in the integrity and motions of foreground-segmented blobs; a small causal model of manipulation is used to disambiguate these changes and parse them into a coherent account of the video's action. The causal constraints are drawn from studies of infant perceptual development; as with infants, they precede and may even bootstrap the ability to reliably segment still objects. Our implementation produces a script of the causal evolution of the scene -- output that supports cartoon summary, automated editing, and higher-level reasoning.