TR#394: Computers Seeing Action

Aaron Bobick

Appears in:
Proceedings of the British Machine Vision Conference,
Edinburgh, Scotland, September, 1996

As research in computer vision has shifted from processing only single, static images to manipulating video sequences, the concept of {\em action recognition} has become important. Fundamental to understanding action is reasoning about time, in either an implicit or an explicit framework. In this paper I describe several specific examples of incorporating time into representations of action, and show how those representations are used to recognize actions. The approaches differ in whether variation over time is treated as a continuous mapping, a state-based trajectory, or a qualitative, semantically labeled sequence. For two of the domains, whole-body actions and hand gestures, I describe the approaches in detail; two others, constrained semantic domains (e.g., watching someone cooking) and labeling dynamic events (e.g., American football), are mentioned briefly.