We show how machine perception techniques can allow people to use their own bodies to control complex virtual representations in computer graphics worlds. In contrast to existing motion-capture solutions, tracking people for virtual avatars or intelligent interfaces requires processing at multiple levels of resolution. We apply active perception techniques, using visual attention to track a user's pose and gestures at several scales simultaneously. We also develop an active speech interface that leverages this visual tracking ability: by electronically steering a microphone array toward a particular user, the system makes speech recognition possible in acoustically cluttered environments. Together, these methods allow virtual representations of people to be based on their actual expression, tracking their body gestures, facial gestures, and speech utterances as they move freely about a room without attached wires, microphones, or other sensors.
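The abstract does not specify how the microphone array is "electronically focused," but the classic technique for steering an array toward a known position is delay-and-sum beamforming: each channel is time-shifted by its propagation delay from the target so that the target's sound adds coherently while off-axis sound does not. The sketch below is illustrative only (the function name, coordinate conventions, and integer-sample delays are our assumptions, not the paper's implementation).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed constant)

def delay_and_sum(signals, mic_positions, source_position, sample_rate):
    """Steer a microphone array toward source_position by delay-and-sum
    beamforming.

    signals:        (n_mics, n_samples) array of simultaneous recordings
    mic_positions:  (n_mics, n_dims) coordinates in metres
    source_position: (n_dims,) target coordinates in metres
    """
    # Propagation distance from the target to each microphone.
    dists = np.linalg.norm(mic_positions - source_position, axis=1)
    # Delays relative to the closest mic, so all shifts are non-negative.
    delays = (dists - dists.min()) / SPEED_OF_SOUND
    # Round to whole samples; a real system would use fractional delays.
    shifts = np.round(delays * sample_rate).astype(int)
    n_out = signals.shape[1] - shifts.max()
    out = np.zeros(n_out)
    for sig, shift in zip(signals, shifts):
        # A mic farther from the target hears the wavefront later,
        # so advance its channel by the extra delay to align it.
        out += sig[shift:shift + n_out]
    return out / len(signals)
```

Averaging the aligned channels preserves the target's amplitude while attenuating uncorrelated noise and sources at other positions, which is what makes recognition feasible without a close-talking microphone.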