Motion capture, or mocap, is a technique for digitally recording movement for entertainment, sports and medical applications. It started as an analysis tool in biomechanics research, but has grown increasingly important as a source of motion data for computer animation, education, training and sports, and more recently for both cinema and video games. A performer wears markers near each joint to identify the motion of the joints of the body; the markers may be acoustic, inertial, LED, magnetic or reflective, or a combination of these types. Sensors track the positions or angles of the markers, ideally at a sampling rate at least twice the frequency of the fastest motion to be captured. The motion capture software records the positions, angles, velocities, accelerations and impulses, providing an accurate digital representation of the motion. This can reduce the cost of animation, which otherwise requires the animator to draw each frame or, with more sophisticated software, to draw key frames that the software interpolates. Motion capture saves time and creates more natural movements than manual animation, but it is limited to motions that are anatomically possible. Some applications require movements that are physically impossible, such as the martial arts of an animated superhero.
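The "at least twice" rule for the sensor rate is the Nyquist sampling criterion. The following minimal Python sketch (the function names are illustrative, not from any mocap package) shows the rule and what happens when it is violated: a motion sampled too slowly is recorded as a different, slower motion, an effect called aliasing. It assumes the motion can be treated as a single sinusoid.

```python
def min_sample_rate(max_motion_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest motion frequency."""
    return 2.0 * max_motion_hz


def alias_frequency(motion_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sinusoidal motion after sampling.

    Frequencies above sample_rate_hz / 2 fold back into [0, sample_rate_hz / 2],
    so an undersampled fast motion shows up as a slower, false one.
    """
    nearest_multiple = sample_rate_hz * round(motion_hz / sample_rate_hz)
    return abs(motion_hz - nearest_multiple)


# A 10 Hz oscillation (e.g. a hand shaking quickly) needs at least 20 Hz sampling.
print(min_sample_rate(10.0))         # 20.0
print(alias_frequency(10.0, 100.0))  # 10.0 (adequately sampled)
print(alias_frequency(10.0, 12.0))   # 2.0 (undersampled: recorded as a slow 2 Hz motion)
```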

Optical systems triangulate the 3D position of a marker from several cameras with high precision (millimeter resolution or better). These systems produce data with three degrees of freedom for each marker, so rotational information must be inferred from the relative orientation of three or more markers; for instance, shoulder, elbow and wrist markers together give the angle of the elbow. A related technique, match moving, can derive 3D camera movement from a single 2D image sequence without the use of photogrammetry, but it is often ambiguous below centimeter resolution because pose and scale cannot be distinguished from a single vantage point. Future technology might include full-frame imaging from many camera angles to record the exact position of every part of the actor's body, clothing and hair for the entire session, yielding finer detail than is possible today. A newer technique discussed below uses higher-resolution linear detectors to derive one-dimensional positions; it requires more sensors and more computation, but provides higher resolution (submillimeter, down to 10 micrometres time-averaged) and higher speeds than are possible with area arrays [1] [2].
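Triangulating a marker from several cameras, and inferring the elbow angle from three markers, can be sketched with NumPy. This is a minimal sketch that assumes calibrated cameras whose 3x4 projection matrices are already known and an ideal, distortion-free pinhole model. It uses the standard linear (DLT) triangulation method, which is not necessarily the method any particular commercial system uses; the names `triangulate` and `joint_angle` are illustrative.

```python
import numpy as np


def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one marker seen by two or more cameras.

    projections: list of 3x4 camera projection matrices from calibration.
    pixels: list of (u, v) image coordinates of the marker, one per camera.
    Returns the marker's 3D position (x, y, z).
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # Least-squares solution of A X = 0: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def joint_angle(proximal, joint, distal):
    """Angle in degrees at `joint` between the segments to `proximal` and `distal`."""
    a = np.asarray(proximal, dtype=float) - np.asarray(joint, dtype=float)
    b = np.asarray(distal, dtype=float) - np.asarray(joint, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


# Two idealized cameras (normalized image coordinates), the second shifted 1 m along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
marker = np.array([0.2, 0.1, 3.0, 1.0])  # true position (homogeneous)
pixels = [(P @ marker)[:2] / (P @ marker)[2] for P in (P1, P2)]
print(triangulate([P1, P2], pixels))  # approximately [0.2, 0.1, 3.0]

shoulder, elbow, wrist = (0.0, 0.3, 0.0), (0.0, 0.0, 0.0), (0.25, 0.0, 0.0)
print(joint_angle(shoulder, elbow, wrist))  # about 90 degrees
```

With noisy real images the two rays do not intersect exactly, and the SVD step returns the least-squares compromise; adding more cameras adds more rows to `A`.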

Passive optical systems use reflective markers and identify each marker from its relative location, with the aid of kinematic constraints and predictive gap-filling algorithms. These systems are popular for entertainment, biomechanics, engineering and virtual reality applications because they can track a large number of markers, and the capture area can be expanded by adding more cameras. Unlike active marker and magnetic systems, passive systems do not require the user to wear wires or electronic equipment. Passive markers are usually plastic or foam spheres or hemispheres, 3 to 25 mm in diameter, covered with retroreflective tape. Manufacturers of this type of system include Vicon-Peak [3], Motion Analysis [4] and BTS [5].
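Identifying passive markers from their relative locations, with predictive gap filling, might look something like the following minimal Python sketch. It is an illustration under stated assumptions, not any vendor's algorithm. It predicts each marker with a constant-velocity model, matches predictions to unlabeled detections greedily by distance, and fills gaps with the prediction. Real systems also apply kinematic constraints (for example fixed bone lengths), which are omitted here. The names `predict` and `label_frame` are hypothetical.

```python
import math


def predict(prev, prev2):
    """Constant-velocity prediction: p_t = p_(t-1) + (p_(t-1) - p_(t-2))."""
    return tuple(2 * a - b for a, b in zip(prev, prev2))


def label_frame(history, detections, max_dist):
    """Assign unlabeled marker detections in a new frame to known marker labels.

    history: dict mapping label -> list of past 3D positions (at least two).
    detections: list of unlabeled 3D points seen in the new frame.
    max_dist: largest allowed distance between a prediction and a detection.
    Returns dict mapping label -> (position, observed), where observed is False
    when the position was gap-filled from the prediction (e.g. an occluded marker).
    """
    predictions = {label: predict(h[-1], h[-2]) for label, h in history.items()}
    # Consider every (label, detection) pair, closest pairs first (greedy matching).
    pairs = sorted(
        (math.dist(pred, det), label, i)
        for label, pred in predictions.items()
        for i, det in enumerate(detections)
    )
    result, used = {}, set()
    for dist, label, i in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if label in result or i in used:
            continue
        result[label] = (detections[i], True)
        used.add(i)
    # Gap filling: labels with no matching detection take their predicted position.
    for label, pred in predictions.items():
        result.setdefault(label, (pred, False))
    return result


history = {"wrist": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           "elbow": [(0.0, 5.0, 0.0), (0.0, 5.0, 0.0)]}
print(label_frame(history, [(0.0, 5.1, 0.0), (2.05, 0.0, 0.0)], max_dist=0.5))
```

Greedy matching keeps the sketch short; an optimal assignment method would be more robust when markers pass close to each other, which is exactly when marker swapping occurs.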

Active marker systems have an advantage over passive systems in that there is no doubt about which marker is which, which has made them popular in the biomechanics market. Because the markers in these systems are typically lit one at a time, however, the overall update rate drops as the marker count increases: a camera running at 5,000 frames per second divided among 100 markers provides updates of 50 hertz per marker. Two such active marker systems are Optotrak by Northern Digital [6] and the Visualeyez system by PhoeniX Technologies Inc. [7].
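The update-rate arithmetic for sequentially lit markers fits in a one-line function. This sketch assumes strict time-multiplexing (exactly one marker lit per camera frame, with no overhead), so a real system may deliver somewhat less; the function name is illustrative.

```python
def per_marker_update_rate(camera_rate_hz: float, marker_count: int) -> float:
    """Update rate each marker receives when markers are strobed one at a time."""
    if marker_count < 1:
        raise ValueError("marker_count must be at least 1")
    # Each camera frame is spent on exactly one marker, so the markers share the frame budget.
    return camera_rate_hz / marker_count


print(per_marker_update_rate(5000, 100))  # 50.0 Hz, the figure quoted in the text
print(per_marker_update_rate(5000, 20))   # 250.0 Hz with fewer markers
```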


Newer active marker systems such as PhaseSpace [8] modulate the output of each LED to differentiate the markers, allowing several markers to be lit at the same time while still providing a high resolution of 3,600 x 3,600 (nearly 13 megapixels) and capturing at 120 frames per second (128 markers, or four people) to 480 frames per second (32 markers, or a single person). The advantage of active markers is that intelligent processing allows higher speed and higher resolution at a lower price, competing with magnetic and inertial systems while providing the submillimeter resolution of optical systems. This higher accuracy and resolution requires more processing than older passive technologies, but the additional processing is done in the camera, which improves resolution through subpixel or centroid processing and provides both high resolution and high speed. By using newer processing and technology, these motion capture systems cost about one third as much as passive systems.
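Centroid processing is what lets a camera locate a marker more finely than its pixel grid. Below is a minimal sketch of intensity-weighted centroiding, assuming the marker's blob has already been isolated in a small image patch; the source does not describe the actual in-camera implementation, which will differ, and `subpixel_centroid` is a hypothetical name.

```python
def subpixel_centroid(image, threshold=0.0):
    """Intensity-weighted centroid (x, y) of a marker blob, in pixel units.

    image: 2D list of pixel intensities (rows of columns) around one marker.
    threshold: background level subtracted before weighting; pixels at or
    below it are ignored.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            weight = value - threshold
            if weight > 0:
                total += weight
                sum_x += weight * x
                sum_y += weight * y
    if total == 0:
        raise ValueError("no pixels above threshold")
    return sum_x / total, sum_y / total


blob = [
    [0, 0, 0],
    [0, 1, 3],
    [0, 0, 0],
]
print(subpixel_centroid(blob))  # (1.75, 1.0): between pixel columns 1 and 2
```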

Magnetic systems calculate position and orientation from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current in the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. Since each sensor outputs six degrees of freedom (6DOF), useful results can be obtained with two-thirds the number of markers required in optical systems: one on the upper arm and one on the lower arm give elbow position and angle. The markers are not occluded by nonmetallic objects, but they are susceptible to magnetic and electrical interference from metal objects in the environment, such as rebar (steel reinforcing bars in concrete) or wiring, which distort the magnetic field, and from electrical sources such as monitors, lights, cables and computers. The sensor response is nonlinear, especially toward the edges of the capture area. The wiring from the sensors tends to preclude extreme performance movements. The capture volumes of magnetic systems are dramatically smaller than those of optical systems. Magnetic systems are divided into "DC" and "AC" types: DC systems use square pulses, while AC systems use sine waves. Two manufacturers of magnetic systems are Ascension Technology and Polhemus.
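With two 6DOF sensors, the elbow angle can be read directly from the relative orientation of the upper-arm and lower-arm sensors. This is a minimal pure-Python sketch that assumes each sensor reports its orientation as a 3x3 rotation matrix in the transmitter's frame, and that the sensors are mounted so their frames coincide when the arm is straight. It computes the rotation angle of R_upper^T R_lower from its trace; the function names are hypothetical.

```python
import math


def transpose_times(a, b):
    """Return a^T b for 3x3 matrices given as lists of rows."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def relative_angle_deg(r_upper, r_lower):
    """Total rotation angle between two sensor orientations, in degrees."""
    r = transpose_times(r_upper, r_lower)  # lower-arm orientation in the upper-arm frame
    trace = r[0][0] + r[1][1] + r[2][2]
    # For a rotation matrix, trace = 1 + 2 cos(theta); clamp against rounding error.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation about the z axis, used here to simulate a sensor reading."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


upper = rot_z(10.0)  # upper-arm sensor
lower = rot_z(55.0)  # lower-arm sensor
print(relative_angle_deg(upper, lower))  # about 45 degrees of elbow flexion
```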

A motion capture session records only the movements of the actor, not his or her visual appearance. These movements are recorded as animation data, which are mapped to a 3D model (human, giant robot, etc.) created by a computer artist so that the model moves the same way. This is comparable to the older technique of rotoscoping, in which an actor's motion was filmed and the film was then used as a guide for the frame-by-frame motion of a hand-drawn animated character.
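Mapping captured movement onto a differently sized model is usually called retargeting. The following is a deliberately simplified Python sketch, assuming a hypothetical per-frame format (a root position plus per-joint rotations) and that the source and target skeletons share joint names and rest pose. It copies joint rotations unchanged and scales the root translation by the ratio of hip heights, a common simple heuristic; production tools also handle differing skeleton hierarchies, foot contact and other constraints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Frame:
    """One frame of captured animation data (hypothetical minimal format)."""
    root_position: Vec3  # pelvis position in metres
    joint_rotations: Dict[str, Vec3] = field(default_factory=dict)  # Euler angles, degrees


def retarget(frames: List[Frame], source_hip_height: float,
             target_hip_height: float) -> List[Frame]:
    """Map captured motion onto a differently sized character.

    Joint rotations are copied unchanged; the root translation is scaled by
    the ratio of hip heights so the character's stride matches its size.
    """
    scale = target_hip_height / source_hip_height
    return [
        Frame(
            root_position=tuple(c * scale for c in f.root_position),
            joint_rotations=dict(f.joint_rotations),
        )
        for f in frames
    ]


actor = [Frame((0.0, 1.0, 0.0), {"l_elbow": (0.0, 0.0, 45.0)}),
         Frame((0.5, 1.0, 0.0), {"l_elbow": (0.0, 0.0, 60.0)})]
robot = retarget(actor, source_hip_height=1.0, target_hip_height=3.0)
print(robot[1].root_position)               # (1.5, 3.0, 0.0)
print(robot[1].joint_rotations["l_elbow"])  # (0.0, 0.0, 60.0)
```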

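Inertial systems use devices such as accelerometers and gyroscopes, which measure linear acceleration and angular velocity; positions and angles are obtained by integrating these measurements. Because they measure only relative changes, not absolute position, small errors accumulate as drift, so inertial systems are often used in conjunction with other systems that provide updates and a global reference.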
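Why inertial data needs an external reference can be shown in a few lines. This minimal Python sketch assumes a single-axis gyroscope with a constant bias and an absolute angle supplied by another system (for example an optical tracker). Pure integration drifts without bound; a complementary filter, one simple fusion technique chosen here for illustration rather than taken from the source, keeps the error bounded. The names and the `alpha` value are assumptions.

```python
def integrate_gyro(rates_deg_s, dt, angle0=0.0):
    """Dead reckoning: integrate angular velocity into an angle (drifts with bias)."""
    angle, out = angle0, []
    for w in rates_deg_s:
        angle += w * dt
        out.append(angle)
    return out


def complementary_filter(rates_deg_s, reference_deg, dt, alpha=0.98, angle0=0.0):
    """Blend integrated gyro data with an absolute reference angle.

    alpha close to 1 trusts the gyro for fast changes; (1 - alpha) slowly pulls
    the estimate toward the reference, cancelling accumulated drift.
    """
    angle, out = angle0, []
    for w, ref in zip(rates_deg_s, reference_deg):
        angle = alpha * (angle + w * dt) + (1.0 - alpha) * ref
        out.append(angle)
    return out


dt = 0.01                 # 100 Hz sampling
bias = 0.5                # gyro bias in deg/s; the true joint angle stays at 0
gyro = [bias] * 1000      # 10 seconds of readings from a stationary sensor
reference = [0.0] * 1000  # absolute angle from another tracking system
print(integrate_gyro(gyro, dt)[-1])                   # about 5.0 degrees of drift
print(complementary_filter(gyro, reference, dt)[-1])  # stays near 0.25 degrees
```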

RF (radio frequency) positioning systems are becoming more viable as higher-frequency RF devices allow greater precision than older RF technologies. Light travels about 30 centimeters per nanosecond (billionth of a second), so a 10 gigahertz (billion cycles per second) RF signal, whose wavelength is about 3 centimeters, enables an accuracy of about 3 centimeters. By measuring amplitude to a quarter wavelength, it is possible to improve the resolution to about 8 mm. Matching the resolution of optical systems requires frequencies of 50 gigahertz or higher, which are almost as dependent on line of sight, and as easy to block, as optical systems. Multipath propagation and reradiation of the signal are likely to cause additional problems, but these technologies should be well suited to tracking larger volumes with reasonable accuracy, since the required resolution at distances of 100 meters is unlikely to be as high.
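The wavelength figures quoted for RF positioning follow from wavelength = c / f. A small Python check, assuming free-space propagation and taking "resolving to a quarter wavelength" literally (function names are illustrative):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_m(freq_hz: float) -> float:
    """Free-space wavelength of an RF signal."""
    return SPEED_OF_LIGHT_M_PER_S / freq_hz


def quarter_wave_m(freq_hz: float) -> float:
    """Resolution if position can be resolved to a quarter wavelength."""
    return wavelength_m(freq_hz) / 4.0


for f_ghz in (10, 50):
    f = f_ghz * 1e9
    print(f"{f_ghz} GHz: wavelength {wavelength_m(f) * 1000:.1f} mm, "
          f"quarter wave {quarter_wave_m(f) * 1000:.1f} mm")
# 10 GHz: wavelength 30.0 mm, quarter wave 7.5 mm
# 50 GHz: wavelength 6.0 mm, quarter wave 1.5 mm
```

The 7.5 mm quarter wave at 10 GHz is the "about 8 mm" figure in the text.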

The procedure

In the motion capture session, the movements of one or more actors are sampled many times per second.

If desired, a camera can pan, tilt or dolly around the stage while the actor is performing, and the motion capture system can capture the camera and props as well. This allows the computer-generated characters, images and sets to have the same perspective as the video images from the camera.
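Once the film camera itself is tracked, computer-generated elements can be drawn with its perspective by projecting them through the same camera pose. This is a minimal pinhole-camera sketch in Python, assuming the tracked pose is available as a world-to-camera rotation and translation and that the lens intrinsics are known; lens distortion is ignored and `project_point` is a hypothetical name.

```python
def project_point(point_world, rotation, translation, focal_px, cx, cy):
    """Project a 3D point into the tracked film camera's image (pinhole model).

    rotation, translation: world-to-camera pose (X_cam = R X_world + t), as it
    might be derived from markers on the camera.
    focal_px, cx, cy: intrinsics in pixels (focal length, principal point).
    """
    x_cam = [
        sum(rotation[i][k] * point_world[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    u = focal_px * x_cam[0] / x_cam[2] + cx
    v = focal_px * x_cam[1] / x_cam[2] + cy
    return u, v


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A CG prop 1 m to the right of the optical axis, 5 m in front of the camera.
print(project_point((1.0, 0.0, 5.0), identity, (0.0, 0.0, 0.0), 1000.0, 960.0, 540.0))
# (1160.0, 540.0)
# After the camera dollies 1 m to the right, the prop sits on the optical axis.
print(project_point((1.0, 0.0, 5.0), identity, (-1.0, 0.0, 0.0), 1000.0, 960.0, 540.0))
# (960.0, 540.0)
```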

A computer processes the data and displays the movements of the actor, as inferred from the 3D position of each marker. Older passive marker systems are notorious for requiring a person to spend a great deal of time "cleaning up" the data to eliminate marker swapping. A single sensor misreading might, for example, cause the computer to believe that the actor's arm pointed straight up into the air for a fraction of a second when it did not. Newer hardware with active markers reduces marker swapping, but software is still required to fill in gaps, since all optical systems suffer from occlusion.
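Two basic cleanup steps, discarding a single-frame misreading and filling occlusion gaps, can be sketched on one coordinate of one marker's trajectory. This is a minimal Python illustration under simple assumptions (a spike is a sample far from both neighbours; gaps are filled by linear interpolation); production cleanup tools use more sophisticated methods, and `clean_trajectory` is a hypothetical name.

```python
def clean_trajectory(samples, max_jump):
    """Remove single-frame spikes and fill interior gaps by linear interpolation.

    samples: list of floats (one coordinate of one marker), None where occluded.
    max_jump: largest plausible change between consecutive frames.
    Leading and trailing gaps are left as None (nothing to interpolate from).
    """
    data = list(samples)
    # A sample that jumps away from both of its neighbours is treated as a misreading.
    for i in range(1, len(samples) - 1):
        a, b, c = samples[i - 1], samples[i], samples[i + 1]
        if None in (a, b, c):
            continue
        if abs(b - a) > max_jump and abs(b - c) > max_jump:
            data[i] = None
    # Linear interpolation between each pair of consecutive known samples.
    known = [i for i, v in enumerate(data) if v is not None]
    for k0, k1 in zip(known, known[1:]):
        for i in range(k0 + 1, k1):
            t = (i - k0) / (k1 - k0)
            data[i] = data[k0] + t * (data[k1] - data[k0])
    return data


# The arm "points straight up" for one frame (50.0) and is occluded for another (None).
print(clean_trajectory([0.0, 1.0, 50.0, 3.0, None, 5.0], max_jump=10.0))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```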

After processing, the software exports animation data, which computer animators can associate with a 3D model and then manipulate using normal computer animation software such as Maya or 3D Studio Max (both now owned by Autodesk). If the actor's performance was good and the software processing was accurate, this manipulation is limited to placing the actor in the scene that the animator has created and controlling the 3D model's interaction with objects.

Advantages

Mocap offers several advantages over traditional computer animation of a 3D model:

  • Mocap can take far fewer man-hours of work to animate a character. One actor working for a day (and then technical staff working for many days afterwards to clean up the mocap data) can create a great deal of animation that would have taken months for traditional animators.
  • Mocap can capture secondary animation that traditional animators might not have had the skill, vision or time to create. For example, a slight movement of the actor's hip might cause his head to twist slightly. A traditional animator might understand this nuance but find it too time-consuming and difficult to represent accurately, whereas mocap captures it accurately, which is why mocap animation often seems shockingly realistic compared with hand-animated models. Incidentally, one of the hallmarks of rotoscoping in traditional animation is just such secondary "business."
  • Mocap can accurately capture difficult-to-model physical movement. For example, if the mocap actor does a backflip while holding nunchucks by the chain, both sticks of the nunchucks will be captured by the cameras moving in a realistic fashion. A traditional animator might not be able to physically simulate the movement of the sticks adequately due to other motions by the actor. Secondary motion such as the ripple of a body as an actor is punched or is punching requires both higher speed and higher resolution as well as more markers.


On the negative side, mocap data requires special programs and time to manipulate once captured and processed, and if the data is wrong, it is often easier to throw it away and reshoot the scene rather than trying to manipulate the data. Newer, lower cost active marker optical systems allow real time viewing of the data to decide if the take needs to be redone.

Another important point is that while it is common and comparatively easy to mocap a human actor in order to animate a biped model, applying motion capture to animals like horses can be difficult.

Motion capture equipment costs tens of thousands of dollars for the digital video cameras, lights, software, and staff to run a mocap studio, and this technology investment can become obsolete every few years as better software and techniques are invented. Some large movie studios and video game publishers have established their own dedicated mocap studios, but most mocap work is contracted to individual companies that specialize in mocap.

Applications

Video games use motion capture for football, baseball and basketball players or the combat moves of a martial artist.

Movies use motion capture for CG effects, in some cases replacing traditional cel animation, and for completely computer-generated creatures, such as Gollum, Jar-Jar Binks and King Kong, in live-action movies.

Virtual reality and augmented reality require real-time input of the user's position and interaction with the environment, demanding more precision and speed than older motion capture systems could provide. Noise and errors from low-resolution or low-speed systems, and overly smoothed, filtered data with long latency, contribute to "simulator sickness", in which the lag and mismatch between the user's visual and vestibular cues and the computer-generated images cause nausea and discomfort.
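The trade-off between smoothing and latency can be made concrete. This is a minimal Python sketch, assuming a simple causal moving-average filter (one common smoothing choice, not one the source specifies): an n-sample average delays a steadily moving signal by (n - 1) / 2 samples, so the same amount of smoothing costs less latency at a higher capture rate. The function names are illustrative.

```python
def causal_moving_average(samples, n):
    """Average of the current sample and up to n - 1 previous ones (usable in real time)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def moving_average_delay_ms(n, rate_hz):
    """Group delay of an n-tap moving average: (n - 1) / 2 samples, in milliseconds."""
    return (n - 1) / 2 / rate_hz * 1000.0


ramp = [float(i) for i in range(10)]  # a limb moving at constant speed
smoothed = causal_moving_average(ramp, 5)
print(smoothed[-1])                     # 7.0: trails the true position (9.0) by 2 samples
print(moving_average_delay_ms(5, 120))  # about 16.7 ms of added lag at 120 Hz
print(moving_average_delay_ms(5, 480))  # about 4.2 ms at 480 Hz
```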

High-speed, high-resolution active marker systems can provide smooth data at low latency, allowing real-time visualization in virtual and augmented reality systems. The remaining challenge, now almost achievable with powerful graphics cards, is mapping the images correctly to the real perspectives to prevent image mismatch.

Motion capture technology is frequently used in digital puppetry systems to aid in the performance of computer generated characters in real-time.

Related techniques

Facial motion capture is utilized to record the complex movements in a human face, especially while speaking with emotion. This is generally performed with an optical setup using multiple cameras arranged in a hemisphere at close range, with small markers glued or taped to the actor's face.

Performance capture is a further development of these techniques, in which both body motions and facial movements are recorded. This technique was used in the making of The Polar Express, in which all the actors were animated this way.

An alternative approach was developed by the Russian company VirtuSphere, in which the actor is given an unlimited walking area through the use of a rotating sphere, similar to a hamster ball, that contains internal sensors recording its angular movements, removing the need for external cameras and other equipment. Although this technology could potentially lead to much lower costs for mocap, the basic sphere can record only a single continuous direction. Additional sensors worn on the person would be needed to record anything more.

See also

External links

Motion capture hardware