(Page 1 of 3 pages for this article 1 2 3 >)
Friday, March 13, 1998
Timing Video to Audio
Chris Meyer | 03/13
Wherein Mr. Video asks Ms. Audio: “What’s my motivation in this scene?”
In the days of yore, editing video and editing audio were considered two different disciplines. Today, most desktop and non-linear video editing programs also edit audio with the same tools and capabilities. As a result, video editors are all but expected to edit their own audio as well.
For most, this means just mixing together narration, music, and the occasional sound effect. However, if you go one step further and make your video edit decisions based on the audio - and vice versa - you will end up with a final program that is tighter, and more compelling to watch, than if you just let the respective cuts fall where they may. The same goes for 2D and 3D animation: Allowing audio to inform your timing decisions results in a stronger overall experience.
Don’t know anything about audio or music? Hang on and we’ll give you a crash course in the next two pages. Already have a good idea of how audio and music work? Jump ahead to the section titled Cut Time on Page 3 and we’ll go over a few tips and tricks to keep in mind, followed by a brief case study. As with any artistic discipline, rules are meant to be stretched and broken - but they give you an important head start.
Audio Basics
For there to be a sound, something must vibrate. This vibration could be a guitar string swaying back and forth, a drum’s head jumping up and down, or pieces of glass shattering when a baseball goes through a window. We’ve all seen a speaker cone jumping in and out - same thing. These motions vibrate the air, pushing it towards you and then pulling it back away from you. This in turn pushes your eardrum around, causing it to flex in sympathy. This stimulates nerves in your ears, which ultimately convince your brain that a “sound” has occurred.
The pattern and nature of these vibrations affect the character of the sound we perceive. The stronger the vibrations, the louder the sound. The faster the fundamental pattern of vibrations, the higher the apparent “pitch” of the sound. The basic back-and-forth vibration cycle also usually has a detailed pattern of small jerks back and forth inside the larger overall pattern; the unique details of this inner pattern give us clues that allow us to distinguish one sound from another, even if they are of the same loudness and basic pitch. Humans can perceive vibrations from a speed of twenty back-and-forth cycles per second to as high as twenty thousand cycles per second - a lot faster than the pokey frame rate of video or film.
Sound is recorded by intercepting these vibrations in the air with a device akin to our eardrum - typically, a microphone - which converts them into electrical signals with a similar vibrational pattern. In a computer environment, these vibrations are frozen by “digitizing” or “sampling” that electrical signal. When sound is digitized, its instantaneous level (how much the air has been pushed towards or pulled away from the microphone) is measured (i.e. sampled) and converted into a number (i.e. digitized) to be stored in the computer’s memory. A very short instant later, the signal is measured again to see how the air pressure changed since the last measured moment in time. This process is repeated very quickly over a period of time to build up a numeric picture of what the pattern of vibration was.
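To make the measure-then-convert loop concrete, here is a minimal Python sketch of digitizing a pure tone. It is an illustration only: the function names `sample_tone` and `quantize_16bit` are made up for this example, and the 44,100 measurements per second and 16-bit number range are assumed values typical of CD-style audio, not figures given in this article.

```python
import math

def sample_tone(frequency_hz, duration_s, sample_rate_hz=44100, amplitude=1.0):
    """Measure a pure tone at evenly spaced instants and return the readings."""
    sample_count = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(sample_count):
        t = n / sample_rate_hz  # time of this measurement, in seconds
        # Instantaneous level: positive = air pushed towards the microphone,
        # negative = air pulled away from it.
        level = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(level)
    return samples

def quantize_16bit(level):
    """Convert a level in the range -1.0..1.0 into a whole number (assumed 16-bit)."""
    clipped = max(-1.0, min(1.0, level))
    return int(round(clipped * 32767))

# Sample one hundredth of a second of a 440-cycles-per-second tone,
# then turn each measurement into a number the computer can store.
samples = sample_tone(frequency_hz=440, duration_s=0.01)
digitized = [quantize_16bit(s) for s in samples]
```

Even this hundredth of a second of sound produces several hundred numbers, which gives a feel for how quickly the measurements have to be taken.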
This resulting “waveform” is typically displayed on a computer screen by drawing a point or line that represents the air pressure at one point in time, followed by additional points or lines that represent succeeding points in time. As a result, you can “read” a waveform from left to right to get an idea of the vibrational pattern. Almost no one can look at the resulting squiggles and tell you precisely what the sound was, but you can pick up some clues: Namely, louder points in time will be drawn taller than quieter points in time; if you zoom in far enough to see individual wave cycles, cycles that take relatively longer to fluctuate up and down are lower in pitch than ones that fluctuate more quickly.
The figure below shows a computer display of a simple audio waveform. As the curve of the wave goes above the center line, air is being pushed towards you; as it goes below, air is being pulled away. Time passes from left to right; the markings along the bottom are in 10 millisecond (hundredth of a second) increments - giving an idea of how fast sound vibrates. The second figure shows a different sound, zoomed in the same amount as the first figure. Since the up and down excursions are not as tall as in the previous figure, you now know that this sound is relatively quieter; since the up and down excursions are happening faster, you know it is higher in pitch. These are very simple sounds, zoomed in a great deal - most sounds are more complex, but they follow the same basic rules.

Two simple sounds, captured in the computer and zoomed in the same amount. Since the waveform in the upper figure is taller, this sound is louder; since the waves in the lower figure are more closely spaced together, this sound is higher in pitch.
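The two clues read off the waveforms above (height for loudness, spacing for pitch) can also be measured in code. The Python sketch below is a rough illustration under stated assumptions: the helper names `make_tone`, `peak_level`, and `estimate_pitch_hz` are invented for this example, the two test tones are made-up stand-ins for the figures, and counting crossings of the center line only works for very simple sounds like these.

```python
import math

def make_tone(frequency_hz, amplitude, duration_s=1.0, sample_rate_hz=44100):
    """Build a simple test sound: a pure tone stored as a list of levels."""
    count = int(round(duration_s * sample_rate_hz))
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

def peak_level(samples):
    """How tall the waveform gets: the largest excursion from the center line."""
    return max(abs(s) for s in samples)

def estimate_pitch_hz(samples, sample_rate_hz=44100):
    """Count upward crossings of the center line; each one marks a new cycle."""
    crossings = 0
    for previous, current in zip(samples, samples[1:]):
        if previous < 0 <= current:
            crossings += 1
    duration_s = len(samples) / sample_rate_hz
    return crossings / duration_s

# Hypothetical stand-ins for the two figures: a louder, lower sound
# and a quieter, higher one.
upper = make_tone(frequency_hz=100, amplitude=0.8)
lower = make_tone(frequency_hz=300, amplitude=0.4)

upper_is_louder = peak_level(upper) > peak_level(lower)
lower_is_higher = estimate_pitch_hz(lower) > estimate_pitch_hz(upper)
```

Real-world sounds have those small inner jerks layered on top of the main cycle, so crossing counts become unreliable for them; the comparison of heights, however, carries over to any waveform.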
When we are animating or editing visuals to sound, the most interesting points in the audio tend to be the loudest ones: the moment a door slams, lightning cracks, a drum is hit, or a baby’s crying reaches its crescendo. By looking for these “peaks” - taller points in the audio waveform - we have a tremendous head start in finding the more interesting audio events, which we can then use as starting points for our visual animations.
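Hunting for peaks by eye works well, but the same idea can be sketched in a few lines of Python. Treat the following as one possible approach under stated assumptions, not a description of how any particular editing program does it: the names `find_peaks` and `time_to_frame` are hypothetical, the 10 millisecond window and 0.5 loudness threshold are arbitrary starting values, and 30 frames per second is assumed purely for illustration.

```python
def find_peaks(samples, sample_rate_hz, window_s=0.01, threshold=0.5):
    """Return the times (in seconds) of loud moments in a list of samples.

    The audio is cut into short windows. A window counts as a peak when its
    tallest sample reaches the threshold and it is taller than the window
    before it and at least as tall as the window after it.
    """
    window_len = max(1, int(round(window_s * sample_rate_hz)))
    levels = [max(abs(s) for s in samples[i:i + window_len])
              for i in range(0, len(samples), window_len)]
    peak_times = []
    for k, level in enumerate(levels):
        before = levels[k - 1] if k > 0 else 0.0
        after = levels[k + 1] if k + 1 < len(levels) else 0.0
        if level >= threshold and level > before and level >= after:
            peak_times.append(k * window_len / sample_rate_hz)
    return peak_times

def time_to_frame(time_s, frames_per_second=30):
    """Convert a time in seconds to a video frame number (30 fps assumed)."""
    return int(time_s * frames_per_second)

# Made-up test audio: two seconds of silence with two decaying "hits",
# sampled at a deliberately low 1000 measurements per second.
sample_rate_hz = 1000
samples = [0.0] * 2000
for start, loudness in [(500, 0.9), (1500, 0.7)]:
    for i in range(200):
        samples[start + i] = loudness * (0.99 ** i)

peak_times = find_peaks(samples, sample_rate_hz)
peak_frames = [time_to_frame(t) for t in peak_times]
```

With this test audio the two hits land at 0.5 and 1.5 seconds, or frames 15 and 45 at the assumed frame rate. On real recordings, the threshold and window size usually need adjusting by ear and eye before the detected peaks line up with the events you care about.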
next page: music basics