Timing Video to Audio
Wherein Mr. Video asks Ms. Audio: "What's my motivation in this scene?"
By Chris and Trish Meyer | March 13, 1998
In the case where the audio (typically, music) has already been supplied for a project, the first task is "spotting" the audio file to find the most interesting moments in it. The way to do this is by looking for peaks in the waveform in an audio file's "clip" window, listening to the sound around this point to verify what is actually going on at this moment in time, and setting a marker (a vertical line with a flag on top) for each interesting event.
We tend to use simple, unnumbered markers (set in After Effects by using the asterisk key on an extended keyboard's numeric keypad; M is the magic key in Final Cut Pro) to denote at least the start of each musical bar or measure, and annotated markers for major sections of the music, such as chorus, guitar solo, et cetera. Sometimes we will even mark every major beat in a piece of music. The times of these markers are then transferred to some form of animation or "hit" sheet, along with a description of what each one was. We refer to this when setting up the timing of keyframes for our animations.
When spotting music, we will place a marker at the start of each measure or even each beat. We often add a second layer (the orange track here) to hold additional markers and comments that describe the sections of the music, as well as master markers along the timeline (the numbered markers here) to quickly navigate between these sections.
By now, you are probably saying "You've explained how to find peaks, which are probably great clues to drum hits in the music, but what the heck is a beat, bar, or measure?" Most music is divided into divisions of time known as bars or measures. A basic rhythmic cycle of music tends to occur once per measure. Most popular music today is written in "4/4" time. The bottom number defines what the basic unit of time - a beat - is; in this case, it's a quarter note. The top number in this time signature defines that four of them make up a bar or measure of the music. If the music is not in 4/4 time, the next most common case is that it's in 3/4 time (three beats to a measure). A waltz is the most common example of music in 3/4 time (dum-da-da-dum-da-da).
If you tend to clap your hands or tap your toes along with music, you will find you usually tap or clap in sync with each beat. Think about disco music for a second: That constantly thumping bass drum is pounding out each beat. Not all music, of course, is this obvious, but it usually isn't too hard to figure out the beat. The beat that seems to make you want to clap or tap the loudest is usually the "downbeat", the first beat of each measure. This is the most important beat in a measure of music to match visual cues to. Try listening to some rhythmic songs, and instead of clapping or tapping, count "1, 2, 3, 4" in time with the beats - with the 1 being the downbeat - to start to develop a feel for this.
Sometimes you will be provided ahead of time with a numeric value for the tempo of a piece of music. If so, this will greatly aid you in making sure you're detecting bars and beats correctly. The most common unit of measure used for tempo is "beats per minute" (abbreviated as "bpm"). It means exactly what it says: This is how many beats occur in each minute. For example, if the tempo is 120 bpm, take the number of seconds in a minute (60), divide it by the bpm, and you now know the length or spacing of each beat in seconds - in this case, 0.5 seconds.
To calculate what this means in frames, multiply this value by your frame rate. At 30 frames per second, 120 bpm works out to 15 frames per beat (60 ÷ 120 = 0.5; 0.5 × 30 = 15). Indeed, frames per beat - "fpb" for short - is the second most common unit of measure for tempo, although typically only musicians who create music specifically for video or film have ever heard of it. If you are working in a different visual frame rate, such as 24 frames per second (most film), just plug in that number in place of the 30 above. Using the example above, 60 seconds ÷ 120 bpm = 0.5 seconds per beat, and 0.5 × 24 = 12 fpb for film. For PAL, use 25; for NTSC video, use 29.97. There is a shortcut to this math: Assuming 30 fps and music in 4/4 time, the quickest path to fpb is to divide 1800 by the tempo in bpm (for example, 1800 ÷ 120 bpm = 15 fpb). For film, PAL, and NTSC, the magic number is 1440, 1500, and 1798.2 respectively.
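The arithmetic above can be sketched in a few lines of Python. The function names here are our own, chosen for illustration; the "magic number" in the shortcut is simply the frame rate multiplied by 60.

```python
def seconds_per_beat(bpm):
    """Length of one beat in seconds: 60 seconds divided by the tempo."""
    return 60.0 / bpm


def frames_per_beat(bpm, fps=30.0):
    """Length of one beat in frames: seconds per beat times the frame rate."""
    return seconds_per_beat(bpm) * fps


def frames_per_beat_shortcut(bpm, fps=30.0):
    """Same result via the 'magic number' (fps * 60, i.e. 1800 at 30 fps)."""
    magic_number = fps * 60.0
    return magic_number / bpm


if __name__ == "__main__":
    # Frame rates mentioned in the text.
    for label, fps in [("30 fps", 30.0), ("film", 24.0), ("PAL", 25.0), ("NTSC", 29.97)]:
        print(label, round(frames_per_beat(120, fps), 3), "fpb at 120 bpm")
```

Both functions should agree for any tempo; the shortcut just folds the two steps into one division.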
To find out the duration of each measure, multiply the resulting frames per beat value by the number of beats per measure. If the music is in 4/4 time (four beats to a measure), multiply the answers above by 4: A tempo of 120 bpm now works out to 2.0 seconds, or 60 frames per measure at a frame rate of 30 fps. Common tempos for popular music range from 80 to 120 bpm, although they can range all over the place, from a lazy jazz shuffle of 60 bpm to hyperkinetic rave music with a tempo of 160 bpm. If you are having trouble spotting all the beats in a piece of music just by looking at the waveform, the frames per beat and frames per measure values should provide a general guide for how often you should be locating beats and the starts of measures.
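As a rough sketch of the measure calculation (function and parameter names are our own), the number of beats per measure is just one more multiplier:

```python
def seconds_per_measure(bpm, beats_per_measure=4):
    """Duration of one measure in seconds."""
    return 60.0 / bpm * beats_per_measure


def frames_per_measure(bpm, fps=30.0, beats_per_measure=4):
    """Duration of one measure in frames."""
    return seconds_per_measure(bpm, beats_per_measure) * fps


if __name__ == "__main__":
    print(seconds_per_measure(120))                       # 4/4 at 120 bpm
    print(frames_per_measure(120, 30.0))                  # same, in frames at 30 fps
    print(frames_per_measure(120, 30.0, beats_per_measure=3))  # a waltz in 3/4
```

Passing `beats_per_measure=3` covers music in 3/4 time.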
Don't be too alarmed if the spacing between beats wanders a frame or so on any given beat. This happens because tempos often work out to fractional numbers of frames per beat. A tempo of 110 bpm, for example, works out to 2.182 seconds per measure, or 0.545 seconds per beat; multiplied by a frame rate of 30, you get 16.364 fpb. In the case where a beat lands between whole frames, you just have to pick the nearest frame.
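One way to "pick the nearest frame" without letting the error pile up is to compute each beat's exact position from the start of the music and round that, rather than repeatedly adding a rounded frames-per-beat value. The following is a minimal sketch under that assumption; `beat_frames` and its parameters are hypothetical names of our own.

```python
def beat_frames(bpm, fps, num_beats, start_frame=0):
    """Return the nearest whole frame for each of the first num_beats beats.

    Each beat is placed from its exact (fractional) position, so rounding
    errors do not accumulate from one beat to the next.
    """
    frames = []
    for i in range(num_beats):
        exact_position = i * 60.0 * fps / bpm  # fractional frame number
        frames.append(start_frame + round(exact_position))
    return frames


if __name__ == "__main__":
    # 110 bpm at 30 fps is about 16.364 fpb, so the gaps between
    # markers alternate between 16 and 17 frames.
    print(beat_frames(110, 30.0, 8))
```

Note that Python's `round` resolves exact halves to the nearest even number; for marker placement either neighboring frame is an equally good choice.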
Elsewhere on PVC, we've posted an article that includes a list of "magic tempos" that work out to simple integer numbers of frames per beat. If you are working with a musician, try to get them to use one of these tempos to make your life easier.
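If you don't have that list to hand, tempos with a whole number of frames per beat can be worked out by running the fpb formula backwards: bpm = (frame rate × 60) ÷ fpb. The sketch below is our own arithmetic, not a reproduction of the list in that article, and the 60 to 160 bpm default range is simply the span of tempos mentioned earlier.

```python
import math
from fractions import Fraction


def magic_tempos(fps, min_bpm=60, max_bpm=160):
    """Return (frames_per_beat, bpm) pairs where frames_per_beat is a whole number.

    Fractions keep frame rates such as 29.97 exact while finding the range.
    """
    frames_per_minute = Fraction(str(fps)) * 60
    smallest_fpb = math.ceil(frames_per_minute / max_bpm)
    largest_fpb = math.floor(frames_per_minute / min_bpm)
    return [
        (fpb, float(frames_per_minute / fpb))
        for fpb in range(smallest_fpb, largest_fpb + 1)
    ]


if __name__ == "__main__":
    for fpb, bpm in magic_tempos(30):
        print(f"{fpb} fpb -> {bpm:.3f} bpm")
```

Not every result is a whole number of beats per minute, but each one lands every beat exactly on a frame.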
Bars and beats are great timing references for when visual events should happen. When in doubt, cut, start, or stop a scene or effect at the beginning of a measure, and perform fades over one or two beats. For faster events, you can usually keep dividing the length of beats by multiples of two to find good sub-hit points; dividing them by threes can also give them an interesting feel (known in musical terms as "triplets"). This, of course, is not a hard and fast rule you should follow at all times - what looks and feels best should always take precedence over mathematical rules - but it can give you a helpful framework to start with.
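To illustrate the sub-hit idea, here is a small sketch that splits a single beat into equal parts and snaps each part to the nearest frame. The name `subdivide_beat` and its parameters are our own; dividing by 2 or 4 gives the even subdivisions, and dividing by 3 gives triplets.

```python
def subdivide_beat(beat_start_frame, frames_per_beat, divisions):
    """Return the nearest whole frames for equal subdivisions of one beat.

    divisions=2 or 4 gives even sub-hits; divisions=3 gives triplets.
    """
    step = frames_per_beat / divisions
    return [round(beat_start_frame + i * step) for i in range(divisions)]


if __name__ == "__main__":
    fpb = 15  # 120 bpm at 30 fps
    print(subdivide_beat(0, fpb, 3))   # triplets within the first beat
    print(subdivide_beat(0, 16, 4))    # four even sub-hits in a 16-frame beat
```

As with the beats themselves, sub-hits that fall between frames are rounded, so treat the results as starting points and adjust by eye.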
We have been focusing on music, but similar principles can be applied to sound effects. Again, look at the tallest peaks in a sound effect's waveform profile to help locate the important points, such as when a door slams or train engine passes by. Mark these down on your animation timing sheet just as you would downbeats - the only difference is they don't follow any mathematical logic of beats between the peaks.
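If you have the audio as a list of sample values, a simple script can suggest candidate peaks for you to audition. This is only a sketch under our own assumptions: samples are floats normalized to the range -1.0 to 1.0, and the `threshold` and `min_gap_seconds` defaults are arbitrary starting values rather than recommendations. You would still listen around each candidate to confirm what is actually happening there.

```python
def find_peak_frames(samples, sample_rate, fps, threshold=0.8, min_gap_seconds=0.25):
    """Return frame numbers of loud events in a list of normalized samples.

    A sample counts as a candidate when its absolute level reaches the
    threshold. Candidates closer together than min_gap_seconds are treated
    as one event, keeping the loudest sample of the group.
    """
    min_gap = int(min_gap_seconds * sample_rate)
    peak_indices = []
    for i, sample in enumerate(samples):
        level = abs(sample)
        if level < threshold:
            continue
        if peak_indices and i - peak_indices[-1] < min_gap:
            # Same event as the previous peak: keep whichever is louder.
            if level > abs(samples[peak_indices[-1]]):
                peak_indices[-1] = i
            continue
        peak_indices.append(i)
    # Convert sample positions to the nearest video frame.
    return [round(i / sample_rate * fps) for i in peak_indices]


if __name__ == "__main__":
    # A tiny made-up signal: silence with two loud events.
    samples = [0.0] * 100
    samples[10], samples[12], samples[60] = 0.9, 1.0, 0.85
    print(find_peak_frames(samples, sample_rate=100, fps=30))
```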