
Saturday, August 16, 2008

Living with (Data) Loss

mp3 and its cousins are a fact of life… here’s how to get the most out of them

Magic Part II… The Takeaway

If you couldn’t hear it in the first place, is it really missing?

There’s a song about stars at night being big and bright (clap four times) in the heart of Texas. But while that state is certainly nice, we know those same stars also live over Ohio and New York. If you’ve got a good imagination, you can visualize being in Times Square, looking up, and seeing the same starry sky.


But while you can imagine it, you’ll probably never see it. Having all those bright signs around makes your eyes less sensitive to starlight, and the atmosphere over Times Square further confuses things by reflecting ‘earthlight’ back at you.

As you learned in my last blog entry, human ears get similarly desensitized when they hear other sounds at nearby frequencies (the nerves can’t handle the data). Add that to the natural ‘blurring’ of even the best playback systems and acoustics (equivalent to those atmospheric reflections), and it’s no wonder we’re sometimes blind to certain details in a recording.

I’ll assume you read that blog entry, or already knew how spectral and temporal masking work. If not (nyah, nyah): you’ll just have to trust me.

Framed!

When you run a signal through an mp3 or similar encoder, the algorithm first breaks the audio into frames, lasting up to a few milliseconds each.

These frames have nothing to do with video frames in the same file. Their length is determined primarily by the data rate - or amount of compression - you’ve chosen. Lower rate files, with more compression, use longer frames.

Each frame is boosted so its loudest wave reaches 0 dBFS. This is to take advantage of every bit during processing. The amount of boost is noted with the frames, so they can be restored to their original volume on playback.

(That’s why normalizing or boosting a raw audio file doesn’t make compression any more efficient. A louder file might be easier for users to hear after it’s been decoded, but that’s a different issue.)
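Here’s a minimal Python sketch of the per-frame boost described above, so you can see the bookkeeping. It is a simplified model and not how any particular mp3 encoder stores its gain, since real encoders use their own, more elaborate scale factors. The frame length, the function names, and the floating-point sample format (1.0 = 0 dBFS) are all assumptions for illustration.

```python
import math

FRAME_LEN = 4  # hypothetical frame size in samples, tiny to keep the demo readable


def normalize_frames(samples, frame_len=FRAME_LEN):
    """Split float samples (1.0 == 0 dBFS) into frames, boost each frame so its
    peak reaches 1.0, and keep the boost in dB so playback can undo it."""
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(s) for s in frame)
        if peak == 0.0:
            frames.append((0.0, list(frame)))  # silent frame: nothing to boost
            continue
        gain_db = -20.0 * math.log10(peak)  # boost that brings the peak to 0 dBFS
        frames.append((gain_db, [s / peak for s in frame]))
    return frames


def restore_frames(frames):
    """Playback side: apply each frame's stored gain in reverse."""
    out = []
    for gain_db, frame in frames:
        scale = 10.0 ** (-gain_db / 20.0)
        out.extend(s * scale for s in frame)
    return out


quiet = [0.1, -0.25, 0.05, 0.2, 0.0, 0.0, 0.0, 0.0]
boosted = [s * 2 for s in quiet]  # the same audio, about 6 dB louder
for (g1, f1), (g2, f2) in zip(normalize_frames(quiet), normalize_frames(boosted)):
    print(f"gain {g1:.2f} dB vs {g2:.2f} dB; frame data {f1} vs {f2}")
```

Run the quiet version and the boosted version through it, and the normalized frame data comes out the same (to within rounding). Only the stored gains differ. That is the parenthetical point above, in code form.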

The algorithm looks at each frame and measures how much energy it has in different frequency bands. The number of bands is a trade-off: more bands allow tighter masking, but they require sharper filters that respond more slowly. The mp3 format uses up to 512 bands; other compression systems use more or fewer.

  • If a particular band is silent during the frame, the process notes it and doesn’t waste any more data there.
  • If a band is loud, it gets fewer bits. The loud signal will mask the resulting noise at the same frequency.
  • If a band is soft, it’s processed with more bits, unless there’s a masking sound in an adjacent band. Then it assumes the band won’t be heard, and deletes it entirely.
  • The resulting audio is run through a data packer similar to WinZip or StuffIt. Normal audio is too complex to compress well in these systems, but they do a good job with the simpler, data-reduced frames.
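Here’s a toy Python version of the four rules above. It works on a list of per-band levels (in dBFS) for a single frame. Every threshold and bit count is an invented assumption chosen for readability, whereas a real encoder derives them from a psychoacoustic model. “Adjacent” here means only the immediate neighbor on either side. The final lossless packing step is left out; mp3 itself uses Huffman coding there rather than a general-purpose archiver.

```python
SILENT_DB = -90.0      # hypothetical: at or below this, a band counts as silent
LOUD_DB = -20.0        # hypothetical: at or above this, a band counts as loud
MASK_MARGIN_DB = 20.0  # hypothetical: a neighbor this much louder masks a soft band
LOUD_BITS = 4          # hypothetical bit depths; real encoders compute these
SOFT_BITS = 8


def allocate_bits(band_levels_db):
    """Apply the four rules to one frame's band levels (dBFS, low to high
    frequency). Returns a (decision, bits) pair for each band."""
    decisions = []
    for i, level in enumerate(band_levels_db):
        if level <= SILENT_DB:
            decisions.append(("silent", 0))         # note it, spend nothing
        elif level >= LOUD_DB:
            decisions.append(("loud", LOUD_BITS))   # its own loudness hides the noise
        else:
            # Soft band: look only at the immediate neighbors on either side.
            neighbors = band_levels_db[max(i - 1, 0):i] + band_levels_db[i + 1:i + 2]
            if any(n - level >= MASK_MARGIN_DB for n in neighbors):
                decisions.append(("masked", 0))     # assumed inaudible, dropped
            else:
                decisions.append(("soft", SOFT_BITS))
    return decisions


frame = [-100.0, -10.0, -40.0, -45.0, -50.0]  # invented band levels for one frame
for band, (decision, bits) in enumerate(allocate_bits(frame)):
    print(band, decision, bits)
```

With these sample levels, band 2 is dropped because its neighbor is 30 dB louder. Bands 3 and 4 have no such masker, so they keep the full 8 bits.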

The common mp3 algorithm uses this scheme. How good it sounds depends on how well the encoder has been written, and on the bitrate chosen. The newer AAC algorithm couples it with a quick look at adjacent frames to see if temporal masking will hide even more details. For a given bitrate, a good AAC encoder will sound better than a good mp3 encoder.
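That look at neighboring frames can be sketched the same way. This is a toy model of the idea described above, not AAC’s actual psychoacoustic model. It treats the previous and next frames alike, although real forward and backward masking last for different lengths of time. The 20 dB margin is also an invented number.

```python
MASK_MARGIN_DB = 20.0  # hypothetical: how much louder a neighboring frame must be


def temporally_masked(levels_db, margin_db=MASK_MARGIN_DB):
    """levels_db[f][b] is the level (dBFS) of band b in frame f.
    Returns a grid of the same shape: True where a louder sound in the same
    band of the previous or next frame is assumed to hide this one."""
    n_frames = len(levels_db)
    masked = []
    for f, frame in enumerate(levels_db):
        row = []
        for b, level in enumerate(frame):
            neighbors = [levels_db[g][b] for g in (f - 1, f + 1) if 0 <= g < n_frames]
            row.append(any(n - level >= margin_db for n in neighbors))
        masked.append(row)
    return masked


# Three frames, two bands each: a loud hit in band 0 of frame 0, then decay.
levels = [[-10.0, -50.0],
          [-40.0, -45.0],
          [-50.0, -44.0]]
print(temporally_masked(levels))
# prints [[False, False], [True, False], [False, False]]
```

Only band 0 of the middle frame gets flagged, because the frame before it is 30 dB louder in the same band.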

Your First Choices

The most critical setting in a lossy compression scheme, including mp3 encoders, is the bitrate. Lower bitrates mean longer frames, increasing the chance that masking sounds won’t last the whole time. The result is noise and a flangey or chirping effect.

Which bitrate you consider low, and how much noise or distortion is acceptable, depends on the application. But if you do things right, broadcast-quality sound can be achieved at 128 kbps. One of the most important factors is which encoder you use. Even in a standardized format like mp3, there are multiple trade-offs that program designers have to make.

Commercial encoders are usually better designed in this respect than freeware. Because they’ve also paid licensing fees to the Fraunhofer Institut - inventors of the mp3 format - commercial publishers may have had more access to the inner workings of the system. But at least one free encoder, the open-source LAME library, is also very good.
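To put that 128 kbps figure in scale, assume a standard CD-quality source (44.1 kHz, 16-bit, stereo). The quick Python arithmetic below shows how hard the encoder has to squeeze it. The constant and function names are just labels for this example.

```python
SAMPLE_RATE_HZ = 44_100  # assumed CD-quality source
BITS_PER_SAMPLE = 16
CHANNELS = 2
MP3_KBPS = 128


def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed linear PCM, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_kbps = pcm_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = cd_kbps / MP3_KBPS
print(f"CD audio: {cd_kbps:.1f} kbps; {MP3_KBPS} kbps is about {ratio:.1f}:1")
# prints: CD audio: 1411.2 kbps; 128 kbps is about 11.0:1
```

In other words, a 128 kbps file keeps roughly one bit in eleven. That is why the encoder’s choices about what to discard matter so much.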

It makes sense to use a high-quality encoder. Other things will help as well:


  • If you have to encode at a low bitrate, get rid of high frequencies first. Apply a low-pass filter at 8 kHz to 12 kHz (or use a good sample-rate converter to lower the rate to 22 kHz, which filters sounds above 10 kHz). The moderate dullness this imparts will be less objectionable than low bitrate noises.
  • Don’t try to help the high-frequency filtering by boosting just below the Nyquist Limit, even though many encoders or sample rate converters give you this choice with a “Preserve Highs” option.  It wastes precious bits on unimportant sounds, and can increase the chance of flanging or chirping.
  • Don’t use extreme broadcast-style level compression, particularly multiband compression. This makes it harder for the algorithm to tell the difference between important sounds and those that can be lost.
  • Speech is harder to encode than music because it changes faster. The most common distortion at low bitrates is a reverberation-like noise tail on the words. It can be lessened by lowering the number of bands in the encoder, which speeds up the internal filters’ response. (Most encoders don’t let you set the number of bands directly, but many let you select a “speech” optimization, which does the same thing.)
  • Higher background noise levels also increase problems with encoding. Start with the cleanest possible recording.
  • That doesn’t mean you can get away with a noisy recording by running it through most noise reduction plug-ins first. The two algorithms fight each other.
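The first tip above, filtering out the highs before a low-bitrate encode, is normally a job for your editor’s EQ or sample-rate converter. Purely to show what such a filter does, here’s a minimal Python windowed-sinc low-pass. The 101-tap length, the Hamming window, and the function names are assumptions for illustration. The direct convolution is also far too slow for real files.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Windowed-sinc FIR low-pass (Hamming window). Use an odd num_taps so the
    filter has a single center tap."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # scale so the gain at DC is 1


def apply_fir(samples, taps):
    """Plain direct-form convolution; output is as long as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if 0 <= i - k < len(samples):
                acc += t * samples[i - k]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's response at one frequency (1.0 = unity)."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)


taps = lowpass_taps(10_000, 44_100)  # 10 kHz cutoff, inside the 8 to 12 kHz range
for f in (1_000, 10_000, 15_000):
    print(f"{f} Hz: gain {gain_at(taps, f, 44_100):.4f}")
```

With a 10 kHz cutoff at 44.1 kHz, a 1 kHz tone passes essentially unchanged. A 15 kHz tone is knocked down by more than 40 dB, so the encoder never has to spend bits on it.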

While we’re at it, the encoder might give you some other choices as well:


Stereo or joint stereo
      Most algorithms expect the left and right channels of a stereo pair to be similar. This is usually true in music. A joint stereo mode encodes only major differences between the channels, particularly at high frequencies, freeing up more of the bitrate for better quality. But ambiences and crowd sounds can be very different on the left and right, if the space isn’t reverberant and there are lots of spread-out sources. With these sounds, “joint stereo” pushes things toward the center.
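One common form of joint stereo, mid/side coding, is easy to sketch. The Python below is a simplified illustration and not the mp3 bitstream format; mp3’s joint stereo also has an intensity-stereo mode. It shows two things: similar channels leave very little in the side signal, and starving the side signal collapses the image toward the center. Function names are hypothetical.

```python
def to_mid_side(left, right):
    """Mid carries what the channels share; side carries how they differ."""
    mid = [(lv + rv) / 2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / 2 for lv, rv in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild left and right; exact apart from floating-point rounding."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(s * s for s in samples)


# Typical music: the channels are nearly identical, so side is tiny.
left = [0.5, 0.4, -0.3, 0.2]
right = [0.48, 0.41, -0.29, 0.2]
mid, side = to_mid_side(left, right)
print(f"mid energy {energy(mid):.4f}, side energy {energy(side):.6f}")

# Starving the side signal of bits (here: zeroing it) pulls everything to center.
center_l, center_r = from_mid_side(mid, [0.0] * len(side))
print(center_l == center_r)  # True: both channels are now the mid signal
```

A wide, non-reverberant crowd recording would have a much larger side signal. Squeezing that signal is what pushes those ambiences toward the center.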

Variable bitrate
      This option, also known as VBR, can both reduce file size and improve the sound. The algorithm uses a different bitrate for each frame, depending on how many bits that frame needs. This avoids wasting bits on pauses or easy-to-encode passages.

VBR works best on simpler or slower-moving sources, including a lot of new age or classical music. It offers little advantage on faster and highly processed sounds, such as most pop styles, because the maximum bitrate must be used for most frames.
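Here is a rough Python sketch of the VBR idea. It assumes the encoder has already estimated how many kilobits per second each frame needs; the demand figures below are invented. Each frame then gets the lowest standard mp3 bitrate that covers its demand. The function names are hypothetical, but the bitrate table is the standard MPEG-1 Layer III list.

```python
# Standard MPEG-1 Layer III bitrates in kbps ("free format" left out).
MP3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]


def choose_vbr_rates(demand_kbps, rates=MP3_BITRATES_KBPS):
    """Give each frame the lowest standard bitrate that covers its estimated
    demand; frames that want more than the top rate are capped at the top rate."""
    return [next((r for r in rates if r >= need), rates[-1]) for need in demand_kbps]


def average_kbps(frame_rates):
    return sum(frame_rates) / len(frame_rates)


# Invented per-frame demand figures, as a psychoacoustic model might estimate them.
quiet_passage = [20, 30, 70, 150, 45, 10]
busy_pop_mix = [300, 310, 290, 320, 305]

for name, demand in (("quiet", quiet_passage), ("busy", busy_pop_mix)):
    frame_rates = choose_vbr_rates(demand)
    print(name, frame_rates, f"average {average_kbps(frame_rates):.0f} kbps")
```

With these made-up numbers, the quiet passage averages 64 kbps. The busy mix needs 320 kbps in every frame, so VBR saves nothing there, which matches the point about highly processed pop above.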

Lossy: The Next Generation

When you convert a compressed file back to 16-bit linear audio, something will be missing. If you encode it again, the algorithm has a harder time finding details that can be safely deleted. Noise and distortion build up with each subsequent pass.

If you must go through multiple encodings, stay with the highest bitrates possible. If the final release format will be at a low bitrate, don’t apply it until the last step.

There is some evidence that multiple generations through the same compressor sound worse than the same number of generations through a variety of algorithms.

What have they done to my song?

Want to hear exactly what the mp3 algorithm takes away from voice or music, when you do it properly? No hype, no simulation… but a scientific experiment you can replicate on your desktop. It’s at my website.

Next time: how lossless encoders shrink files without sacrificing any data.



Copyright © 2012, HD Expo, LLC a division of Diversified Business Communications. DBA Createasphere
