(Page 2 of 2 pages for this article  <  1 2)

Thursday, August 14, 2008

Hearing What’s Not There

Sometimes, making data disappear can be acceptable

Magic Part I… The Mask

How your brain gets around a neural traffic jam

Our ears are not particularly precise sensors.

It’s not for lack of trying: each ear has close to 30,000 nerves on the basilar membrane, tuned to respond to different pitches. But those membranes aren’t like giant organ keyboards, with a specific nerve for every tone we ever hear. That would be too much data for the brain to process efficiently.

Instead, when we hear a tone at a particular frequency, a group of nerves centered around that pitch fire. How many nerves go off depends on the volume and other factors. A loud sound triggers more nerves. The brain interprets these groups as a specific pitch and volume.

The nerves aren’t spread out linearly. They’re more concentrated at frequencies where sounds tend to be important, and sparse at the extremes of the band.

In other words, the first audio data compression systems were human. They evolved in our inner ears and auditory cortex.

This uneven sensitivity has been known for years, and has been measured across very large populations. The resulting curve is generally called the Threshold of Hearing. Sounds above the threshold get heard. Those below it don’t.

You can express it with a graph:

image

Low frequencies are on the left, mids in the middle, highs on the right. (Those calibrations are logarithmic because that’s how we hear pitch.) The vertical decibels are calibrated relative to the frequency where most people’s ears are the most sensitive, around 3.5 kHz. You could consider 0 dB on this chart to be true 0 dB SPL - the nominal threshold of hearing - or any other normal listening level.

The important thing isn’t the calibrations; it’s what happens at the heavy brown line. That’s how the threshold varies with frequency, in most people. (The line is pretty accurate, given my drawing abilities; there are more rigorous ones elsewhere on the Web.)

At 3.5 kHz, the short, green bar at 15 dB is louder than the threshold. It gets heard. But the red bars at 50 Hz and 15 kHz are ignored, even though they’re louder. In fact, most people can’t detect a very high or very low pitch until it gets some 40 dB louder than one they could comfortably hear in the mid-range!
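If you'd rather poke at numbers than squint at a drawing, here's a rough sketch of that brown line. It assumes Terhardt's widely quoted curve-fit for the threshold in quiet, which is not the hand-drawn line above and is referenced to dB SPL rather than to the 3.5 kHz sensitivity peak. The function names and the 30 dB level used for the two "red bar" tones are made up for illustration.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in dB SPL (Terhardt's curve-fit).

    An assumption for illustration: a published approximation,
    not the curve drawn in this article.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def is_audible(freq_hz, level_db_spl):
    """A lone tone is heard only if it is louder than the threshold."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

# One quiet mid-range tone and two louder tones at the extremes
# (the 30 dB levels are hypothetical).
for freq_hz, level_db in [(3500, 15), (50, 30), (15000, 30)]:
    verdict = "heard" if is_audible(freq_hz, level_db) else "ignored"
    print(f"{freq_hz:>6} Hz at {level_db} dB: "
          f"threshold {threshold_in_quiet_db(freq_hz):5.1f} dB, {verdict}")
```

With this particular curve-fit, the thresholds at 50 Hz and 15 kHz come out more than 40 dB above the one at 3.5 kHz, which is the same ballpark as the drawing.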

There seems to be good evolutionary reason for this. While roaring predators are louder than human speech, the most important parts of intelligibility are around 3.5 kHz. That’s where it would be most advantageous to understand your neighbor’s shouts, even if there’s a tiger nearby. (More about these frequencies in an earlier blog entry.)

The darned line keeps moving

How many nerves get involved for a particular tone depends on volume, and is constantly being adjusted by our ears. That’s necessary. A nearby jet plane hits your ear with about 10,000,000,000 times the sound intensity of the quietest tones used in a hearing test, a difference of 100 dB. But it means there’s even more data compression going on in your head.
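To place that ten-billion figure on the dB charts, here's a quick conversion. It assumes the ratio is one of sound intensity (power) rather than raw sound pressure, since the same number read as a pressure ratio would work out to 200 dB. The function names are just for illustration.

```python
import math

def intensity_ratio_to_db(ratio):
    """Decibels for a ratio of sound intensities (power-like quantities)."""
    return 10.0 * math.log10(ratio)

def pressure_ratio_to_db(ratio):
    """Decibels for a ratio of sound pressures (amplitude-like quantities)."""
    return 20.0 * math.log10(ratio)

jet_vs_quietest_tone = 10_000_000_000
print(intensity_ratio_to_db(jet_vs_quietest_tone))  # as an intensity ratio
print(pressure_ratio_to_db(jet_vs_quietest_tone))   # as a pressure ratio
```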

All this efficiency comes with a sacrifice. Because louder sounds use larger groups of nerves, and the threshold is constantly being adjusted, softer sounds at nearby frequencies can’t get through at the same time. Neural pathways that would normally respond are already busy.

The effect can be thought of like this:

image

When something loud enough comes along (blue bar, about 40 dB at 2 kHz), it drags the threshold with it. The green bar from our previous drawing - and a slightly louder one I added at 1 kHz - don’t get heard, even though they’re above the normal threshold.

The actual amount of masking varies with the frequency, volume, and overall timbres of the sounds, but it’s always there. It gets broader at the extremes of the band, where nerve bundles are more spread out. A 250-Hz sound, 25 dB above the threshold, ties up so much neural activity that a simultaneous 200-Hz sound that’s 10 dB softer actually disappears.
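The idea of a loud tone dragging the threshold with it can be sketched as a toy model. Everything here is an assumption for illustration: the masker raises the threshold by a symmetrical "skirt" that falls off in a straight line per octave, and the 15 dB-per-octave slope was picked only so the toy reproduces the drawing described above (a 40 dB masker at 2 kHz hiding a 15 dB tone at 3.5 kHz and a 20 dB tone at 1 kHz). Measured masking curves are lopsided and level-dependent, so don't treat these numbers as data.

```python
import math

def masker_skirt_db(freq_hz, masker_hz, masker_db, slope_db_per_octave=15.0):
    """Toy masking skirt: masker level minus a linear per-octave falloff.

    The slope is a hypothetical value chosen to match the drawing,
    not a measured one.
    """
    octaves_away = abs(math.log2(freq_hz / masker_hz))
    return masker_db - slope_db_per_octave * octaves_away

def is_heard(freq_hz, level_db, masker_hz, masker_db, quiet_threshold_db=0.0):
    """A tone gets through only if it beats both the normal threshold
    and whatever the masker has raised the threshold to."""
    effective_threshold = max(
        quiet_threshold_db,
        masker_skirt_db(freq_hz, masker_hz, masker_db),
    )
    return level_db > effective_threshold

# The two tones from the drawing, with a 40 dB masker at 2 kHz.
for freq_hz, level_db in [(3500, 15), (1000, 20)]:
    verdict = "heard" if is_heard(freq_hz, level_db, 2000, 40) else "masked"
    print(f"{freq_hz} Hz at {level_db} dB: {verdict}")
```

Take the masker away (or make the softer tone loud enough to poke above the skirt) and the same tones are heard again.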

After-images (and pre-images) in your ear

One of the magic tricks described in the Nature Reviews article is The Great Tomsoni’s Colored Dress Change. His assistant appears in a white dress, which he says he’ll turn red. Her white spotlight goes out and a red one comes on. He makes a joke, the audience laughs, and he tells the booth to change the light back. When the spot turns white again, her dress is made of red fabric!

I write audio tutorials, so you’ll have to read the article to see how he does it. But I’ll give you an audio-based hint. Nerves are chemical, and chemicals have to recover after they’ve been fired. This results in a time-based masking as well.

image

In this drawing, frequency doesn’t matter. A long loud tone is sounded (blue bar, lasting 180 ms or about 6 frames), and it drags the threshold up to match. But look at what happens in the 50 ms or so after the tone: nerves are still recovering, so the threshold stays up. In fact, the brain even forgets nearby pitches that happened 20 ms or so before the tone, because its pathways get overwhelmed!
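The timing in that drawing can be sketched the same way. This is a toy model under stated assumptions: the threshold sits at the tone's level while it sounds, and ramps in a straight line over the 20 ms before and the 50 ms after. The straight-line ramps and the 40 dB tone level are hypothetical; only the 180 ms, 50 ms and 20 ms figures come from the text above.

```python
def temporal_threshold_db(t_ms, tone_start_ms, tone_end_ms, tone_db,
                          pre_ms=20.0, post_ms=50.0):
    """Toy raised threshold around a loud tone (linear ramps are assumed)."""
    if tone_start_ms <= t_ms <= tone_end_ms:
        return tone_db                      # tone is sounding
    if t_ms < tone_start_ms:
        gap_ms = tone_start_ms - t_ms       # how long before the tone
        window_ms = pre_ms
    else:
        gap_ms = t_ms - tone_end_ms         # how long after the tone
        window_ms = post_ms
    # Fade from the tone's level down to nothing across the window.
    return tone_db * max(0.0, 1.0 - gap_ms / window_ms)

# A 180 ms tone starting at t = 0, at a hypothetical 40 dB.
for t_ms in (-30, -10, 100, 205, 240):
    print(f"t = {t_ms:>4} ms: threshold raised by "
          f"{temporal_threshold_db(t_ms, 0, 180, 40):.1f} dB")
```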

What it all means

These two effects - loudness and temporal masking - are the basis behind perceptual encoders like mp3, AAC, and Dolby Digital. Our hearing mechanisms can’t hear certain sounds, so bits in a compressed audio file don’t get wasted on them. They’re also the basis behind most noise reduction algorithms, but we’ll save that for a future series.
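Here's one simplified way to picture bits not getting wasted. Each bit of resolution buys roughly 6 dB of signal-to-noise ratio, so a frequency band only needs enough bits to push its quantization noise under the masked threshold, and a band that's already below the threshold needs none. The band names and levels are hypothetical, and real encoders use far more elaborate psychoacoustic models and allocation loops than this.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR per bit

def bits_needed(signal_db, masked_threshold_db):
    """Bits required to keep quantization noise under the masked threshold.

    A rule-of-thumb sketch, not any particular codec's allocation method.
    """
    signal_to_mask_db = signal_db - masked_threshold_db
    if signal_to_mask_db <= 0:
        return 0  # the ear can't hear this band anyway
    return math.ceil(signal_to_mask_db / DB_PER_BIT)

# (name, signal level in dB, masked threshold in dB), all made up.
bands = [("band A", 60, 30), ("band B", 20, 25), ("band C", 45, 40)]
for name, signal_db, mask_db in bands:
    print(f"{name}: {bits_needed(signal_db, mask_db)} bits")
```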

Of course, you know there’s a lot of bad perceptual encoding going on. Sounds get thrown away that the brain should be hearing, and we miss them. And bad choices during the encoding can add artifacts that make things even worse. But it’s not the encoding’s fault… it’s the user’s.

Next article: What these compression algorithms actually do, and how to make them do it more efficiently. It’s usually not what’s on your encoder’s menus. Here’s a link.


Technical note:

This masking effect was researched with multiple tones sounding together, at normal and moderately high listening levels. A different set of curves, Fletcher-Munson, was taken with single tones over a much wider range of volumes. It’s similar, but suggests that relative low-frequency sensitivity increases as sounds get much louder, and the difference disappears by 140 dB SPL (threshold of pain). The high end loss remains fairly consistent at any volume.

Fletcher-Munson is the basis behind ‘loudness compensation’ switches on some hifi amps (along with some dubious assumptions about recording levels and speaker efficiency). It’s also the very real reason why movies are mixed on dub stages that are calibrated to theatrical levels.

Both phenomena can be reconciled. But unless you regularly listen to extremely wideband sounds at painful levels, it’s not important here. If you do listen that way - considerably above OSHA recommendations for even short bursts - you’ll permanently damage your ears very quickly. Then you probably won’t hear anything at all.

Audio, Post Production


Glad you liked it, Bill.

Full-size mix theaters are calibrated so that -20 dBFS on the track = 85 dB SPL, at the listening position, from a single speaker. (Smaller rooms are often set to 82 dB SPL instead.)

That puts normal dialog around -20 dBFS in the center channel.  But of course it’s really up to how the scene feels (which is why the theater is calibrated).

Of course of course, it’s really up to the director.

That also puts the 0 dBFS boom booms at 105 dB SPL… per speaker. How the speakers will add depends on what’s in each channel.

If you played something through all 6 channels in phase, you could reach 120 dB SPL. Boy, would I hate to sit in that theater.

(Further, that assumes the theater when it’s showing is calibrated. Lots of luck there...)

Posted by JayR  on  08/18  at  10:16 AM
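The calibration arithmetic in the comment above is easy to check. This sketch assumes six identical speakers, each calibrated the same way, with ideal summation at the listening position; it ignores any differences in how individual channels (such as the subwoofer feed) are set up. The function names are made up for illustration.

```python
import math

REFERENCE_DBFS = -20.0  # track level used for calibration
REFERENCE_SPL = 85.0    # what that level measures from one speaker

def spl_from_dbfs(level_dbfs, reference_spl=REFERENCE_SPL):
    """SPL from a single calibrated speaker for a given track level."""
    return reference_spl + (level_dbfs - REFERENCE_DBFS)

def in_phase_sum_spl(spl_per_speaker, speaker_count):
    """Identical in-phase signals: pressures add, so 20*log10(N)."""
    return spl_per_speaker + 20 * math.log10(speaker_count)

def uncorrelated_sum_spl(spl_per_speaker, speaker_count):
    """Unrelated signals: powers add, so only 10*log10(N)."""
    return spl_per_speaker + 10 * math.log10(speaker_count)

peak_spl = spl_from_dbfs(0.0)
print(f"0 dBFS from one speaker: {peak_spl:.0f} dB SPL")
print(f"six speakers in phase:   {in_phase_sum_spl(peak_spl, 6):.1f} dB SPL")
print(f"six unrelated channels:  {uncorrelated_sum_spl(peak_spl, 6):.1f} dB SPL")
```

The in-phase case is the worst one. With different material in each channel, the total lands somewhere closer to the uncorrelated figure.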


Copyright 2008 ProVideo Coalition LLC