How does MP3 compression work? That's a common question that deserves a fast answer. I actually searched around the net before sitting down to write this and noticed that everyone was writing novels about the topic but nobody could explain it in a quick, normal-person manner. So here we go!
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
For most of us, the quote above perfectly sums up the mystery of MP3 compression. Sadly, it also sums up most of the explanations out there.
I bought an album online recently and it was delivered as a 160 kbps file. I only noticed because the file size itself was much smaller than the uncompressed wave files I've created myself.
What I didn't notice was any drop in quality. Can you explain how this works, because obviously the compression algorithm is leaving out some data to make this happen, right?
Mason, that's a great question. I can answer this plainly only thanks to having spent a lot of time worrying about how to distribute my own music across the internet, and of course after refreshing my memory. It's been a good 15 years since I spent time thinking heavily on the topic.
Nobody cared about this stuff when we were working in the analog field. We had vinyl records, 8-tracks, cassette tapes, and then compact discs (these are digital but had enough room that they didn't need compression). MP3s became a "thing" after the explosion of the internet.
A typical uncompressed wave file might be as big as 30 MB for a typical 3 minute song. But after being run through the MP3 compression algorithms that might drop down to 3 MB without any serious loss of quality.
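If you want to check those numbers yourself, here's a rough back-of-the-envelope sketch. It assumes CD-style audio (44.1 kHz, 16 bit, stereo) and a constant 128 kbps MP3; those settings and the function names are my own picks for illustration, not something every file uses.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of raw, uncompressed audio, ignoring the tiny WAV header."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def mp3_size_bytes(seconds, bitrate_kbps=128):
    """Approximate size of a constant-bit-rate MP3 (1 kbps = 1000 bits/s)."""
    return seconds * bitrate_kbps * 1000 // 8


song_seconds = 180  # a 3 minute song
wav_bytes = pcm_size_bytes(song_seconds)
mp3_bytes = mp3_size_bytes(song_seconds)

print(f"WAV: {wav_bytes / 1e6:.1f} MB")
print(f"MP3: {mp3_bytes / 1e6:.1f} MB")
print(f"Ratio: {wav_bytes / mp3_bytes:.1f} to 1")
```

Under those assumptions the wave file lands a little over 30 MB and the MP3 a little under 3 MB, which is where the "about ten times smaller" rule of thumb comes from.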
This was preferable when our bandwidth speeds were extremely low on dial-up modems and we might even have had bandwidth caps for the month. Instead of waiting days to download a song, we could do it in a couple hours (and in the present, a couple of seconds!).
MP3s have kept their place thanks to MP3 players like the iPod. They have limited hard drive or flash drive space, so with compression we can carry around a lot more music.
Plus there's no need for full resolution files when we're doing yard work or at the gym using tiny sports earphones. It's also a huge space and bandwidth saver for online streaming services.
MP3 stands for MPEG Layer 3
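MPEG is the Moving Picture Experts Group, the standards group behind the MPEG video formats that did the same thing for videos as MP3s did for audio. MP3 is the third audio layer defined in those standards (more formally, MPEG-1 Audio Layer III), so it comes from the same family of technology.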
Here's where it gets crazy. The people who designed these compression algorithms used our knowledge of psychoacoustics to manage the data bandwidth. Psychoacoustics refers to how our brain interprets sounds.
The brain uses certain tricks like auditory masking to allocate resources and attention to what is the most important sound happening at any given time. Using this info, we know what we can get rid of, data-wise.
The first and easiest savings are to go ahead and cut out a certain frequency range if the music allows for it. Adults begin to lose their capacity for hearing above 16-18 kHz, whereas the top limit for humans is around 20 kHz. At that level there's not a lot going on in terms of intelligibility. It's just "sparkle, shine, sheen."
In most cases, we don't need to have it at all or at least can encode it into the MP3 file at a lower resolution.
The next savings come from something our ears and brains do called simultaneous masking. Basically, if a loud sound is blaring out over the top of a lot of low-volume sounds, especially ones close to it in pitch, you're naturally going to focus on the loud sound. What this means is that we can spend a lot less data on the quiet sounds. They don't need as much detail encoded in them during those times.
The same thing happens in time, which is called temporal masking. If two sound events occur within milliseconds of each other, we're only going to be able to focus on the loudest one. It's how we've been evolutionarily primed to react. Our ears and minds can't separate events that close in time.
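To make simultaneous masking a bit more concrete, here's a toy sketch. It is not how a real MP3 encoder's psychoacoustic model works; the band levels, the 12 dB drop, and the 10 dB-per-band falloff are numbers I made up purely for illustration. The idea is just that each loud band casts a "shadow" over its neighbors, and anything sitting inside that shadow can get fewer bits.

```python
def audible_bands(levels_db, masking_drop_db=12.0, spread_db_per_band=10.0):
    """Return one True/False flag per frequency band (True = likely audible).

    Toy model: every band acts as a masker. Its masking threshold sits
    `masking_drop_db` below its own level and falls by `spread_db_per_band`
    for each band of distance. Both values are invented for this example.
    Expects at least two bands.
    """
    flags = []
    for i, level in enumerate(levels_db):
        # Strongest masking threshold cast onto band i by any other band.
        threshold = max(
            masker - masking_drop_db - spread_db_per_band * abs(i - j)
            for j, masker in enumerate(levels_db)
            if j != i
        )
        flags.append(level > threshold)
    return flags


# Hypothetical loudness per band, low frequencies to high, in dB.
levels = [40.0, 85.0, 50.0, 30.0, 62.0]
print(audible_bands(levels))
```

With these made-up levels, the 85 dB band drowns out its quieter neighbors, while the 62 dB band is far enough away to survive.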
So what the encoder algorithm does is ignore or at least allocate much less data to the quieter sound since we won't perceive it anyways.
The minimum audition threshold (also called the absolute threshold of hearing) refers to volume, and it changes with frequency. As a voice or sound becomes quieter and quieter, we're able to make out less and less detail. The encoder knows this and chooses to not save every single detail of quiet sounds since we can't use it anyways. And if a sound dips below a certain volume threshold where the human ear can't hear it, then it gets tossed out completely.
And finally this is where the real work is done. Once you've processed all of the savings mentioned above, you're still going to be left with a hefty file. That's because all of the left over data is still being stored at the highest resolution possible. Here's how the geniuses behind MP3 solved it.
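Here's a small sketch of that threshold using a well-known textbook approximation of the hearing threshold in quiet (usually credited to Terhardt). I'm assuming levels are given in dB SPL, which a real encoder can't actually know since it has no idea how loud you'll play the file, so treat this as an illustration rather than what any particular encoder does. The function names are mine.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate quietest audible level (dB SPL) at a given frequency."""
    f = freq_hz / 1000.0  # the formula works in kHz
    return (
        3.64 * f ** -0.8
        - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
        + 1e-3 * f ** 4
    )


def is_inaudible(level_db_spl, freq_hz):
    """True if a tone at this level falls below the threshold in quiet."""
    return level_db_spl < threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 16000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

The curve dips lowest around 3-4 kHz, where our ears are most sensitive, and climbs steeply at the very low and very high ends. That's why a quiet 16 kHz tone is a much safer thing to throw away than a quiet 3 kHz one.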
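First and foremost, MP3 is a lossy data compression technique by definition: the encoder permanently throws away detail it judges we won't hear, and that detail can't be recovered afterwards. On top of that, much of the audio being encoded has already had its bit depth dropped from 24 bit or above down to 16 bit. Lossy refers to this discarded information but doesn't have to mean a loss in audio quality.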
16 bit is a depth that has plenty of headroom to provide a high signal-to-noise ratio. It means that every sample has 16 bits to encode with (each one a 0 or a 1 in binary). By dropping from 24 bit to 16 bit we've already made a 33% saving in size with no discernible quality difference.
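The bit depth math is easy to sanity-check. The sketch below uses the standard rule of thumb for an ideal quantizer (roughly 6 dB of signal-to-noise ratio per bit); the function names are just my own labels.

```python
def quantization_snr_db(bits):
    """Theoretical SNR of an ideal quantizer fed a full-scale sine wave."""
    return 6.02 * bits + 1.76


def size_saving(old_bits, new_bits):
    """Fraction of data saved by storing each sample with fewer bits."""
    return 1 - new_bits / old_bits


for bits in (16, 24):
    print(f"{bits} bit: {2 ** bits:,} levels, ~{quantization_snr_db(bits):.0f} dB SNR")

print(f"Saving from 24 to 16 bit: {size_saving(24, 16):.0%}")
```

Going from 24 bit to 16 bit removes one third of the data per sample, and 16 bit still leaves close to 98 dB between the loudest signal and the noise floor.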
Speaking of each sample having 16 bits... that's another place massive savings are made. Sample rates can get as high as 96,000 samples per second or more! 44.1 kHz is your typical sample rate for MP3s and that's still a ton of samples per second, but it represents a drop of roughly 54% in the amount of data being stored versus a 96 kHz sample rate. Here's how it works, via picture:
The basic idea is that a lower sample rate captures fewer "snap shots" of each moment of music. You can think of it like a movie or a video game at 60 frames-per-second versus the typical 24 fps. 24 is more than good enough but 60 looks great during fast action scenes. It works the same for music and sample rates, and 44.1 kHz is already enough to capture frequencies up to about 22 kHz, which is past what we can hear.
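Here's the sample rate math as a quick sketch. I'm assuming 16 bit stereo for the data-rate figures; the helper names are made up for this example.

```python
def nyquist_hz(sample_rate):
    """Highest frequency a given sample rate can represent."""
    return sample_rate / 2


def pcm_bitrate_kbps(sample_rate, bit_depth=16, channels=2):
    """Data rate of uncompressed audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


for rate in (96_000, 44_100):
    print(
        f"{rate} Hz: captures up to {nyquist_hz(rate):,.0f} Hz, "
        f"{pcm_bitrate_kbps(rate):,.1f} kbps uncompressed"
    )

reduction = 1 - 44_100 / 96_000
print(f"Data saved going from 96 kHz to 44.1 kHz: {reduction:.0%}")
```

Notice that even at 44.1 kHz the uncompressed stream is still around 1,411 kbps. The sample rate and bit depth choices help, but most of the shrinking has to come from the psychoacoustic tricks described earlier.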
And finally we set a limit to the data throughput. This takes into account everything mentioned above and then sets a ceiling on how much data you can send at once. Most MP3 streaming and selling services use a CBR, which is a constant bit rate, usually of 128 kilobits per second (kbps).
Other common options are 192 kbps, and 320 kbps which is the highest available on MP3 and, for most listeners, very hard to tell apart from uncompressed audio. Some streaming services will only send 64 kbps and you can definitely tell. Quality takes a serious drop below 128 kbps.
Constant bit rates are preferable for these services and consumers because they make bandwidth and storage needs easy to predict. But there are other options better suited to personal use, such as VBR, which is a variable bit rate.
What this does is allow a lower bit rate during quiet or simple parts of songs and a higher bit rate at louder or more complex parts of a song. This is preferable for those who prefer the highest quality audio but still desire the data savings of MP3s.
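To picture VBR, here's a toy sketch. The bit rate list is the standard set for MPEG-1 Layer III, but everything else is an assumption of mine: the "complexity" score from 0 to 1, the 64 and 256 kbps floor and ceiling, and the simple mapping between them. Real encoders decide this with their psychoacoustic model, not a one-line formula.

```python
# Standard bit rates (kbps) allowed for MPEG-1 Layer III frames.
MP3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)


def pick_bitrate(complexity, floor=64, ceiling=256):
    """Map a made-up complexity score in [0, 1] to a standard bit rate.

    Picks the smallest allowed rate that meets the target, so busier
    frames get more bits and simple frames get fewer.
    """
    target = floor + complexity * (ceiling - floor)
    for rate in MP3_BITRATES_KBPS:
        if rate >= target:
            return rate
    return MP3_BITRATES_KBPS[-1]


# Hypothetical song: quiet intro, busy chorus, medium verse, quiet outro.
frame_complexity = [0.05, 0.10, 0.80, 0.95, 0.40, 0.05]
rates = [pick_bitrate(c) for c in frame_complexity]
average = sum(rates) / len(rates)

print(rates)
print(f"Average: {average:.0f} kbps")
```

In this made-up song the busy frames get 224 to 256 kbps while the quiet ones get by on 80 to 96 kbps, and the average lands around 149 kbps. A CBR file would have to spend the same amount on every frame, whether it needed it or not.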
And that's it! That's the briefest and simplest explanation of MP3 compression you're ever going to find. Next time one of your friends asks "How does MP3 compression work?" you can send them here, or if you can remember these details, explain it to them and make them feel inferior. That's what friends are for!
Thanks for such a solid question. I had to dig deep to answer it!