Optimizing FLAC Audio Files for Production Hosting

Networking contexts (e.g. web sites) are one area where obsessive, tedious optimization is always warranted. Every byte saved before upload is a byte saved during each and every subsequent download.

Compression is the gift that keeps on giving!

Web developers have long understood the benefits of CSS and JS minification, and have recently started paying some attention to image and video sizes as well. But when it comes to audio, particularly lossless formats like FLAC, there's still plenty of room for improvement.

Like most encoding formats, the reference FLAC encoder comes with a lot of obscure tunables to balance compression ratios and processing times. The default settings are good enough (i.e. sane), but imperfect.

If you need to squeeze every last byte out of a FLAC file, there's a little more work to be done. This article will walk you through that process.

Making FLAC.

If your audio source is not already in FLAC format, go ahead and convert it to FLAC. Don't spend too much effort here; the default encoder settings will suffice for the time being.

flac -o output.flac input.wav

Most audio programs can output FLAC directly, too, so feel free to simply run a "Save As" from whatever editor you're using.

(If your source file is in a lossy format like MP3, you're doing audio wrong! Haha. Keep it as an MP3, and ignore the rest of the article.)

Tags

FLAC metadata lives at the front of the file. To make edits more write-efficient, PADDING blocks are inserted to preallocate space for additional data. (If a subsequent change can be written within the padding, the audio data won't have to be shifted over to make room.)

Because padding is literal file bloat, we'll be removing it later, so before doing anything else, make sure all your tags are set just the way you want them. (That way you won't have to strip the padding a second time.)

Most metadata content is just text, simple key/value pairs. You can add/edit metadata using metaflac, or whatever GUI you prefer.
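For command-line editing, metaflac can list, remove, and set Vorbis comment tags in place. The sketch below assumes metaflac is on your PATH; `set_tag` is a hypothetical helper name, and TITLE is just an example field. It uses the common remove-then-set idiom so a tag ends up with exactly one value.

```shell
# List a file's current Vorbis comment tags:
#   metaflac --list --block-type=VORBIS_COMMENT a.flac

# Hypothetical helper: give tag $2 in file $1 the single value $3,
# clearing any existing values of that tag first.
set_tag() {
    metaflac --remove-tag="$2" --set-tag="$2=$3" "$1"
}

# Example:
#   set_tag a.flac TITLE "Don't Stop"
```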

Because each character you type directly contributes to the overall file size, it pays to be a little stingy. Keep any comments to the point, watch out for dangling whitespace, and replace fancy "smart" quotes and apostrophes with their straight ASCII counterparts (saving two bytes each!).
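Checking by eye is error-prone, so here is a small sketch that flags and fixes the two problems above in exported tags. It assumes single-line tag values in `NAME=value` form (as written by `metaflac --export-tags-to=-`); `lint_tags` and `clean_tags` are hypothetical helper names. Each curly quote is 3 bytes in UTF-8, versus 1 byte for its ASCII replacement, which is where the two-byte saving comes from.

```shell
# Hypothetical helper: read tags on stdin and print (with line numbers) any
# that end in whitespace or contain a curly quote. Exits 1 if none are found.
lint_tags() {
    grep -n -e '[[:space:]]$' -e '‘' -e '’' -e '“' -e '”'
}

# Hypothetical helper: straighten curly quotes and strip trailing whitespace.
# Each quote is replaced as a literal byte sequence, so this works in any locale.
clean_tags() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g' \
        -e 's/[[:space:]]*$//'
}

# Usage:
#   metaflac --export-tags-to=- a.flac | lint_tags
#   metaflac --export-tags-to=- a.flac | clean_tags > tags.txt
#   metaflac --remove-all-tags a.flac && metaflac --import-tags-from=tags.txt a.flac
```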

Now is also the time to apply replaygain, if you're into that. Some editors can handle this for you, but it is recommended to use metaflac for consistency:

# A single file.
metaflac --add-replay-gain /path/to/a.flac

# An album.
metaflac --add-replay-gain /path/to/*.flac
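If your library is organized with one album per directory, the album form can be applied to every directory in one go. This is a sketch under that layout assumption; `replaygain_albums` is a hypothetical helper name, and the music root path is up to you.

```shell
# Hypothetical helper: apply album ReplayGain to each immediate subdirectory
# of $1, assuming every subdirectory holds exactly one album.
replaygain_albums() {
    local dir
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        (
            set -- "$dir"*.flac            # this album's tracks
            [ -e "$1" ] || exit 0          # no FLAC files here; skip it
            metaflac --add-replay-gain "$@"
        )
    done
}

# Example:
#   replaygain_albums /path/to/music
```

The subshell keeps `set --` from clobbering the function's own arguments, which lets the sketch stay POSIX-sh compatible without arrays.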

Because we're optimizing for size, you should avoid doing silly things like embedding album cover images or lyrics inside your FLAC files. APPLICATION blocks should generally be avoided too, as they only have relevance for whatever application wrote them. (Remove them if already present.)
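metaflac can both show and remove these blocks. A minimal sketch follows; `strip_extras` is a hypothetical helper name. By default metaflac turns removed blocks into PADDING, so `--dont-use-padding` is passed to avoid trading one kind of bloat for another. Lyrics usually live in a tag rather than a block, and LYRICS is only a common field name, not a universal one.

```shell
# Show which artwork/APPLICATION blocks a file carries:
#   metaflac --list --block-type=PICTURE,APPLICATION a.flac

# Hypothetical helper: remove embedded artwork and APPLICATION blocks from $1
# without converting them into padding.
strip_extras() {
    metaflac --remove --block-type=PICTURE,APPLICATION --dont-use-padding "$1"
}

# Lyrics stored as a tag (field name varies by tagger):
#   metaflac --remove-tag=LYRICS a.flac
```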

Other than that, check one last time for typos, and call it a day!

With the editorial work out of the way, we can now move on to the fun, technical bits!

Fake Stereo

Mono recordings in the wild, particularly when mastered for CD, are often represented as stereo with identical left and right channels. FLAC is good at optimizing for this, but your files won't be as small as they would be if the tracks were properly mono to begin with.

(If you already know your file is using the correct number of channels for its content, or has more than two channels, you can skip ahead to the next section.)

The Internet is full of complicated advice on the matter, but "fake stereo" can be uncovered really easily using the program sox:

sox /path/to/a.flac -n oops stat 2>&1 | grep -E '^(Maximum|Minimum) amplitude'

The above command runs an Out of Phase Stereo analysis, which is a fancy way of saying it "diffs" the two channels. If they're equal, i.e. fake stereo, the result should be silence, which the min/max amplitudes (printed after running the above) will reflect:

Maximum amplitude:     0.000000
Minimum amplitude:     0.000000

Mastering artifacts may give you numbers that are only almost zero, like 0.000031. As a rule of thumb, if both the min and max are ±0.000x — zeroes down to the thousandth place — the left and right channels are effectively the same, and you should convert the source to mono.
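The rule of thumb above is easy to automate if you're checking many files. This sketch reads the sox output on stdin and succeeds only when both amplitudes are within ±0.001; `is_fake_stereo` is a hypothetical helper name, and the threshold simply encodes the "zeroes down to the thousandth place" heuristic.

```shell
# Hypothetical helper: read `sox ... oops stat` output on stdin. Exit 0 when
# both the maximum and minimum amplitudes are within +/-0.001 (fake stereo),
# exit 1 otherwise, including when no amplitude lines were found at all.
is_fake_stereo() {
    awk '
        /^(Maximum|Minimum) amplitude:/ {
            v = $3 + 0
            if (v < 0) v = -v
            if (v >= 0.001) differ = 1
            found++
        }
        END {
            if (found == 2 && !differ) exit 0
            exit 1
        }
    '
}

# Usage:
#   if sox /path/to/a.flac -n oops stat 2>&1 | is_fake_stereo; then
#       echo "fake stereo: convert to mono"
#   fi
```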

Most of the time, though, you'll see completely different min/max values, because most stereo files actually contain stereo data! If you see nothing at all, your source wasn't stereo to begin with. In either case, no action is needed; you can skip ahead to the next section.

But if you happen to find fake stereo, you need to remove the redundant channel before proceeding. Our friend sox can help with this too:

# Build a new file with just the left channel.
sox /path/to/a.flac /path/to/b.flac remix 1

# Check the integrity of the new file.
flac -wst /path/to/b.flac

# Make the change permanent.
mv /path/to/b.flac /path/to/a.flac
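The three steps can also be chained so the original file is only overwritten when both the conversion and the integrity check succeed. This is a sketch; `make_mono` and the `.mono.flac` temporary name are hypothetical choices.

```shell
# Hypothetical helper: keep only the left channel of $1, replacing the file
# in place only if sox and `flac -wst` both succeed.
make_mono() {
    local src="$1"
    local tmp="${1%.flac}.mono.flac"   # hypothetical temporary file name
    if sox "$src" "$tmp" remix 1 && flac -wst "$tmp"; then
        mv "$tmp" "$src"
    else
        rm -f "$tmp"                   # clean up after a failed attempt
        return 1
    fi
}

# Example:
#   make_mono /path/to/a.flac
```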

Block Sizes

FLAC technically allows many block sizes, but for typical CD-quality audio there are ten standard candidates (192 through 4608 samples), and only one of them will provide the "best" compression for a given file. Before jumping into the final re-encoding, it is worthwhile to figure out the appropriate block size.

While the answer itself is unpredictable, there are a few patterns you can take advantage of to simplify the search:

  1. The bigger block sizes are usually better;
  2. When testing sequentially, compression improves as you approach the "best" block size, and worsens as you get farther away;
  3. Testing compression with -8 yields the same "best" block size as more intense options like -8ep about 99% of the time, and is orders of magnitude faster;
  4. In the rare cases where -8 and -8ep disagree, it's more or less a rounding error; the true best will be plus or minus one size from the estimated "best".

In other words, encode your file (quickly) with the largest block size first, then work your way down the list until the compression gets worse. The previous run — the not-worse one — is the "best"!

# Encode with a block size of 4608 and print the resulting file size.
# Nothing is actually saved here; you're just getting information.
flac -8 --blocksize 4608 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c

# Repeat with a block size of 4096, and compare the size to the previous run.
flac -8 --blocksize 4096 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c

# And so on down the line. Stop as soon as you see a *bigger* size.
flac -8 --blocksize 2304 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 2048 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 1152 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 1024 --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 576  --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 512  --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 256  --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c
flac -8 --blocksize 192  --no-padding --no-seektable --stdout --totally-silent a.flac | wc -c

In practice, it is very unlikely you'll ever have to do all ten passes. If 4096 gives you a larger result than 4608, you'll have your answer after just two.
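The search loop can be scripted so it stops on its own. This is a sketch, not part of flac itself; `encoded_size` and `best_blocksize` are hypothetical helper names, and it uses the same quick `-8` options as the manual passes.

```shell
# Print the encoded size, in bytes, of file $1 at block size $2.
# Nothing is written to disk.
encoded_size() {
    flac -8 --blocksize "$2" --no-padding --no-seektable \
        --stdout --totally-silent "$1" | wc -c | tr -d '[:space:]'
}

# Walk the ten standard sizes from largest to smallest and stop at the first
# size that encodes *bigger* than the one before it. Prints the last
# not-worse size, i.e. the estimated "best".
best_blocksize() {
    local file="$1" best="" best_bytes="" size bytes
    for size in 4608 4096 2304 2048 1152 1024 576 512 256 192; do
        bytes=$(encoded_size "$file" "$size")
        # An empty stream almost certainly means flac failed.
        [ "$bytes" -gt 0 ] || return 1
        if [ -n "$best_bytes" ] && [ "$bytes" -gt "$best_bytes" ]; then
            break
        fi
        best=$size
        best_bytes=$bytes
    done
    echo "$best"
}

# Example:
#   best_blocksize /path/to/a.flac
```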

As mentioned in #3-4, this fast encoding shortcut will very rarely give you the second-best block size. We'll talk about finding the best-best in a later section.

For reference, my personal media library of 16,498 songs has the following block size distribution:

The two smallest sizes (192–256) didn't make the chart because they weren't used at all. Most files compressed best using one of the top three sizes (2304–4608).

It is worth noting that FLAC uses 4096 by default. While that did prove to be the most common single block size for my library, the majority of files actually preferred other sizes.

Statistics are fun!

Update 2022-09-30

Shortly after this article was first published, FLAC 1.4.0 was released, improving compression (slightly) across the board over the older 1.3.x series.

Curious to see the improvements in action, I re-re-encoded all my tracks, plus several others I had acquired since publication. The byte savings were meager (about a tenth of a percent), but the block size distribution afterward was noticeably different:

Statistics are still fun!

Re-Encoding!

Once you've set your metadata, rooted out any fake stereo channels, and discovered the best block size, all that's left to do is re-encode your file!

FLAC has a lot of fiddly options, but the following generic combination will always get you to at least 99.99% of the compression potential:

# Use this for FLAC 1.3.x.
flac -8epV \
    --blocksize YOUR_SIZE \
    --no-padding \
    --no-seektable \
    /path/to/a.flac --force

# For FLAC 1.4.x, you can skip the -e in favor of using more tukey divisions.
# This will run faster and yield better overall compression. A win/win!
flac -8pV \
    -A 'subdivide_tukey(8)' \
    --blocksize YOUR_SIZE \
    --no-padding \
    --no-seektable \
    /path/to/a.flac --force

Unlike the "quick" encoding passes, this will replace the original file with the newly-encoded version. If you'd rather save it to a different location, replace --force with -o /path/to/output.flac.
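If you maintain files on machines with different FLAC versions, the choice between the two commands can be made automatically. `flac --version` prints a line like `flac 1.4.2`; the sketch below treats anything that isn't 1.3.x as 1.4.x or newer, which is an assumption. `reencode` is a hypothetical helper name.

```shell
# Hypothetical helper: re-encode $1 in place at block size $2, picking the
# 1.3.x or 1.4.x option set based on the installed flac version.
reencode() {
    local file="$1" size="$2" version
    version=$(flac --version | awk '{ print $2; exit }')   # e.g. "1.4.2"
    case $version in
        1.3.*)
            flac -8epV --blocksize "$size" --no-padding --no-seektable \
                --force "$file" ;;
        *)
            # Assumed to be 1.4.x or newer.
            flac -8pV -A 'subdivide_tukey(8)' --blocksize "$size" \
                --no-padding --no-seektable --force "$file" ;;
    esac
}

# Example:
#   reencode /path/to/a.flac 4096
```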

Best-Best Block Size

As you should recall from the Block Sizes section, the block size we found there may occasionally be second-best rather than best-best. This won't cost much in terms of bytes, but if you want to make sure you've got the best-best, all you need to do is repeat the above for the adjacent sizes.

Start with the next size up, if any. For example, if 4096 was the guessed size, try 4608. Just make sure you use the -o option to save it to a different location! If the second run came out smaller, you're done! Replace the original with the new one and you're good to go.

If not, re-repeat the encoding with the next size down. If that comes out smaller, keep it; if not, the guess-best was best after all!
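The neighbor check can be sketched as a script too. It assumes FLAC 1.4.x (it uses the 1.4.x option set) and that the file has already been re-encoded at the guessed size. `neighbor_sizes`, `file_bytes`, `try_neighbors`, and the `.try.flac` temporary name are all hypothetical.

```shell
# The ten standard block sizes, largest first.
SIZES="4608 4096 2304 2048 1152 1024 576 512 256 192"

# Print the next size up (if any), then the next size down (if any).
neighbor_sizes() {
    local prev="" found="" s
    for s in $SIZES; do
        if [ -n "$found" ]; then
            echo "$s"
            return 0
        fi
        if [ "$s" = "$1" ]; then
            if [ -n "$prev" ]; then echo "$prev"; fi
            found=1
        fi
        prev=$s
    done
}

# Print a file's size in bytes.
file_bytes() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Hypothetical helper: re-encode $1 at each neighbor of the guessed size $2
# (up first, then down), keeping the first result smaller than the current
# file. Prints the winning block size.
try_neighbors() {
    local file="$1" guess="$2" tmp="${1%.flac}.try.flac" size
    for size in $(neighbor_sizes "$guess"); do
        rm -f "$tmp"
        flac -8pV -A 'subdivide_tukey(8)' --blocksize "$size" \
            --no-padding --no-seektable --totally-silent \
            -o "$tmp" "$file" || { rm -f "$tmp"; return 1; }
        if [ "$(file_bytes "$tmp")" -lt "$(file_bytes "$file")" ]; then
            mv "$tmp" "$file"
            echo "$size"
            return 0
        fi
    done
    rm -f "$tmp"
    echo "$guess"
}

# Example:
#   try_neighbors /path/to/a.flac 4096
```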

Best-Best Compression

The generic option combinations we used above will get you within 99.99% of the absolute compression potential, and do so for each and every track thrown at them. But if you don't think 99.99% is good enough, are running the latest FLAC 1.4.x, and have all the time in the world, you could alternatively do the following, which should save you a few more bytes:

# This will take forever!
flac -8epV \
    -A 'subdivide_tukey(32)' \
    --rice-partition-order 8 \
    --blocksize YOUR_SIZE \
    --no-padding \
    --no-seektable \
    /path/to/a.flac --force

To be clear, doing this is madness. It will take hundreds or thousands of times longer than the generic version provided earlier, and in most cases will only save an extra couple hundred bytes, maybe a few kilobytes in rare cases.

Unless your computer is an absolute beast, chasing the .99999s isn't worth it. Haha.

Compressibility

FLAC does some smart things when it comes to compression, but on the whole, the audio content doesn't affect compressibility as much as you might expect.

To illustrate what I mean, let's take a look at bytes per second, i.e. bytes / (total_samples / sample_rate). Songs requiring fewer bytes to represent their audio data are better compressed than those requiring more.

Each FLAC file stores the total_samples and sample_rate in its metadata. You can use metaflac to tease them out:

metaflac --show-total-samples --show-sample-rate a.flac

The bytes value should ideally be the total number of bytes for just the audio portion of the file, but in practice, metadata doesn't contribute much at all. If you don't have the means to manually parse each file to figure out how much space the audio data takes up, just use the total file size.
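Putting the pieces together, here is a sketch of the bytes-per-second calculation using the total file size as the approximation described above. `bytes_per_second` is a hypothetical helper name. A total_samples of 0 means the length is unknown (per the FLAC format), so such files are rejected rather than divided by zero.

```shell
# Hypothetical helper: print bytes / (total_samples / sample_rate) for $1,
# using the whole file size (metadata included) as the byte count.
bytes_per_second() {
    local file="$1" samples rate bytes
    samples=$(metaflac --show-total-samples "$file") || return 1
    rate=$(metaflac --show-sample-rate "$file") || return 1
    bytes=$(wc -c < "$file" | tr -d '[:space:]')
    # total_samples is 0 when the encoder didn't know the length up front.
    [ "$samples" -gt 0 ] && [ "$rate" -gt 0 ] || return 1
    awk -v b="$bytes" -v n="$samples" -v r="$rate" \
        'BEGIN { printf "%.2f\n", b / (n / r) }'
}

# Example:
#   bytes_per_second a.flac
```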

Anyhoo, here's the breakdown for my own library:

Compared to a typical bell curve, where about 68% of values fall within one standard deviation of the mean, my library is more tightly clustered: 78.30% of the tracks fall within one standard deviation, and the outliers aren't really outliers at all; they're just different.
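To reproduce this kind of figure for your own collection, you can feed one bytes-per-second value per line into a small awk script. This sketch uses the population standard deviation; `within_one_sigma` is a hypothetical helper name, and how you gather the values is up to you.

```shell
# Hypothetical helper: read one number per line on stdin and print the
# percentage of values within one (population) standard deviation of the mean.
within_one_sigma() {
    awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            mean = sum / NR
            for (i = 1; i <= NR; i++) sq += (v[i] - mean) ^ 2
            sd = sqrt(sq / NR)
            for (i = 1; i <= NR; i++)
                if (v[i] >= mean - sd && v[i] <= mean + sd) n++
            printf "%.2f%%\n", 100 * n / NR
        }
    '
}

# Example:
#   printf '%s\n' 1 2 3 4 5 | within_one_sigma    # prints 60.00%
```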

The tracks to the left, for example, are mostly mono recordings. They have half as much data to worry about, and so skirt by with fewer bytes per second.

Conversely, the tracks to the right are mostly high-definition recordings. They have higher sample rates and/or bits-per-sample than standard CD-derived material, and as such have a lot more information to worry about.

This is a roundabout way of saying that if you take the time to find the right block size for each track and use fairly aggressive encoding options, they'll all benefit more or less equally.

OCD FTW

While not the focus of this article, obsessive personality types may enjoy applying these same optimizations to their personal media libraries. I am just such a person, so I did, and saved just shy of 4 GiB.

It is worth noting, however, that doing this en masse is incredibly time-consuming. If you're planning to follow suit, I would strongly encourage you to build FLAC from source, adding -march=native -flto to your CFLAGS to ensure the binary can take full advantage of your particular CPU. In my own testing, I also found a noticeable performance boost by building with Clang 13/14 rather than GCC, but not Clang 15.

So yeah…

If you're a normal person, this is probably not worth doing for local media at all. Haha.

Josh Stoik
5 August 2022