Optimizing FLAC Audio Files for Production Hosting

Networking contexts (e.g. web sites) are one area where obsessive, tedious optimization is always warranted. Every byte saved before upload is a byte saved during each and every subsequent download.

Compression is the gift that keeps on giving!

Web developers have long understood the benefits of CSS and JS minification, and have recently started paying some attention to image and video sizes as well. But when it comes to audio, particularly lossless formats like FLAC, there's still plenty of room for improvement.

Like most encoders, the reference FLAC encoder comes with a lot of obscure tunables to balance compression ratios and processing times. The default settings are good enough (i.e. sane), but imperfect.

If you need to squeeze every last byte out of a FLAC file, there's a little more work to be done. This article will walk you through that process.

Making FLAC

If your audio source is not already in FLAC format, go ahead and convert it to FLAC. Don't spend too much effort here; the default encoder settings will suffice for the time being.

flac -o output.flac input.wav

Most audio programs can output FLAC directly, too, so feel free to simply run a "Save As" from whatever editor you're using.

(If your source file is in a lossy format like MP3, you're doing audio wrong! Haha. Keep it as an MP3, and ignore the rest of the article.)

Tags

FLAC metadata lives at the front of the file. To make edits more write-efficient, PADDING blocks are inserted to preallocate space for additional data. (If a subsequent change can be written within the padding, the audio data won't have to be shifted over to make room.)

Because padding is literal file bloat, we'll be removing it later, so before doing anything else, make sure all your tags are set just the way you want them. (This will save you having to restrip padding afterward.)

Most metadata content is just text, simple key/value pairs. You can add/edit metadata using metaflac, or whatever GUI you prefer.

Because each character you type directly contributes to the overall file size, it pays to be a little stingy. Keep any comments to the point, watch out for dangling whitespace, and replace fancy "smart" quotes and apostrophes with their straight ASCII counterparts (saving two bytes each!).
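As a rough illustration of that tidying, the sketch below straightens typographic quotes and strips trailing whitespace from exported tag text. The helper name tidy_tags is hypothetical, and the round trip shown in the comments assumes metaflac is installed and that every tag value fits on a single line; try it on a copy first.

```shell
# Hypothetical helper: reads KEY=value tag lines on stdin and writes the
# tidied lines to stdout. Curly quotes and apostrophes (three bytes each
# in UTF-8) become their one-byte ASCII counterparts, and trailing
# whitespace is removed.
tidy_tags() {
    sed -e "s/‘/'/g" -e "s/’/'/g" \
        -e 's/“/"/g' -e 's/”/"/g' \
        -e 's/[[:space:]]*$//'
}

# Possible round trip for one file:
#   metaflac --no-utf8-convert --export-tags-to=- a.flac | tidy_tags > tags.txt
#   metaflac --no-utf8-convert --remove-all-tags --import-tags-from=tags.txt a.flac
```

Review tags.txt before importing it; a bad import is much easier to avoid than to undo.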

Now is also the time to apply replaygain, if you're into that. Some editors can handle this for you, but it is recommended to use metaflac for consistency:

# A single file.
metaflac --add-replay-gain /path/to/a.flac

# An album.
metaflac --add-replay-gain /path/to/*.flac

Because we're optimizing for size, you should avoid doing silly things like embedding album cover images or lyrics inside your FLAC files. APPLICATION blocks should generally be avoided too, as they only have relevance for whatever application wrote them. (Remove them if already present.)
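If a file already carries such blocks, metaflac can drop them. This is a small sketch rather than a required step; the function name strip_extras is made up for illustration.

```shell
# Hypothetical helper: removes embedded pictures and application-specific
# blocks from one FLAC file. --dont-use-padding makes metaflac rewrite the
# file instead of leaving PADDING where the removed blocks used to be.
strip_extras() {
    metaflac --remove --block-type=PICTURE,APPLICATION --dont-use-padding "$1"
}

# To see which block types a file carries beforehand:
#   metaflac --list /path/to/a.flac | grep 'type:'
```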

Other than that, check one last time for typos, and call it a day!

With the editorial work out of the way, we can now move on to the fun, technical bits!

Fake Stereo

Mono recordings in the wild, particularly when (poorly) mastered for CD, are often represented as stereo with identical left and right channels. FLAC is good at optimizing for this, but your files won't be as small as they would be if the sources were properly mono to begin with.

(If you already know your file is using the correct number of channels for its content, or has some number of channels other than two, you can skip ahead to the next section.)

The Internet is full of complicated advice on the matter, but "fake stereo" can be uncovered really easily using the program sox:

sox /path/to/a.flac -n oops stat 2>&1 | grep amplitude

The above command runs an Out of Phase Stereo analysis, which is a fancy way of saying it "diffs" the two channels. If they're equal, i.e. fake stereo, the result should be silence, which the min/max amplitudes (printed after running the above) will reflect:

Maximum amplitude:     0.000000
Minimum amplitude:     0.000000

Mastering artifacts may give you numbers that are only almost zero, like 0.000031. As a rule of thumb, if both the min and max are ±0.000x — zeroes down to the thousandth place — the left and right channels are effectively the same, and you should convert the source to mono.

Most of the time, though, you'll see completely different min/max values, because most stereo files actually contain stereo data! If you see nothing at all, your source wasn't stereo to begin with. In either case, no action is needed; you can skip ahead to the next section.
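The rule of thumb above can be scripted. The following is a minimal sketch, assuming sox prints its usual "Maximum amplitude" and "Minimum amplitude" lines; the function name is_fake_stereo and the 0.001 cutoff (zeroes down to the thousandth place) are just one way to encode it.

```shell
# Hypothetical helper: succeeds (exit 0) only when both the max and min
# amplitudes of the channel difference are below 0.001 in magnitude.
# Files that print no amplitude lines at all (e.g. true mono) fail the
# check, so they are left alone.
is_fake_stereo() {
    sox "$1" -n oops stat 2>&1 | awk '
        /^(Maximum|Minimum) amplitude:/ {
            seen++
            v = ($3 < 0) ? -$3 : $3
            if (v >= 0.001) loud = 1
        }
        END { exit ((seen == 2 && !loud) ? 0 : 1) }
    '
}

# Usage:
#   if is_fake_stereo /path/to/a.flac; then echo "fake stereo"; fi
```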

But if you happen to find fake stereo, you need to remove the redundant channel before proceeding. Our friend sox can help with this too:

# Build a new file with just the left channel.
sox /path/to/a.flac /path/to/b.flac remix 1

# Check the integrity of the new file.
flac -wst /path/to/b.flac

# Make the change permanent.
mv /path/to/b.flac /path/to/a.flac

Block Sizes

For audio sampled at 48 kHz or below, FLAC's streamable subset offers ten standard block sizes, but only one block size will provide the "best" compression for a given file. Before you jump into the final re-encoding, you need to figure out which one to use.

While the answer itself is unpredictable, there are two patterns you can take advantage of to simplify the search:

  1. The relative differences between block sizes can be found using minimal (quick) encoder settings;
  2. Bigger is usually better, but when smaller is better, sizes keep improving as you go down the list until one does worse; nothing past that point is worth trying.

In other words, start with the largest block size, then work your way down the list until the resulting file size gets worse. The previous run will then be your best!

# Encode with a block size of 4608 and print the resulting file size.
# Nothing is actually saved here; you're just getting information.
flac -8 --blocksize 4608 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes

# Repeat with a block size of 4096, and compare the size.
flac -8 --blocksize 4096 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes

# And so on down the line. Stop once you see a bigger size.
flac -8 --blocksize 2304 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 2048 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 1152 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 1024 --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 576  --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 512  --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 256  --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes
flac -8 --blocksize 192  --no-padding --no-seektable --stdout --totally-silent a.flac | wc --bytes

The block size that produces the smallest output size is the block size to use!
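The same search can be automated. Here is a sketch of that loop, assuming the ten sizes listed above and the stop-at-first-worse rule; the function name best_blocksize is hypothetical, and wc -c stands in for wc --bytes purely for portability.

```shell
# Hypothetical helper: prints the block size that gave the smallest
# output for the file in $1, trying sizes from largest to smallest and
# stopping at the first one that does worse than the best so far.
# The function body runs in a subshell so its variables do not leak.
best_blocksize() (
    best_size=
    best_bytes=
    for size in 4608 4096 2304 2048 1152 1024 576 512 256 192; do
        # $(( ... )) also strips the padding some wc versions print.
        bytes=$(( $(flac -8 --blocksize "$size" --no-padding --no-seektable \
            --stdout --totally-silent "$1" | wc -c) ))
        # Zero bytes means the encoder produced nothing; give up.
        [ "$bytes" -gt 0 ] || return 1
        if [ -z "$best_bytes" ] || [ "$bytes" -lt "$best_bytes" ]; then
            best_size=$size
            best_bytes=$bytes
        elif [ "$bytes" -gt "$best_bytes" ]; then
            break
        fi
    done
    printf '%s\n' "$best_size"
)

# Usage:
#   size=$(best_blocksize /path/to/a.flac)
```

Ties are treated as "not worse", so the search keeps going past them but keeps the larger size as the winner.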

In practice, it is very unlikely you'll ever have to do all ten passes. In my personal media library of 16,432 songs, the two smallest sizes (192–256) weren't used at all, and most files compressed best using one of the top three sizes (2304–4608).

It is worth noting that FLAC uses 4096 by default. While that did prove to be the most common single block size for my library, the majority of files actually benefitted from using other sizes.

Statistics are fun!

Re-Encoding!

Once you've set your metadata, rooted out any fake stereo channels, and discovered the best block size, all that's left to do is re-encode your file! After all that build up, there's just one command to run:

flac -8epV \
    --blocksize YOUR_SIZE \
    --no-padding \
    --no-seektable \
    /path/to/a.flac --force

The -8 flag is one we used earlier; it simply tells FLAC to use the best compression level. The -e and -p flags improve upon that a little by using more exhaustive value-searching methods. They're relatively expensive, but worth it compression-wise.

The -V part is merely a sanity check. It makes sure each in/out chunk matches as it goes, aborting if there's any corruption or error. Safety first!

The --no-padding flag removes PADDING blocks. As mentioned earlier, padding is literally just unused space, so removing it is for the best.

The --no-seektable flag removes SEEKTABLE blocks, which contain pre-calculated signposts to make it easier for decoders to jump around the file willy-nilly. While historically useful, this optimization is moot with modern hardware, so it might as well be turned off to save a kilobyte or two.

The --force at the end is just a confirmation bypass, telling FLAC to write changes back to the original file when it's done. (Since that file already exists, it would complain otherwise.)
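For extra reassurance, the command can be wrapped so it reports how many bytes were saved and confirms the audio checksum stored in the file is unchanged. This is only a sketch: the name reencode_checked is invented, and it assumes the original file already has a real MD5 signature in its STREAMINFO block (a file written without one would trip the comparison).

```shell
# Hypothetical wrapper around the re-encode command: $1 is the file,
# $2 the block size found earlier. Prints the number of bytes saved, or
# fails if the stored audio checksum differs afterward.
reencode_checked() (
    file=$1
    size=$2
    md5_before=$(metaflac --show-md5sum "$file") || return 1
    bytes_before=$(( $(wc -c < "$file") ))

    flac -8epV --blocksize "$size" --no-padding --no-seektable \
        --totally-silent --force "$file" || return 1

    md5_after=$(metaflac --show-md5sum "$file") || return 1
    if [ "$md5_before" != "$md5_after" ]; then
        echo "audio checksum changed: $file" >&2
        return 1
    fi
    bytes_after=$(( $(wc -c < "$file") ))
    echo $(( bytes_before - bytes_after ))
)

# Usage:
#   reencode_checked /path/to/a.flac 4608
```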

And that, friends, is it! Your files should now be a few percent smaller!

OCD FTW

While not the focus of this article, obsessive personality types may enjoy applying these same optimizations to their personal media libraries. I am just such a person, so I did, and saved just shy of 3GiB.

It is worth noting, however, that doing this en masse is incredibly time-consuming. Parallelization helps, but there's no benefit going beyond the number of physical cores. I would also recommend breaking your library up into small chunks, focusing on one artist at a time. This makes it easier to pick up where you left off in the event of a crash, reboot, etc.
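One way to make a long run resumable is to log each file as it finishes and skip logged files on the next attempt. The sketch below is deliberately sequential and assumes file paths contain no newlines; run_resumable and optimize_one are hypothetical names, the latter standing in for whatever per-file routine you settle on.

```shell
# Hypothetical helper: reads file paths on stdin (one per line), skips
# any already recorded in the log ($1), and runs the given command ($2)
# on the rest, logging each path only after the command succeeds.
run_resumable() (
    log=$1
    cmd=$2
    touch "$log"
    while IFS= read -r file; do
        # -F fixed string, -x whole line, -q quiet: an exact-match lookup.
        if grep -Fxq -- "$file" "$log"; then
            continue
        fi
        # Redirect stdin so the command cannot swallow the path list.
        "$cmd" "$file" < /dev/null && printf '%s\n' "$file" >> "$log"
    done
)

# Usage, one artist at a time:
#   find /path/to/artist -name '*.flac' | sort | run_resumable done.log optimize_one
```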

If you're a normal person, this is probably not worth doing for local media. Haha.

Josh Stoik
5 August 2022