Brute Forcing FLAC Apodization Chains

In Optimizing FLAC Audio Files for Production Hosting, we covered the things that can be practically1 done to maximize FLAC compression.

That isn't the end of the story, but it is the end of what a modern PC can hope to achieve within its owner's lifetime.

To push things any further, one would need access to Big Tech levels of computing power.

Or, as it happens, some really, really, really small audio files to work with…

Baa.

The three FLAC-encoded sound effects embedded within JS Mate Poe2 are extremely tiny.

As you can see from the above, subjecting them to all the rigmarole from the previous article isn't for nothing, but they aren't — and probably never will be — living their smallest possible lives.

The trouble is that FLAC's final advanced tunable — apodization function(s) — isn't just a list of choices, but a list of choices with parameters that can also be arbitrarily chained together up to thirty-two links deep!

And each one takes time to test.

Even for a less-than-a-second clip like Poe's baaaaa, the sun would burn out long before the near-infinite number of possible configurations could be exhausted.

But what the hell. It's cold outside. Let's try to make a dent!

Reconnoitering the Rim.

The reference encoder supports eighteen different apodization functions, but three of them have been obsoleted and needn't be mentioned again:

  • bartlett
  • bartlett_hann
  • blackman
  • blackman_harris_4term_92db
  • connes
  • flattop
  • gauss(STDDEV)3
  • hamming
  • hann
  • kaiser_bessel
  • nuttall
  • rectangle
  • partial_tukey(N/OV/P) [obsolete]
  • punchout_tukey(N/OV/P) [obsolete]
  • subdivide_tukey(N/P)4
  • triangle
  • tukey(P) [obsolete]
  • welch

Apodization(s) affect compression in extremely complex and source-specific ways, but before we can marvel at the seeming randomness of it all, we need to come up with some reasonable constraints for the STDDEV, N, and P parameters in the list.

The number of tukey subdivisions (N) directly — and exponentially — impacts encoding times, but any given frame can only be cut up so many times before the chunks become too small to work with. For these files, a range of 2..=64 includes almost every possibility,5 so let's make it so!

Tukey's P values don't have any direct impact on encoding time, so those are a more straightforward question of scale. As much as I'd love to test every possible value of N a thousand times, my computer simply isn't fast enough. Hundredths it is!

For these particular files, gauss isn't all that useful. I kept it in the mix for the sake of completeness, but capped its STDDEV precision to tenths.

And just like that, with a few minor concessions, we've cut a virtual-infinity down to a vaguely digestible list of 6,255 possibilities!
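
For the record, that arithmetic is easy enough to double-check: thirteen parameterless functions, five tenth-precision gauss variations, and sixty-three subdivide_tukey subdivision counts times ninety-nine taper values.

<?php

// 13 plain functions + 5 gauss (tenths) + 63 subdivide_tukey Ns × 99 Ps.
echo 13 + 5 + (63 * 99); // 6255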

Let's get to work!

Ranking the Single Apodizations.

With 6,255 single apodizations to test — and more to do later — we'll need a bit more structure than the simple BASH loop used for Find the Block Size from the previous article.

Important

The ideal block size discovered by Find the Block Size should be incorporated into the apodization encoding commands.

We'll need to keep track of the output sizes, obviously, but as a favor to our future selves, we should also generate and track checksums for each file, because some apodizations will wind up making identical decisions as others.

(When multiple apodizations produce identical output — not just identical savings — we can exclude all but one of them from any further encoding efforts.)

It probably goes without saying that encoding any file more than six thousand times, even in parallel, will take a very long time.

Speaking from experience, you'll definitely want to make the process resumable! Haha.

I wound up building a Rust program to handle the testing, but for readability, let's flesh out a quick MVP with PHP:

<?php

// Source files and block sizes.
const SOURCES = array(
	__DIR__ . '/baa.wav'=>2304,
	__DIR__ . '/sneeze.wav'=>1152,
	__DIR__ . '/yawn.wav'=>4096,
);

// Set up the apodizations, starting with all the simple ones.
$apodizations = array(
	'bartlett',
	'bartlett_hann',
	…
);

// Dynamically populate the ones with parameters.
for ($i = 1; $i <= 99; $i++) {
	// Use an integer counter to avoid floating-point drift.
	$p = sprintf('%0.2f', $i / 100);

	// Gauss has a lower ceiling (and only needs tenths precision).
	if ($i <= 50 && 0 === $i % 10) { $apodizations[] = "gauss($p)"; }

	// Tukey gets every subdivision count too.
	for ($n = 2; $n <= 64; $n++) {
		$apodizations[] = "subdivide_tukey($n/$p)";
	}
}

// Loop through the sources.
foreach (SOURCES as $src=>$block_size) {
	// Path to save results.
	$dst = "$src.apodizations.json";
	
	// Result storage, prepopulated with last run results if any.
	$results = array();
	if (is_file($dst)) {
		$results = json_decode(trim(file_get_contents($dst)), true);
	}

	// Loop through the apodizations.
	foreach ($apodizations as $apodization) {
		// Skip if already crunched.
		if (isset($results[$apodization])) { continue; }

		// Encode using four threads, ideal block size, current apodization.
		$output = shell_exec(sprintf(
			'flac -j4 -8epV -r8 -b%d -A %s --no-padding --no-seektable --stdout --totally-silent -f %s',
			$block_size,
			escapeshellarg($apodization),
			escapeshellarg($src),
		));

		// Add the results (size and checksum) to the running totals.
		if (is_string($output) && '' !== $output) {
			$results[$apodization] = array(
				'size'=>strlen($output),
				'chk'=>hash('sha256', $output),
			);
		}
	}

	// Sort the apodizations by name, then bubble the winners up to the top.
	ksort($results); 
	uasort($results, function(array $a, array $b) : int {
		return $a['size'] <=> $b['size'];
	});

	// Save!
	file_put_contents(
		$dst,
		json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES),
	);
}

Several corners were cut for concision, but you get the idea, right?

Each source file gets encoded with each apodization. The results are sorted best to worst and exported as JSON to the source directory for future reference.

Warning

FLAC zeroes out the verification md5sum when the --stdout flag is used. While that doesn't matter for benchmarks like these, if you find a result worth saving, you'll need to repeat the winning command with -o output.flac instead.
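
Taking baa.wav's eventual winner (from the results table further down) as an example, the keeper command would look something like the following; the output name is, of course, whatever you want it to be:

flac -j4 -8epV -r8 -b2304 \
	-A 'subdivide_tukey(8/0.02);subdivide_tukey(16/0.07)' \
	--no-padding --no-seektable --totally-silent -f \
	-o baa.flac baa.wav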

In my case, it took about four and a half hours to crunch through all the possibilities for these three — I cannot stress this enough — tiny, tiny, tiny files. Haha.

On the bright side, we found savings!

But we aren't done yet!

Chain-Building: Apodization Pairs.

Testing each of the 6,255 single apodizations was perfectly doable, but however smartly constrained that set was at the time, it's way too big to combine every which way.

Even just for the unique pairings (6,255 × 6,254 ÷ 2 of them), we'd have to re-encode each file an astonishing 19,559,385 times!

Thankfully, for files this small, there are only so many ways FLAC can achieve a given result. The majority of the results from the first round were not only identical in savings, but identical in bytes.

It is ever so slightly wrong to assume that single apodizations producing exactly the same results as one another will continue producing exactly the same results as one another in other contexts,6 but it's more than true enough for our purposes.

Sorting7 and deduping the results leaves us with only 302 "unique" apodizations for baa.wav, 757 for sneeze.wav, and 1,013 for yawn.wav.
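
The dedupe itself is trivial thanks to the checksums we stashed earlier. Here's a minimal sketch, assuming the JSON written by the MVP above (the baa.wav path is just for illustration):

<?php

// Load one of the result sets written by the earlier script.
$results = json_decode(
	file_get_contents(__DIR__ . '/baa.wav.apodizations.json'),
	true,
);

// Keep only the first apodization seen for each distinct output checksum.
$unique = array();
$seen = array();
foreach ($results as $apodization=>$result) {
	if (! isset($seen[$result['chk']])) {
		$seen[$result['chk']] = true;
		$unique[$apodization] = $result;
	}
}

printf("%d unique results out of %d.\n", count($unique), count($results));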

Paired, of course, the numbers jump right back up, but "only" as high as 45,451, 286,146, and 512,578 respectively.

Those totals are at least doable, if for no other reason than I did it. Haha.

But we aren't done yet!

Chain-Building: Bigger and Bigger!

Same as with the single apodizations, pair testing produced a lot of duplicate results, but it also produced a lot more results overall.

Prohibitively many, in fact. Haha.

Thankfully, with two rounds of testing under our belt, we have twice the information to work with, so we can bring those numbers down a bit before moving on.

First and foremost, there's little value in extending base chains unless they actually improved upon the compression of their individual parts. If the output size of A+B, for example, isn't smaller than both A and B were individually in the first round, it can be written off as an evolutionary dead end.

Speaking of extending, we needn't add every (deduped) single apodization to every base chain in every round, either.

We only need to add the ones that improved when added to each permutation of the base component(s) in the previous round. For example, it would only make sense to test (A+B)+C if the deduped/improved pair results contained both A+C and B+C.
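
In code, that filtering might look something like the following sketch. The helpers are hypothetical, and it assumes pair results are keyed by semicolon-joined chains with their components sorted:

<?php

// Keep only the pairs that actually beat both of their round-one parts.
// $singles: apodization=>size; $pairs: "A;B"=>size (components sorted).
function improved_pairs(array $singles, array $pairs) : array {
	$out = array();
	foreach ($pairs as $chain=>$size) {
		list($a, $b) = explode(';', $chain);
		if ($size < $singles[$a] && $size < $singles[$b]) {
			$out[$chain] = $size;
		}
	}
	return $out;
}

// Normalize a pair of apodizations into a sorted "A;B" key.
function pair_key(string $a, string $b) : string {
	$parts = array($a, $b);
	sort($parts);
	return implode(';', $parts);
}

// A triple (A+B)+C is only worth encoding if A+C and B+C both improved.
function triple_candidates(array $improved, array $uniques) : array {
	$out = array();
	foreach (array_keys($improved) as $chain) {
		list($a, $b) = explode(';', $chain);
		foreach ($uniques as $c) {
			if (
				$c !== $a && $c !== $b &&
				isset($improved[pair_key($a, $c)]) &&
				isset($improved[pair_key($b, $c)])
			) {
				// Sort so equivalent orderings collapse to one candidate.
				$parts = array($a, $b, $c);
				sort($parts);
				$out[implode(';', $parts)] = true;
			}
		}
	}
	return array_keys($out);
}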

Implementing that logic goes a long way toward reining in the madness, but the more tests there are, the more improvements there are, the more tests the next round will have, and so on.

Thankfully, that trend only continues up to a point, because as it turns out, diminishing returns diminish.

For all three files, peak terrible occurred between rounds three and five and tapered off quickly thereafter, reaching zero before double-digit chain lengths entered the picture.

All this extra work proved overkill for baa.wav, whose "best" chain happened to be a best pair, but additional savings were found for the others.

Refining the Process.

After re-encoding these files over and over again, millions and millions of times, I couldn't help but wonder, Could I have done less?

I think so.

Combining every (unique) single apodization with every other (unique) single apodization during round two, for example, produces a ton of results that in turn lead to a shitton of tests and results for round three, that in turn fuel a terrible round four, etc.

Best to nip that sort of thing in the bud. Haha.

For both baa.wav and sneeze.wav, it would have been sufficient — and orders of magnitude faster — to combine only the first quartile (by output size) with everything else.

For yawn.wav, a third octile cut got me to within one byte of the known "best", but I wasn't actually able to find any logical cut(s) — including later stages — that didn't accidentally nuke something important along the way.

It would seem individual performance is only loosely correlated with group success. Haha.

Even so, such shortcuts save sooooo much time, they're probably good to start with, even if exhaustive testing is the ultimate goal. Cutoffs can always be broadened as needed…
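
For what it's worth, a cutoff like that is only a few lines of code. A sketch, assuming $unique is the deduped, size-sorted result set from the earlier snippet, reusing its pair_key() helper:

<?php

// The smallest quarter of the deduped apodizations (by output size)…
$cutoff = array_slice($unique, 0, (int) ceil(count($unique) / 4), true);

// …gets paired with the full deduped set; everything else is skipped.
$pairs = array();
foreach (array_keys($cutoff) as $a) {
	foreach (array_keys($unique) as $b) {
		if ($a !== $b) { $pairs[] = pair_key($a, $b); }
	}
}
$pairs = array_unique($pairs);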

The Results!

What a fun way to while away the winter!

For the curious, the discovered bests are (currently):

File       | Block Size | Apodization              | Chain
baa.wav    | 2304       | subdivide_tukey(16/0.07) | subdivide_tukey(8/0.02); subdivide_tukey(16/0.07)
sneeze.wav | 1152       | subdivide_tukey(9/0.07)  | subdivide_tukey(5/0.41); subdivide_tukey(6/0.01); subdivide_tukey(7/0.03); subdivide_tukey(16/0.42)
yawn.wav   | 4096       | subdivide_tukey(39/0.18) | subdivide_tukey(23/0.04); subdivide_tukey(25/0.59); subdivide_tukey(32/0.32); subdivide_tukey(37/0.01); subdivide_tukey(39/0.20); welch8

Interesting stuff, right?

Right.

And hey, JS Mate Poe is now slightly smaller for all the effort!

But how much, exactly?

Throughout the article, we've been charting only the audio data sizes because that's what's actually being compressed, but for the final tallies, let's step back and count up all the bytes, since that's what's being embedded into the library:

File       | Before | After  | Savings
baa.wav    | 6,824  | 6,790  | 34
sneeze.wav | 8,145  | 8,103  | 42
yawn.wav   | 19,194 | 19,098 | 96

Would you look at that, we saved 172 bytes!!!

And because these assets are constant from build to build, those savings are savings forever.

Like Mr Smith said, sometimes diminishing returns are the only returns worth fighting for.

But I wouldn't recommend trying to scale this up to more typical CD-quality media.9

That would be batshit crazy. Haha.

  1. You may not want to spend fifteen minutes shaving a couple megabytes off the storage size of a CD-quality album, but you could.

  2. The little sheep running around your screen right now. You're welcome!

  3. Standard deviation, 0.01..=0.5.

  4. The number of subdivisions, 1.., and optionally, the percentage of the window to taper, 0.0..=1.0. If the latter is omitted, 0.5 is assumed.

  5. The actual upper limit works out to about eighty, but unique results all but dry up in the fifties.

  6. The reference encoder takes a few shortcuts of its own that can, rarely, cause it to make different decisions than it normally would.

  7. Subdividing tukey is noticeably slower than all other functions, so sort those last by N. A straight alphabetical ordering is sufficient for the others.

  8. Sometimes it takes a weird apodization to find some weird bytes in a weird frame.

  9. If you really wanted to do something more for bigger files (after the saner optimizations), test subdivide_tukey(3) through subdivide_tukey(8), individually, without the P parameter. That shouldn't take too long…