Brute Forcing FLAC Apodization Chains

In Optimizing FLAC Audio Files for Production Hosting, we covered the things that can be practically[1] done to maximize FLAC compression.

That isn't the end of the story, but it is the end of what a modern PC can hope to achieve within its owner's lifetime.

To push things any further, one would need access to Big Tech levels of computing power.

Or, as it happens, some really, really, really small audio files to work with…

Baa.

The three FLAC-encoded sound effects embedded within JS Mate Poe[2] are extremely tiny.

Subjecting them to all the rigmarole from the previous article isn't for nothing, but they aren't — and probably never will be — living their smallest possible lives.

The trouble is that FLAC's final advanced tunable — apodization function(s) — isn't just a list of choices, but a list of choices with parameters that can also be arbitrarily chained together up to thirty-two links deep!
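For reference, a chain is fed to the reference encoder as a single semicolon-delimited -A value. The file names and window choices here are purely illustrative:

flac -8 -A "subdivide_tukey(5/0.5);hann" -o out.flac in.wav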

And each one takes time to test.

Even for a less-than-a-second clip like Poe's baaaaa, the sun would burn out long before the near-infinite number of possible configurations could be exhausted.

But what the hell. It's cold outside. Let's try to make a dent!

Reconnoitering the Rim.

The reference encoder supports eighteen different apodization functions, but three of them are obsolete and needn't be mentioned again:

  • bartlett
  • bartlett_hann
  • blackman
  • blackman_harris_4term_92db
  • connes
  • flattop
  • gauss(STDDEV)[3]
  • hamming
  • hann
  • kaiser_bessel
  • nuttall
  • rectangle
  • partial_tukey(N/OV/P) [obsolete]
  • punchout_tukey(N/OV/P) [obsolete]
  • subdivide_tukey(N/P)[4]
  • triangle
  • tukey(P) [obsolete]
  • welch

Apodization(s) affect compression in extremely complex and source-specific ways, but before we can marvel at the seeming randomness of it all, we need to come up with some reasonable constraints for the STDDEV, N, and P parameters in the list.

The number of tukey subdivisions (N) directly — and exponentially — impacts encoding times, but any given frame can only be cut up so many times before the chunks become too small to work with. For these files, a range of 2..=64 includes almost every possibility,[5] so let's make it so!

The remaining constraints are dictated by external factors. As much as I would love to pair every value of N with a thousand different values of P, my computer simply isn't fast enough to manage that in anything approaching a reasonable amount of time. There's nothing for it but to set a precision of hundredths for STDDEV and P.

And just like that, with a few minor concessions, we've cut a virtual-infinity down to a vaguely digestible list of 6,300 possibilities!
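For the record, that figure breaks down as:

     13 parameterless windows
   + 50 gauss(STDDEV) variants (STDDEV 0.01..=0.50)
   + 6,237 subdivide_tukey(N/P) variants (63 values of N × 99 values of P)
   = 6,300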

Let's get to work!

Ranking the Single Apodizations.

With 6,300 single apodizations to test — and more to do later — we'll need a bit more structure than the simple BASH loop used for Find the Block Size from the previous article.

Important

The ideal block size discovered by Find the Block Size should be incorporated into the apodization encoding commands.

We'll need to keep track of the output sizes, obviously, but as a favor to our future selves, we should also generate and track checksums for each file, because some apodizations will wind up making identical decisions as others.

(When multiple apodizations produce identical output — not just identical savings — we can exclude all but one of them from any further encoding efforts.)
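As a minimal sketch of that idea (assuming results shaped like the script below produces, with a sha256 checksum stored under 'chk'), weeding out the copycats only takes a few lines:

// Drop any apodization whose output is byte-identical to one already seen.
// (Assumes $results is sorted best-to-worst, as the script below leaves it.)
$seen = array();
foreach ($results as $apodization=>$result) {
	if (isset($seen[$result['chk']])) { unset($results[$apodization]); }
	else { $seen[$result['chk']] = true; }
}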

It probably goes without saying that encoding any file more than six thousand times, even in parallel, will take a very long time.

Speaking from experience, you'll definitely want to make the process resumable! Haha.

I wound up building a Rust program to handle the testing, but for readability, let's flesh out a quick MVP with PHP:

<?php

// Source files and block sizes.
const SOURCES = array(
	__DIR__ . '/baa.wav'=>2304,
	__DIR__ . '/sneeze.wav'=>1152,
	__DIR__ . '/yawn.wav'=>4096,
);

// Set up the apodizations, starting with all the simple ones.
$apodizations = array(
	'bartlett',
	'bartlett_hann',
	…
);

// Dynamically populate the ones with parameters, using an integer
// counter to avoid floating-point drift.
for ($i = 1; $i <= 99; $i++) {
	$p = sprintf('%0.2f', $i / 100);

	// Gauss has a lower ceiling (0.01..=0.50).
	if ($i <= 50) { $apodizations[] = "gauss($p)"; }

	// Subdivide-tukey takes a second parameter too, so loop again.
	for ($n = 2; $n <= 64; $n++) {
		$apodizations[] = "subdivide_tukey($n/$p)";
	}
}

// Loop through the sources.
foreach (SOURCES as $src=>$block_size) {
	// Path to save results.
	$dst = "$src.apodizations.json";
	
	// Result storage, prepopulated with last run results if any.
	$results = array();
	if (is_file($dst)) {
		$results = (array) json_decode(trim(file_get_contents($dst)), true);
	}

	// Loop through the apodizations.
	foreach ($apodizations as $apodization) {
		// Skip if already crunched.
		if (isset($results[$apodization])) { continue; }

		// Encode using four threads, ideal block size, current apodization.
		$output = shell_exec(sprintf(
			'flac -j4 -8epV -r8 -b%d -A %s --no-padding --no-seektable --stdout --totally-silent -f %s',
			$block_size,
			escapeshellarg($apodization),
			escapeshellarg($src),
		));

		// Add the results to the running totals.
		if (is_string($output) && '' !== $output) {
			$results[$apodization] = array(
				'size'=>strlen($output),
				'chk'=>hash('sha256', $output),
			);
		}
	}

	// Sort the apodizations by name, then bubble the winners up to the top.
	ksort($results); 
	uasort($results, function(array $a, array $b) : int {
		return $a['size'] <=> $b['size'];
	});

	// Save!
	file_put_contents(
		$dst,
		json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES),
	);
}

Several corners were cut for concision, but you get the idea, right?

Each source file gets encoded with each apodization. The results are sorted best to worst and exported as JSON to the source directory for future reference.

Warning

FLAC zeroes out the verification md5sum when the --stdout flag is used. While that doesn't matter for benchmarks like these, if you find a result worth saving, you'll need to repeat the winning command with -o output.flac instead.
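For example, to properly re-save baa.wav with its winning chain from the results table below, the command would look something like this (output name hypothetical):

flac -j4 -8epV -r8 -b2304 -A "subdivide_tukey(8/0.02);subdivide_tukey(16/0.07)" --no-padding --no-seektable -f -o baa.flac baa.wav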

In my case, it took about four and a half hours to crunch through all the possibilities for these three — I cannot stress this enough — tiny, tiny, tiny files. Haha.

On the bright side, we found savings!

But we aren't done yet!

Chain-Building Strategies.

Crunching 6,300 apodizations individually was doable.

On a whim, I decided to jump straight into exhaustively testing all possible apodization pairings, ignoring all those that produced copycat results the first time around.[6]

That too was doable, but only because I technically did it.

The process took several days to complete and I regretted it immediately.[7]

Needless to say, that puts exhaustive testing for all remaining chain lengths completely out of reach. Haha.
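To put numbers to it: yawn.wav's 1,054 unique singles produced those 554,931 pairings (1,054 choose 2), but extending the same exhaustive approach to triples would mean 1,054 choose 3, roughly 195 million more encodes, for that file alone.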

To make any further progress, we'll need to get creative.

But how, exactly?

My first instinct was to try all combinations of first-place apodizations, but only sneeze.wav had convergent solutions in the top spot, and none of them improved each other.

My next thought was to extend that to second place, then third, and so on, keeping only those results that improved with any given link.

That started to yield some results!

After a little trial and error, the best overall strategy I found goes something like this:

  1. Collect the top ten single apodizations into a vector called "base".
  2. Collect the top fifty single apodizations (including those in the "base") along with any remaining one-off weirdos like welch into another vector called "links".
  3. Add the first "link" from "links" to each of the "base" chains (that doesn't already have it), then re-encode the file with each of the new chains. Add any that improved (in ways not already seen by some other combination) to the "base" for the next round.
  4. Repeat with the second "link" from "links" and the (probably) expanded "base" list.
  5. And so on, a hundred or so more times. (A rough sketch follows this list.)
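In PHP terms, building on the earlier script, the strategy looks roughly like this. Both $singles (that script's sorted single-apodization results) and $encode (a stand-in closure for the same flac invocation, returning size and checksum) are assumptions for illustration:

// Seed the search from the single-apodization rankings.
$base  = array_slice(array_keys($singles), 0, 10); // The top ten.
$links = array_slice(array_keys($singles), 0, 50); // The top fifty…
$links[] = 'welch';                                // …plus any weirdos.

$sizes = array(); // Best size per chain.
$seen  = array(); // Checksums of outputs already produced.
foreach ($links as $link) {
	$grown = array();
	foreach ($base as $chain) {
		// Skip chains that already contain this link.
		if (in_array($link, explode(';', $chain), true)) { continue; }

		// Re-encode with the link appended.
		$candidate = "$chain;$link";
		$result = $encode($candidate);

		// Keep it only if it beat its parent and isn't a copycat.
		$parent = $sizes[$chain] ?? $singles[$chain]['size'];
		if ($result['size'] < $parent && ! isset($seen[$result['chk']])) {
			$sizes[$candidate] = $result['size'];
			$seen[$result['chk']] = true;
			$grown[] = $candidate;
		}
	}

	// The survivors join the base for the next round.
	$base = array_merge($base, $grown);
}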

Tip

For consistency and efficiency, keep your chains sorted, preferring the shortest/fastest representation for any given result.
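For instance, a tiny hypothetical helper (this assumes, per the tip, that link order only affects bookkeeping, not the encoder's output):

// Canonicalize a chain so equivalent orderings and duplicate links
// collapse to a single representation.
function canonical_chain(string $chain) : string {
	$links = array_unique(explode(';', $chain));
	sort($links, SORT_STRING);
	return implode(';', $links);
}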

For baa.wav, the best chain happens to also be its best pair, so while we didn't learn anything new,[8] we got there an order of magnitude faster.

For both sneeze.wav and yawn.wav, at least, a whole bunch of improved chains were discovered,[9] and for yawn.wav especially, the reductions in processing time (compared to the earlier pairing misstep) were even more significant.

I doubt these bests are the bests, but they've got to be pretty damned close!

The Results!

Well, that was a week of my life I'll never get back. Haha.

For the curious, the currently-discovered bests are:

  • baa.wav (block size 2304)
    Best single: subdivide_tukey(16/0.07)
    Best chain: subdivide_tukey(8/0.02);subdivide_tukey(16/0.07)

  • sneeze.wav (block size 1152)
    Best single: subdivide_tukey(9/0.07)
    Best chain: subdivide_tukey(4/0.33);subdivide_tukey(21/0.59);subdivide_tukey(30/0.01)

  • yawn.wav (block size 4096)
    Best single: subdivide_tukey(39/0.18)
    Best chain: subdivide_tukey(23/0.04);subdivide_tukey(25/0.59);subdivide_tukey(32/0.32);subdivide_tukey(37/0.01);subdivide_tukey(39/0.20);welch[10]

Interesting stuff, right?

Right.

And hey, JS Mate Poe is now slightly smaller for all the effort!

But how much, exactly?

Throughout the article, we've been charting only the audio data sizes because that's what's being compressed, but for the final tallies, let's step back and count up all the bytes:

File          Before    After    Savings
baa.wav        6,824    6,790         34
sneeze.wav     8,145    8,104         41
yawn.wav      19,194   19,098         96

(All sizes in bytes.)

We saved 171 bytes!

And because these assets are constant from build to build, those savings are savings forever.

For niche cases like this, I tend to agree with Mr Smith that diminishing returns are worth fighting for.

But I wouldn't recommend trying to scale this up to more typical CD-quality media.

That would be batshit crazy. Haha.

  1. You may not want to spend fifteen minutes shaving a couple megabytes off the storage size of a CD-quality album, but you could.

  2. The little sheep running around your screen right now. You're welcome!

  3. Standard deviation, 0.01..=0.5.

  4. The number of subdivisions, 1.., and optionally, the percentage of the window to taper, 0.0..=1.0. If the latter is omitted, 0.5 is assumed.

  5. The actual upper limit works out to about eighty, but unique results all but dry up in the fifties.

  6. From 6,300 rounds of encoding, only 345 unique results were produced for baa.wav, 802 for sneeze.wav, and 1,054 for yawn.wav.

  7. For yawn.wav alone, this entailed an additional 554,931 rounds of encoding!

  8. Except for a whole bunch of other, more complicated ways to achieve the same savings. Haha.

  9. Interestingly, neither of the winning combinations contained the best single or pair found earlier.

  10. Sometimes it takes a weird apodization to find some weird bytes in a weird frame.