Brute Forcing FLAC Apodization Chains
In Optimizing FLAC Audio Files for Production Hosting, we covered the things that can be practically1 done to maximize FLAC compression.
That isn't the end of the story, but it is the end of what a modern PC can hope to achieve within its owner's lifetime.
To push things any further, one would need access to Big Tech levels of computing power.
Or, as it happens, some really, really, really small audio files to work with…
Baa.
The three FLAC-encoded sound effects embedded within JS Mate Poe2 are extremely tiny.
As you can see from the above, subjecting them to all the rigmarole from the previous article isn't for nothing, but they aren't — and probably never will be — living their smallest possible lives.
The trouble is that FLAC's final advanced tunable — apodization function(s) — isn't just a list of choices, but a list of choices with parameters that can also be arbitrarily chained together up to thirty-two links deep!
And each one takes time to test.
Even for a less-than-a-second clip like Poe's baaaaa, the sun would burn out long before the near-infinite number of possible configurations could be exhausted.
But what the hell. It's cold outside. Let's try to make a dent!
Reconnoitering the Rim.
The reference encoder supports eighteen different apodization functions, but three of them are obsoleted and needn't be mentioned again:
- `bartlett`
- `bartlett_hann`
- `blackman`
- `blackman_harris_4term_92db`
- `connes`
- `flattop`
- `gauss(STDDEV)`3
- `hamming`
- `hann`
- `kaiser_bessel`
- `nuttall`
- `rectangle`
- `partial_tukey(N/OV/P)` [obsolete]
- `punchout_tukey(N/OV/P)` [obsolete]
- `subdivide_tukey(N/P)`4
- `triangle`
- `tukey(P)` [obsolete]
- `welch`
Apodization(s) affect compression in extremely complex and source-specific ways, but before we can marvel at the seeming randomness of it all, we need to come up with some reasonable constraints for the STDDEV, N, and P parameters in the list.
The number of tukey subdivisions (N) directly — and exponentially — impacts encoding times, but any given frame can only be cut up so many times before the chunks become too small to work with. For these files, a range of `2..=64` includes almost every possibility,5 so let's make it so!
Tukey's P values don't have any direct impact on encoding time, so those are a more straightforward question of scale. As much as I'd love to test every possible value of P down to the nearest thousandth, my computer simply isn't fast enough. Hundredths it is!
For these particular files, gauss isn't all that useful. I kept it in the mix for the sake of completion, but capped its STDDEV precision to tenths.
And just like that, with a few minor concessions, we've cut a virtual-infinity down to a vaguely digestible list of 6,255 possibilities!
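As a sanity check on that tally: thirteen parameterless functions, five gauss values at tenths, and the full subdivide_tukey grid really do add up. A quick bit of Python arithmetic (the counts follow the constraints above):

```python
# Count the constrained single-apodization search space.
simple = 13                      # parameterless, non-obsolete functions
gauss = 5                        # gauss(STDDEV) at tenths: 0.1 through 0.5
tukey = len(range(2, 65)) * 99   # subdivide_tukey(N/P): N in 2..=64, P in hundredths

total = simple + gauss + tukey
print(total)  # 6255
```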
Let's get to work!
Ranking the Single Apodizations.
With 6,255 single apodizations to test — and more to do later — we'll need a bit more structure than the simple BASH loop used for Find the Block Size from the previous article.
Important
The ideal block size discovered by Find the Block Size should be incorporated into the apodization encoding commands.
We'll need to keep track of the output sizes, obviously, but as a favor to our future selves, we should also generate and track checksums for each file, because some apodizations will wind up making identical decisions as others.
(When multiple apodizations produce identical output — not just identical savings — we can exclude all but one of them from any further encoding efforts.)
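The dedupe itself can be as simple as keeping the first apodization seen for each checksum. A hypothetical Python sketch, assuming results shaped like the size/checksum records we're about to collect (the names and values are made up for illustration):

```python
def dedupe_by_checksum(results):
    """Collapse apodizations that produced byte-identical output.

    `results` maps apodization name -> {'size': int, 'chk': str}.
    Returns the subset with one representative per checksum.
    """
    seen = set()
    unique = {}
    for name, info in results.items():
        if info['chk'] in seen:
            continue
        seen.add(info['chk'])
        unique[name] = info
    return unique

# Toy data: hann and hamming happen to produce identical bytes.
demo = {
    'hann':    {'size': 100, 'chk': 'aaa'},
    'hamming': {'size': 100, 'chk': 'aaa'},  # duplicate; dropped
    'welch':   {'size': 101, 'chk': 'bbb'},
}
print(sorted(dedupe_by_checksum(demo)))  # ['hann', 'welch']
```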
It probably goes without saying that encoding any file more than six thousand times, even in parallel, will take a very long time.
Speaking from experience, you'll definitely want to make the process resumable! Haha.
I wound up building a Rust program to handle the testing, but for readability, let's flesh out a quick MVP with PHP:
```php
<?php

// Source files and block sizes.
const SOURCES = array(
	__DIR__ . '/baa.wav'    => 2304,
	__DIR__ . '/sneeze.wav' => 1152,
	__DIR__ . '/yawn.wav'   => 4096,
);

// Set up the apodizations, starting with all the simple ones.
$apodizations = array(
	'bartlett',
	'bartlett_hann',
	…
);

// Dynamically populate the ones with parameters. (Integer counters
// sidestep the float drift that would otherwise creep into the strings.)
for ($i = 1; $i <= 99; $i++) {
	$p = number_format($i / 100, 2);

	// Gauss has a lower ceiling, and (per the above) tenths only.
	if ($i <= 50 && 0 === $i % 10) { $apodizations[] = "gauss($p)"; }

	// Subdivide_tukey gets a nested loop for N.
	for ($n = 2; $n <= 64; $n++) {
		$apodizations[] = "subdivide_tukey($n/$p)";
	}
}

// Loop through the sources.
foreach (SOURCES as $src => $block_size) {
	// Path to save results.
	$dst = "$src.apodizations.json";

	// Result storage, prepopulated with the last run's results, if any.
	$results = array();
	if (is_file($dst)) {
		$results = json_decode(trim(file_get_contents($dst)), true);
		if (! is_array($results)) { $results = array(); }
	}

	// Loop through the apodizations.
	foreach ($apodizations as $apodization) {
		// Skip if already crunched.
		if (isset($results[$apodization])) { continue; }

		// Encode using four threads, the ideal block size, and the
		// current apodization.
		$output = shell_exec(sprintf(
			'flac -j4 -8epV -r8 -b%d -A %s --no-padding --no-seektable --stdout --totally-silent -f %s',
			$block_size,
			escapeshellarg($apodization),
			escapeshellarg($src),
		));

		// Add the results to the running totals.
		if (is_string($output) && '' !== $output) {
			$results[$apodization] = array(
				'size' => strlen($output),
				'chk'  => hash('sha256', $output),
			);
		}
	}

	// Sort the apodizations by name, then bubble the winners up to the top.
	ksort($results);
	uasort($results, function(array $a, array $b) : int {
		return $a['size'] <=> $b['size'];
	});

	// Save!
	file_put_contents(
		$dst,
		json_encode($results, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES),
	);
}
```
Several corners were cut for concision, but you get the idea, right?
Each source file gets encoded with each apodization. The results are sorted best to worst and exported as JSON to the source directory for future reference.
Warning
FLAC zeroes out the verification md5sum when the --stdout flag is used. While that doesn't matter for benchmarks like these, if you find a result worth saving, you'll need to repeat the winning command with -o output.flac instead.
In my case, it took about four and a half hours to crunch through all the possibilities for these three — I cannot stress this enough — tiny, tiny, tiny files. Haha.
On the bright side, we found savings!
But we aren't done yet!
Chain-Building: Apodization Pairs.
Testing each of the 6,255 single apodizations was perfectly doable, but however smartly constrained that set was at the time, it's way too big to combine every which way.
Even just for the unique pairings, we'd have to re-encode each file an astonishing 19,559,385 times!
Thankfully, for files this small, there are only so many ways FLAC can achieve a given result. The majority of the results from the first round were not only identical in savings, but identical in bytes.
It is ever so slightly wrong to assume that single apodizations producing exactly the same results as one another will continue producing exactly the same results as one another in other contexts,6 but it's more than true enough for our purposes.
Sorting7 and deduping the results leaves us with only 302 "unique" apodizations for baa.wav, 757 for sneeze.wav, and 1,013 for yawn.wav.
Paired, of course, the numbers jump right back up, but "only" as high as 45,451, 286,146, and 512,578 respectively.
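Those pair totals are plain n-choose-2 arithmetic over each deduped set, which Python's `math.comb` confirms:

```python
from math import comb

# Unique single-apodization results per file, after deduping.
uniques = {'baa.wav': 302, 'sneeze.wav': 757, 'yawn.wav': 1013}

for name, n in uniques.items():
    print(name, comb(n, 2))
# baa.wav 45451
# sneeze.wav 286146
# yawn.wav 512578

# And the undeduped nightmare scenario from earlier:
print(comb(6255, 2))  # 19559385
```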
Those totals are technically doable, if for no other reason than I did it. Haha.
Additional savings were found too, but two is just the beginning.
Chain-Building: Bigger and Bigger!
Same as with the single apodizations, pair testing produced a lot of duplicate results, but it also produced a lot more results overall.
Prohibitively many, in fact. Haha.
Thankfully, with two rounds of testing under our belt, we have twice the information to work with, so we can bring those numbers down a bit before moving on.
First and foremost, there's little value in extending base chains unless they actually improved upon the compression of their individual parts. If the output size of A+B, for example, isn't smaller than both A and B were individually in the first round, it can be written off as an evolutionary dead end.
And speaking of extensions, we needn't add every (deduped) single apodization to every base chain in every round, either.
We only need to add the ones that improved when added to each permutation of the base component(s) in the previous round. For example, it would only make sense to test (A+B)+C if the deduped/improved pair results contained both A+C and B+C.
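That pruning rule translates to only a few lines. A hypothetical Python sketch for growing pairs into triples (the chain and pair representations here are my own, not the article's actual tooling):

```python
def extend_chain(base, singles, improved):
    """Yield candidate chains one link longer than `base`.

    `base` is a tuple of apodization names, `singles` the deduped
    single apodizations, and `improved` the set of sorted pairs that
    beat both of their parts in the previous round. A new link C is
    only worth testing on (A, B) if both (A, C) and (B, C) already
    proved themselves.
    """
    for c in singles:
        if c in base:
            continue
        if all(tuple(sorted((part, c))) in improved for part in base):
            yield base + (c,)

# Toy data: C improved alongside both A and B; D improved nothing.
improved_pairs = {('A', 'B'), ('A', 'C'), ('B', 'C')}
print(list(extend_chain(('A', 'B'), ['C', 'D'], improved_pairs)))
# [('A', 'B', 'C')]
```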
Implementing that logic goes a long way toward reining in the madness, but the more tests there are, the more improvements there are, the more tests the next round will have, and so on.
Thankfully, that trend only continues up to a point, because as it turns out, diminishing returns diminish.
For all three files, peak terrible occurred between rounds three and five and tapered off quickly thereafter, reaching zero before double-digit chain lengths entered the picture.
All this extra work proved overkill for baa.wav, whose "best" chain happened to be a best pair, but additional savings were found for both of the others.
The Results?
What a fun way to while away the winter!
For the curious, the currently-discovered bests are:
| File | Block Size | Best Apodization | Best Chain |
|---|---|---|---|
| `baa.wav` | 2304 | `subdivide_tukey(16/0.07)` | `subdivide_tukey(8/0.02);subdivide_tukey(16/0.07)` |
| `sneeze.wav` | 1152 | `subdivide_tukey(9/0.07)` | `subdivide_tukey(5/0.41);subdivide_tukey(6/0.01);subdivide_tukey(7/0.03);subdivide_tukey(16/0.42)` |
| `yawn.wav` | 4096 | `subdivide_tukey(39/0.18)` | `subdivide_tukey(23/0.04);subdivide_tukey(25/0.59);subdivide_tukey(32/0.32);subdivide_tukey(37/0.01);subdivide_tukey(39/0.20);welch`8 |
Interesting stuff, right?
Right.
And hey, JS Mate Poe is now slightly smaller for all the effort!
But how much, exactly?
Throughout the article, we've been charting only the audio data sizes because that's what's actually being compressed, but for the final tallies, let's step back and count up all the bytes, since that's what's being embedded into the library:
| File | Before | After | Savings |
|---|---|---|---|
| `baa.wav` | 6,824 | 6,790 | 34 |
| `sneeze.wav` | 8,145 | 8,103 | 42 |
| `yawn.wav` | 19,194 | 19,098 | 96 |
Would you look at that, we saved 172 bytes!!!
And because these assets are constant from build to build, those savings are savings forever.
Still, it's worth asking, Could I have done less?
Reframing the Problem.
It hadn't been clear at the outset how everything would play out, but I assumed somewhere over the course of all these damn tests, patterns galore would emerge.
That they didn't was revealing; it meant I had missed something!
Ironically, for a hint at what that something was, we need look no further than the following line I had (foolishly) cut from an early draft of this post for being too into-the-weeds:
> For each frame of audio data, FLAC picks and applies whichever apodization in the chain it thinks will work best.
Total output size is a perfectly fine metric for comparison, but it obscures the details of how apodization(s) are actually interacting with one another.
To be able to understand — and predict — those interactions, we need to zoom into where the action is actually happening.
We need to track the frame sizes too.
Calculating Frame Sizes.
Short of writing a partial FLAC decoder from scratch, the simplest way to find the number and size of frames in a FLAC file is to rerun the reference encoder with the -a/--analyze option:
```shell
flac -a --totally-silent --stdout /path/to/audio.flac
```
That'll output a bunch of text like this:
```
frame=0	offset=86	bits=7672	blocksize=1152	sample_rate=22050	channels=1	channel_assignment=INDEPENDENT
	subframe=0	wasted_bits=0	type=LPC	order=10	qlp_coeff_precision=5	quantization_level=4	residual_type=RICE	partition_order=3
		qlp_coeff[0]=4
		qlp_coeff[1]=-11
		qlp_coeff[2]=-1
		qlp_coeff[3]=-4
		qlp_coeff[4]=1
		qlp_coeff[5]=-3
		qlp_coeff[6]=4
		qlp_coeff[7]=1
		qlp_coeff[8]=2
		qlp_coeff[9]=2
		warmup[0]=0
		warmup[1]=0
		warmup[2]=0
		warmup[3]=0
		warmup[4]=3
		warmup[5]=-1
		warmup[6]=-9
		warmup[7]=-46
		warmup[8]=-15
		warmup[9]=28
		parameter[0]=3
		parameter[1]=4
		parameter[2]=4
		parameter[3]=4
		parameter[4]=5
		parameter[5]=5
		parameter[6]=5
		parameter[7]=5
frame=1	offset=1045	bits=8560	blocksize=1152	sample_rate=22050	channels=1	channel_assignment=INDEPENDENT
	subframe=0	wasted_bits=0	type=LPC	order=9	qlp_coeff_precision=5	quantization_level=4	residual_type=RICE	partition_order=2
…
```
Thankfully, all we really need from that output are the `bits=` values from the tab-separated lines beginning `frame=`.
The frames are reported sequentially, so it's enough to parse and collect the bit9 values into a list, in order of appearance.
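A few lines of Python suffice, assuming the `flac -a` output format shown above:

```python
def frame_bits(analysis: str) -> list[int]:
    """Collect the bits= value from each frame= line, in order."""
    sizes = []
    for line in analysis.splitlines():
        # Subframe and coefficient lines are indented; skip them.
        if not line.startswith('frame='):
            continue
        fields = dict(f.split('=', 1) for f in line.split())
        sizes.append(int(fields['bits']))
    return sizes

# Abbreviated sample mirroring the analysis output above.
sample = (
    "frame=0\toffset=86\tbits=7672\tblocksize=1152\n"
    "\tsubframe=0\twasted_bits=0\ttype=LPC\n"
    "frame=1\toffset=1045\tbits=8560\tblocksize=1152\n"
)
print(frame_bits(sample))  # [7672, 8560]
```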
Brute Force: Take Two!
Same as before, the first step is to re-encode the file with each individual apodization. The only difference this time is we need to store three dimensions for each:
- Output size.
- Output checksum.
- Individual frame sizes.
Also same as before, sort and dedupe those results by checksum.
Next, loop through the remaining results to find the "ideal" (minimum) possible size for each frame, then loop through them again to determine which and how many of those ideal frames each apodization can reach on its own.
From there, it's mostly just a matter of combining apodizations in such a way that all ideal frames are covered.
For example, if A is ideal for frames 0 and 2 and B is ideal for 1 and 3, A+B should be ideal for 0, 1, 2, and 3.
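That coverage idea can be sketched with a greedy pass: compute each frame's ideal (minimum) size, then repeatedly add whichever apodization covers the most still-missing ideal frames. A hypothetical Python sketch (data shapes are mine):

```python
def greedy_chain(frame_sizes):
    """Pick apodizations whose ideal frames jointly cover every frame.

    `frame_sizes` maps apodization name -> list of per-frame bit
    sizes (all lists the same length). Returns names that, on paper,
    reach the minimum size for every frame.
    """
    # The ideal (minimum) size for each frame position.
    ideals = [min(sizes) for sizes in zip(*frame_sizes.values())]

    # Which frames each apodization handles ideally.
    masks = {
        name: {i for i, bits in enumerate(sizes) if bits == ideals[i]}
        for name, sizes in frame_sizes.items()
    }

    missing = set(range(len(ideals)))
    chain = []
    while missing:
        best = max(masks, key=lambda name: len(masks[name] & missing))
        chain.append(best)
        missing -= masks[best]
    return chain

# Toy data matching the A/B example above.
demo = {
    'A': [10, 99, 10, 99],   # ideal for frames 0 and 2
    'B': [99, 20, 99, 20],   # ideal for frames 1 and 3
}
print(greedy_chain(demo))  # ['A', 'B']
```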
Reality isn't quite so tidy — FLAC takes shortcuts in multiple-choice situations, so generally picks a less-than-best link for at least one of the frames — but with a little trial-and-error, comparable bests to the ones found the first time around surface very quickly.
How quickly?
| File | Tests to Best |
|---|---|
| `baa.wav` | 6,276 |
| `sneeze.wav` | 11,734 |
| `yawn.wav` | 6,817 |
Fuck me. Haha.
Take Three: The Chain of Frankenstein.
As ~~humiliating~~ satisfying as the improvements from Take Two had been, it bothered me that none of the winning chains managed to fully meet the target frame sizes.
I tried reordering the links. I tried throwing redundant options on top to help stack the odds. I tried testing different combinations altogether.
Most of the changes had no effect, and those that did just shifted the mistakes around, often to even less perfect ends. Haha.
As high as my tolerance for fruitless trial and error is, there's no getting around the fact that while FLAC can reach each and every one of these targets individually, its internal chain-handling logic is simply too inaccurate to reach them all in one go.
Thankfully, there is another way.
As we (re)discovered in Take Two, each frame is processed independently of the others to maximize — or fuck up — the compression, but also to make it easier for decoders to skip around a stream willy-nilly.
Frames are, in effect, self-contained.10
We don't actually need FLAC to produce a single perfect copy; we can get there ourselves by manually stitching together the perfect parts from multiple copies.
And thus we become gods! Haha.
This final approach starts the same as the last one, but instead of testing combinations, we simply pick one of our choosing that, on paper, hits all the targets.11
Next, re-encode the source with each apodization in the Frankenchain, individually, collecting the full file output and frame `offset`s — by rerunning `flac --analyze` — for each.
From there, it's basically just an exercise in copy-and-paste.
Take the first `offset + bits/8` bytes from whichever FLAC output produced the smallest first frame,12 followed by the `offset..offset + bits/8` subslice from the FLAC output with the smallest second frame, and so on, until at last, the best last frame has been copied into place.
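Sketched in Python, with hypothetical shapes (in practice each blob, offset, and bits value comes from an individual encode and its `flac --analyze` run):

```python
def stitch(winners):
    """Splice per-frame winners into one Frankenstein FLAC.

    `winners` is an ordered list of (blob, offset, bits) tuples: the
    full encoded output to copy from, the winning frame's byte offset
    within it, and that frame's size in bits. The first entry is
    copied from byte zero so the leading metadata comes along too.
    """
    out = bytearray()
    for i, (blob, offset, bits) in enumerate(winners):
        start = 0 if i == 0 else offset
        out += blob[start:offset + bits // 8]  # FLAC frames are byte-aligned
    return bytes(out)

# Toy demonstration: 4 bytes of "metadata", then two 2-byte frames.
copy_a = b'MMMMA1A2'   # has the best first frame (offset 4, 16 bits)
copy_b = b'MMMMB1B2'   # has the best second frame (offset 6, 16 bits)
print(stitch([(copy_a, 4, 16), (copy_b, 6, 16)]))  # b'MMMMA1B2'
```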
That's it!
Save the Frankencopy to disk, then run the following command to check the quality of your welding:
```shell
flac -wtV /path/to/frankenfile.flac
```
It's kind of wild to think that in the course of just a few short months, we've managed to drop the number of tests required to maximally compress each file down from infinity to several tens of millions to ten thousand or so to, finally, just 6,260!13
What's more, the Frankenstein approach doesn't just give us winners, it gives us absolute perfection.
These savings — 21 bytes better than before — are as good as it gets, at least for this particular list of apodizations.
Here are the final standings, for reference.
| File | Before | After | Savings |
|---|---|---|---|
| `baa.wav` | 6,824 | 6,787 | 37 |
| `sneeze.wav` | 8,145 | 8,099 | 46 |
| `yawn.wav` | 19,194 | 19,084 | 110 |
| **Total** | 34,163 | 33,970 | 193 |
And with that, ladies and gentlemen, I think I'm finally ready to declare this problem well and truly solved. Haha.
1. You may not want to spend fifteen minutes shaving a couple megabytes off the storage size of a CD-quality album, but you could. ↩
2. The little sheep running around your screen right now. You're welcome! ↩
3. Standard deviation, `0.01..=0.5`. ↩
4. The number of subdivisions, `1..`, and optionally, the percentage of the window to taper, `0.0..=1.0`. If the latter is omitted, `0.5` is assumed. ↩
5. The actual upper limit works out to about eighty, but unique results all but dry up in the fifties. ↩
6. The reference encoder takes a few shortcuts of its own that can, rarely, cause it to make different decisions than it normally would. ↩
7. Subdividing tukey is noticeably slower than all other functions, so sort those last by `N`. A straight alphabetical ordering is sufficient for the others. ↩
8. Sometimes it takes a weird apodization to find some weird bytes in a weird frame. ↩
9. And yes, those are bits, not bytes! ↩
10. Technically, the master STREAMINFO metadata block has min and max frame size hints. They have no effect on the overall stream validity, but for the sake of completeness, you might want to patch them after cobbling together the Frankenchain to reflect the true values. If you do this before adding tags or other meta blocks to the result — and encoded the parts with `--no-padding` and `--no-seektable` — you'll always find them at bytes `12..15` and `15..18` (after the magic mime (4), block header (4), min block size (2), and max block size (2)). Bit ordering is Big Endian. ↩
11. Start with any one of the apodizations with the highest coverage. Compare its mask against all the others, picking whichever brings the largest number of additional ideal frames to the table. Rinse and repeat until, cumulatively, all ideal frames are covered. For small files like these, a handful of links should do the trick. ↩
12. By starting at zero rather than the offset, the leading metadata will be copied over too. ↩
13. All files required 6,255 tests to get through the single apodizations, then one more for each link in the Frankenchain. For both `baa.wav` and `sneeze.wav`, four links got the job done, but `yawn.wav` required seven. ↩