Making Up Numbers: Constrained Numerical Types in Rust

I recently discovered an obscure bug in my lossless JPEG/PNG image compression tool flaca.

To cut a long rant short, a compiler compatibility conflict had emerged in the linked zopfli compression library and, since Google had effectively abandoned that project — because of course it did — I had no choice but to rewrite and replace the whole damn thing.

Porting that shitshow from C to Rust was a long and terrible and interesting journey, but not so different from the one Carol Nichols covered extensively several years ago.

But after all that, while addressing some of my port's inevitable performance regressions, I stumbled onto a neat trick that not only fixed a whole slew of issues all at once, but also brought some much-needed contextual clarity and continuity to the complex and confusing codebase.

I'll give you a hint: it fits in a breadbox.

The Difference Indifference Makes.

It isn't obvious at a glance, but most of the numbers zopfli passes around from method to method are logically constrained to fit predictable, fixed ranges.

A length literal ("litlen"), for example, will always have a value between 0..=258. A length symbol will always be zero or 257..=285. A distance symbol will always be between 0..=29. The ZopfliHash hashes and indices — and most everything else — are effectively 15-bit unsigned integers.

And so on.

For developers, logical constraints like these make it possible to have methods like the following, which uses a same-sized lookup table to cheaply fetch the "extra bits value" for a given litlen:

/* Gets value of the extra bits for the given length, cfr. the DEFLATE spec. */
static int ZopfliGetLengthExtraBitsValue(int l) {
  static const int table[259] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 0,
    1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5,
    6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6,
    7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2,
    3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
    18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6,
    7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
    27, 28, 29, 30, 31, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 0
  };
  return table[l];
}

In C, lookups like this are virtually free because the language is utterly unswayed by leftist propaganda like type consistency or memory safety.

It doesn't know that litlens are logically constrained, but it doesn't give a shit either.

To see what's involved — or rather what's not involved — one need look no further than the resulting assembly:

ZopfliGetLengthExtraBitsValue:
        movsx   rdi, edi
        mov     eax, DWORD PTR table.0[0+rdi*4]
        ret

In Rust, such lookups are still plenty fast, but since shits are actually given, they're not quite free.

It won't just read arbitrary regions of memory willy-nilly; it double-checks that the index actually falls within the bounds of the table first.

Unless the compiler can prove a particular access is infallible during its optimization passes, an extra check will be embedded directly into the lookup code and executed at runtime whenever the method is invoked:

ZopfliGetLengthExtraBitsValue:
        cmp     rdi, 258
        ja      .LBB0_2
        lea     rax, [rip + .L__unnamed_1]
        movzx   eax, byte ptr [rdi + rax]
        ret
.LBB0_2:
        push    rax
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 259
        call    qword ptr [rip + core::panicking::panic_bounds_check::h9bb22f08a42e1ac8@GOTPCREL]
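
For reference, the Rust behind a guard like that can be as innocent as a straight port of the C lookup. Here's a rough sketch, with the 259 table values elided for brevity:

const LENGTH_EXTRA_BITS_VALUE: [u8; 259] = [0; 259]; // Real values elided.

pub fn zopfli_get_length_extra_bits_value(litlen: usize) -> u8 {
	// Nothing tells the compiler that `litlen` stays within 0..=258, so the
	// index gets a cmp/ja guard (and a panic branch) like the one above.
	LENGTH_EXTRA_BITS_VALUE[litlen]
}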

In most cases, that (tiny) indirection has no observable impact on a program's overall performance, but since zopfli is essentially one giant hot loop, nothing it does is done in moderation.

At the scale of millions and millions of shits given, the impact can be, well, kinda shit.1

But that's just one obvious example. The difference between total indifference and basic consideration can be felt in other, subtler ways too.

Why People Hate Math.

Constrained numerical ranges can often benefit from mathematical shortcuts that would otherwise produce nonsensical values for more open-ended inputs.

My high-performance, ultra-niche datetime library utc2k, for example, extracts hours from seconds using this bit of obscure witchery, which only works because these seconds happen to fit within a single day:

let hours = ((seconds * 0x91A3) >> 27) as u8;
seconds -= hours as u32 * HOUR_IN_SECONDS;
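
If you don't trust the witchery, it's cheap to verify. A quick exhaustive sanity check, assuming HOUR_IN_SECONDS is the usual 3,600 and the input never exceeds a day's worth of seconds:

const HOUR_IN_SECONDS: u32 = 3600;

fn main() {
	// Every second of the day: multiply-and-shift matches plain division.
	for seconds in 0..86_400_u32 {
		let hours = ((seconds * 0x91A3) >> 27) as u8;
		assert_eq!(hours as u32, seconds / HOUR_IN_SECONDS);
	}
}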

Zopfli uses similar tricks when working with distance literals, since 32,768 values is a bit much to jam into a static lookup table.2

The following method, for example, converts a literal into its equivalent symbol:

/* Gets the symbol for the given dist, cfr. the DEFLATE spec. */
static int ZopfliGetDistSymbol(int dist) {
  if (dist < 5) {
    return dist - 1;
  } else {
    int l = (31 ^ __builtin_clz(dist - 1)); /* log2(dist - 1) */
    int r = ((dist - 1) >> (l - 1)) & 1;
    return l * 2 + r;
  }
}

Viewed in isolation, there is nothing wrong with this method.3 Whether written in C or Rust, it'll boil down to comparable assembly, and execute with comparable performance.

But given that distance symbols can only ever be in the range of 0..=29, its return values will only be logically valid if the distances passed to it are also logically valid.

C, of course, doesn't give a shit one way or another, but Rust is more skeptical. Any subsequent use of a symbol returned by this method will be subject to the sorts of costs discussed in the previous section.
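
To make that cost concrete, imagine a downstream lookup keyed off the symbol. (This is just a sketch; the table name and contents are placeholders, not zopfli's.)

pub fn dist_symbol_extra_bits(sym: u8) -> u8 {
	const EXTRA_BITS: [u8; 30] = [0; 30]; // Placeholder values.
	// `sym` is *supposed* to be 0..=29, but the compiler can't know that, so
	// this index keeps its runtime bounds check and panic branch.
	EXTRA_BITS[sym as usize]
}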

It would be easy enough to refactor the Rust version of the method to enforce correct return values. Something like the following would work:

#[inline] // Request inlining.
pub const fn zopfli_get_dist_symbol(dist: u16) -> u8 {
	if dist < 5 { dist.saturating_sub(1) as u8 } // Floor to zero.
	else if 24_577 <= dist { 29 }                // Ceil to 29.
	else {
		let l = (dist - 1).ilog2();
		let r = ((dist as u32 - 1) >> (l - 1)) & 1;
		(l * 2 + r) as u8
	}
}

But unfortunately such effort would be counter-productive.

The added complexity and branching would outweigh the savings from any potential — and unlikely — elisions.

If only the compiler could be made to understand the program's intentions…

Modern Primitives.

Lest you get the wrong idea, compilers are really, really smart.

They might have trouble following a variable's value as it gets passed around from method to method in a convoluted shell game, but they'll never be confused about the basic properties of its type.

The compiler can happily eliminate any and all bounds checking for the following contrived example, because in no universe will a u8 fall outside the index range of this lookup table:

fn lookup(idx: u8) -> usize {
	const FOO: [usize; 1024] = [0; 1024];
	FOO[idx as usize]
}

Can you spot where this is going?

To elevate the logical constraints of our custom "types" to explicit constraints innately understood by the compiler, we simply need to remove the quotes and actually define custom types!

Let's start with the example from the previous section and define an enum to hold our distance symbols:

#[repr(u8)]
#[derive(Debug, Clone, Copy)]
/// # Distance Symbol.
pub enum Dsym {
	D00 = 0_u8,
	D01 = 1_u8,
	D02 = 2_u8,
	D03 = 3_u8,
	…
	D28 = 28_u8,
	D29 = 29_u8,
}

The compiler, gods bless it, will now know that each and every Dsym it encounters, no matter its origin or destination, will have a value somewhere between 0..=29.

All we need to do is adjust our code to use Dsym instead of generic primitives wherever distance symbols are involved:

pub const fn zopfli_get_dist_symbol(dist: u16) -> Dsym {
	if dist < 5 {
		// Safety: 0, 1, 2, and 3 are all valid Dsym variants.
		unsafe { std::mem::transmute(dist.saturating_sub(1) as u8) }
	}
	else if 24_577 <= dist { Dsym::D29 }
	else {
		let l = (dist - 1).ilog2();
		let r = ((dist as u32 - 1) >> (l - 1)) & 1;
		// Safety: l * 2 + r maxes out at 29 for distances below 24_577.
		unsafe { std::mem::transmute((l * 2 + r) as u8) }
	}
}

And before you say anything, yes, this has the same added complexity I complained about in the previous section, but the balance is different now, because it returns a fully-qualified Dsym.

All future usage of its return values, no matter how crazy, will be exempt from any and all Rust-inserted checks.
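
To revisit the earlier sketch: swap the u8 parameter for a Dsym and a thirty-slot table can be indexed without any guard at all, because no Dsym can ever land outside it. (The values below are DEFLATE's per-symbol extra-bit counts; the table and method names are mine, not zopfli's.)

const DIST_EXTRA_BITS: [u8; 30] = [
	0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
	7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13,
];

pub const fn dist_symbol_extra_bits(sym: Dsym) -> u8 {
	// A Dsym is 0..=29 by construction, so this index can never miss;
	// the bounds check (and its panic path) disappears.
	DIST_EXTRA_BITS[sym as usize]
}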

What's more, because the compiler now understands exactly what a Dsym is and isn't, it can apply additional optimizations on a case-by-case basis that it wouldn't dare do for the more ambiguously-defined C values.

Once litlens, length symbols, and all the other special numbers have been given the same treatment, Rust won't just perform on par with C, it'll perform better!4

And as if that weren't enough, the switch to descriptive, well-defined types will make the code more intelligible to human eyes too! It'll be apparent at a glance that a particular value is a LitLen, say, rather than just some random number.

Neat trick, huh?

Better Living Through Build Scripts.

In practice, you'll probably want to automate the construction of definitions like these, if for no other reason than it'll save you a lot of tedium and typos.

Rust build scripts are ideal for this sort of thing.

You could use a dedicated crate like codegen, or simply slap together and save a simple string.

For my zopfli port, I opted for the latter:

use std::fmt::Write as _; // For write!() against the string buffer.
use std::fs::File;
use std::io::Write as _;  // For write_all() and flush().

// Build Distance Symbol Enum.
let mut dsym: String = r"#[repr(u8)]
#[derive(Debug, Clone, Copy)]
/// # Distance Symbols.
pub enum Dsym {".to_owned();

// Add each variant.
for i in 0..30 {
	write!(&mut dsym, "\n\tD{i:02} = {i}_u8,").unwrap();
}

// Close it off.
dsym.push_str("\n}\n");

// Save it.
let out_dir = std::path::PathBuf::from(
	std::env::var("OUT_DIR").expect("Missing OUT_DIR.")
);
File::create(out_dir.join("dsym.rs"))
	.and_then(|mut f| f.write_all(dsym.as_bytes()).and_then(|_| f.flush()))
	.expect("Failed to write dsym.rs");

Finally, to use that "code" as code in the project, simply include it in lib.rs or any other source file:

include!(concat!(env!("OUT_DIR"), "/dsym.rs"));

That's all there is to it!

---

1. It is difficult to place an exact number on this because the inclusion or elision of bounds checks can have trickle-down effects on other compiler optimizations like inlining decisions, but at this point in time, my usual ninety-second benchmark was taking closer to one hundred ten seconds.

2. Or it would be if not for the insane number of distance-to-symbol conversions a typical zopfli run executes. For my own port, a massive pre-computed lookup table proved significantly faster than on-the-fly computation.

3. Other than the use of signed types.

4. My ninety-second benchmark dropped to eighty seconds after this retyping was applied!

Josh Stoik
13 May 2024