HTMinL v0.4.9
License: WTFPL
Released: 2021-03-25

HTMinL

HTMinL is a CLI tool for x86-64 Linux machines that simplifies the task of minifying HTML in-place for production environments.

This software is a work-in-progress.

Feel free to use it, but if something weird happens — or if you have ideas for improvement — please open an issue!

Features

HTMinL is a fast, in-place HTML minifier. It prioritizes safety and code sanity over ULTIMATE COMPRESSION, so it may not save quite as many bytes as Node's venerable html-minifier, but it is also much less likely to break shit.

And it runs orders of magnitude faster…

Unlike virtually every other minifier in the wild, HTMinL is not a stream processor; it constructs a complete DOM tree from the full source before getting down to business. This allows for much more accurate processing and robust error recovery.

See the minification section for more details about the process, as well as the cautions section for important assumptions, requirements, gotchas, etc.

Installation

This application is written in Rust and can be installed using Cargo.

For stable Rust (>= 1.51.0), run:

RUSTFLAGS="-C link-arg=-s" cargo install \
    --git https://github.com/Blobfolio/htminl.git \
    --bin htminl \
    --target x86_64-unknown-linux-gnu

Pre-built .deb packages are also provided with each release. They should always work on the latest stable Debian and Ubuntu.

Usage

It's easy. Just run htminl [FLAGS] [OPTIONS] <PATH(S)>….

The following flags and options are available:

-h, --help           Prints help information
-l, --list <list>    Read file paths from this list.
-p, --progress       Show progress bar while minifying.
-V, --version        Prints version information

Paths can be specified as trailing command arguments, and/or loaded from a text file (one path per line) with the -l option. Directories are scanned recursively for .htm/.html files.
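
To make the path handling concrete, here is a minimal Rust sketch of the behavior described above. It is not HTMinL's actual code: the helper names (is_html, paths_from_list, collect_html) are hypothetical, and matching extensions case-insensitively, skipping blank list lines, and filtering directly passed files by extension are all assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if the path ends in `.htm` or `.html`.
/// Case-insensitive matching is an assumption of this sketch.
fn is_html(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("htm") || ext.eq_ignore_ascii_case("html"))
        .unwrap_or(false)
}

/// Parses the contents of a list file: one path per line.
/// Trimming lines and skipping blank ones is an assumption of this sketch.
fn paths_from_list(contents: &str) -> Vec<PathBuf> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Recursively collects HTML files under `root` into `out`.
/// Note: `is_dir` follows symlinks, and symlink cycles are not handled here.
fn collect_html(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    if root.is_dir() {
        for entry in fs::read_dir(root)? {
            collect_html(&entry?.path(), out)?;
        }
    } else if is_html(root) {
        out.push(root.to_path_buf());
    }
    Ok(())
}
```

In this sketch, a caller would run collect_html once for each trailing argument and each entry returned by paths_from_list, then minify every collected file.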

Some quick examples:

# Minify one file.
htminl /path/to/index.html

# Tackle a whole folder at once with a nice progress bar:
htminl -p /path/to/html

# Or pass several paths at once:
htminl /path/to/html /path/to/index.html …

Minification

Minification is primarily achieved through (conservative) whitespace manipulation — trimming, collapsing, or both — in text nodes, tags, and attribute values, but only when it is judged completely safe to do so.

For example, whitespace is not altered in "value" attributes or inside elements like <pre> or <textarea>, where it generally matters.

Speaking of "generally matters", HTMinL does not make any assumptions about the display type of elements, as CSS is a Thing. Just because a <div> is normally block doesn't mean someone hasn't styled one to render inline. This often leaves some whitespace around tags, but helps ensure styled layouts display correctly.
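
The conservative approach described above can be sketched in Rust as follows. This is an illustration only, not HTMinL's implementation: the names PRESERVE, collapse_whitespace, and minify_text are hypothetical, the preserve list contains only the two elements named above, and checking just the immediate parent (rather than all ancestors) is a simplification.

```rust
/// Elements whose text is left untouched in this sketch.
/// Only <pre> and <textarea> are named in the text; the real list may be longer.
const PRESERVE: [&str; 2] = ["pre", "textarea"];

/// Collapses each run of ASCII whitespace (space, tab, LF, FF, CR) into a
/// single space. Leading and trailing runs are collapsed rather than trimmed,
/// because a neighboring element might be styled to render inline.
fn collapse_whitespace(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_whitespace = false;
    for c in text.chars() {
        if c.is_ascii_whitespace() {
            // Emit one space for the first character of the run only.
            if !in_whitespace {
                out.push(' ');
            }
            in_whitespace = true;
        } else {
            out.push(c);
            in_whitespace = false;
        }
    }
    out
}

/// Minifies a text node given the tag name of its parent element.
/// Text inside whitespace-sensitive elements is returned unchanged.
fn minify_text(parent_tag: &str, text: &str) -> String {
    if PRESERVE.iter().any(|tag| tag.eq_ignore_ascii_case(parent_tag)) {
        text.to_string()
    } else {
        collapse_whitespace(text)
    }
}
```

Under these assumptions, text inside a <p> has each whitespace run reduced to one space, while the same text inside a <pre> comes back byte-for-byte identical.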

Additional savings are achieved by stripping:

The above list is non-exhaustive, but hopefully you get the idea!

With the exception of CSS, which has its whitespace fully minified, inline foreign content like JavaScript and JSON is passed through unchanged. This is one of the biggest "missed opportunities" for byte savings, but also where minifiers tend to accidentally break things. Better a few extra bytes than a broken page!

Cautions

While care has been taken to balance savings and safety, a few design choices could potentially break documents and are worth noting before you use HTMinL on your project:

Benchmarks

These benchmarks were performed on an Intel® Core™ i7-10610U with four discrete cores, averaging 100 runs. To best approximate feature parity, html-minifier was run with the following flags:

--collapse-boolean-attributes
--collapse-whitespace
--decode-entities
--remove-attribute-quotes
--remove-comments
--remove-empty-attributes
--remove-optional-tags
--remove-redundant-attributes
--remove-script-type-attributes
--remove-style-link-type-attributes

In terms of size reduction, html-minifier is slightly better, shaving off an extra 1-4%.

But in terms of execution time, HTMinL is hundreds of times faster.

It is important to note that html-minifier is not designed for this particular use case — recursive in-place HTML minification with random non-HTML assets sprinkled about — which goes a long way toward explaining the gross difference in runtime cost.

The per-file entry attempts to correct for this by averaging the times of each individual file, but even there, HTMinL is about forty times faster.

Not too shabby!