The Art of Database Partitioning

So you have a database, and within that database, you have beautifully optimized tables. Everything is nice and efficient, your queries are fast, and life is grand. Things continue on this way for a good long while, until one day, you cross some magic threshold where suddenly the volume of data is starting to make things noticeably sluggish.

Welcome to Scale, bitch.

Scale is that fun reality in which a spider may jump consequence-free from a second-story window, while an elephant may not. An application’s Scale Animal is basically the product of its inefficiencies (a constant) and the number of operations being performed.

Even though your database is the Platonic Ideal, there is one inefficiency you can’t escape: spacetime. As Einstein discovered, there is a fixed upper limit to how fast information can travel. As the amount of data stored in a table grows, it will begin to take longer to read all that crap into memory and filter and sort a result set. You can’t argue with physics, but that doesn’t mean you can’t restructure things to moot the problem.

Enter Partitioning

Database partitioning is what it sounds like. Instead of storing all of a table’s data in one big-ass file, that data is spread across many smaller files according to some sort of key-based rule. When a table has been properly partitioned, a query need not worry about the entire dataset; it can focus its efforts entirely on the contents of a single partition. That means less data to read into memory, less data to parse, and faster query results.

Of course, partitions don’t come free. They bring extra complexity that MySQL now has to deal with, and if the data a query needs is spread across multiple partitions, it will take longer for it to crunch the results. In general, it isn’t worth partitioning a table unless it has millions of rows.

About Keys

Partitions can be divided up any which way, but the condition must be part of a unique key (usually the primary), and should make things as easy as possible for the most common types of queries being run against a given table. For something like a large user table, this probably means breaking data up into ID ranges, while for something like an activity log, it probably makes more sense to divide things up by date instead.

This article will demonstrate the latter use case as it is both common and not something MySQL is very good at dealing with. Haha.

Let’s give all this theory an actual table structure:

# Store CPU use data.
CREATE TABLE `stats_cpu` (
	`server_id` smallint(6) NOT NULL,
	`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
	`idle` int(10) unsigned NOT NULL DEFAULT '0',
	`used` int(10) unsigned NOT NULL DEFAULT '0',
	`window` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'seconds',
  PRIMARY KEY (`server_id`,`date_created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

This is one of a handful of tables from a centralized server health-monitoring system that tracks various metrics (in this case CPU use). Each server reports its CPU use details once per minute. That means this table (and others like it) grows by 1440 records per day, per server. With hundreds of servers reporting, the numbers get a little crazy.

Scale.

This volume does not really affect write performance, but reads can be significantly impacted, particularly since many queries pull stats for a relative window like “the past 5 minutes”, which changes every minute (i.e. cache is no help).

Date Partitions

The table contains a single, composite primary key: server_id and date_created. Either could be used for partitioning, but because data relevance tends to drop off with age, a date-based approach makes the most sense. From a performance standpoint, it is also beneficial to partition on a field that appears toward the end of its composite index, as the early parts are already well-optimized by MySQL.

Range-based partitions require a numeric value, so timestamp fields will not work as-are, however we can leverage MySQL functions like UNIX_TIMESTAMP() to make it all work. Functions are a potential performance bottleneck, so it is important to benchmark potential solutions before implementing them on the real tables. Certain functions, like YEAR() and TO_DAYS() are specifically optimized for partitions, so if you can use those, do.

To get started, let’s alter the table to give it two generic partitions. These won’t be immediately useful, but as we’ll see shortly, partitions can be reorganized and redivided as needed.

# Alter the table.
ALTER TABLE `stats_cpu`
PARTITION BY RANGE (UNIX_TIMESTAMP(`date_created`))
(
	PARTITION p0 VALUES LESS THAN (0),
	PARTITION pfuture VALUES LESS THAN MAXVALUE
);

The partition definitions are ordered top/down like an if/elseif/else chain. The first partition, p0, is sort of a catch-all for bad data; UNIX_TIMESTAMP(`date_created`) should never be less than 0, but its inclusion here gives us an intuitive lower boundary. MAXVALUE is a reserved keyword that stands in for the maximum value, whatever that happens to be at any moment.

With just these two partitions, nothing much has changed to the table’s storage. p0 will be empty; all data will wind up in pfuture.

Cut It Up

So we’ve decided to partition the table by date ranges, and we’ve laid the foundation. But now we need to think carefully about just what kind of ranging would work best. There is no generic answer; it all depends on the nature of the application and its queries.

If, for example, the application runs the vast majority of its queries against data from the last 24 hours, then the solution might simply be to declare a partition for today’s data (ptoday), and a partition for everything else (pold). The pold chunk will be monstrously large, but if queries against it are rare, who cares? The much-used ptoday chunk, on the other hand, will be as small as possible.

But if the application queries data from all different periods, as this one happens to do, a more step-based partitioning scheme will work better. The ideal step size depends entirely on the specifics of the application, but in general you’ll want to limit the number of partitions being created, while also ensuring there are enough partitions to chop the data into workable chunks. For this particular application, dividing data by month worked well, so let’s do that.

Right now, it is August of 2017, so let’s start by creating a partition to hold values for August:

# Reorganize pfuture.
ALTER TABLE `stats_cpu`
REORGANIZE PARTITION pfuture INTO (
	PARTITION p201708 VALUES LESS THAN (UNIX_TIMESTAMP('2017-09-01 00:00:00')),
	PARTITION pfuture VALUES LESS THAN (MAXVALUE)
);

The partition name, p201708 is arbitrary, but your life will be easier if the partition names follow some sort of standard. Since we’re dividing this table up by month, that means names like p201709, p201710, etc.

It is also worth pointing out that the conditional date is for September, not August, since the condition is “less than”.

Automation

This type of partitioning requires periodic updates, because each partition is explicitly defined. You could pre-make monthly partitions going well into the future and forget about it (until the future comes), but you could also schedule a script to do the dirty work for you as needed. The following is an example in PHP that can run daily.

<?php

// When and what.
$tomorrow = date('Y-m-d', strtotime('+1 day'));
$partition = 'p' . date('Ym', strtotime($tomorrow));
$max = date('Y-m-01', strtotime('+1 month', strtotime(substr($tomorrow, 0, -2) . '01')));

// Check whether a partition already exists by asking
// INFORMATION_SCHEMA.
if (false === db::get_var("
	SELECT `PARTITION_NAME`
	FROM INFORMATION_SCHEMA.PARTITIONS
	WHERE
		`TABLE_NAME`='stats_cpu' AND
		`TABLE_SCHEMA`='your_database_name' AND
		`PARTITION_NAME`='$partition'
")) {
	// Create the partition.
	db::query("
		ALTER TABLE `stats_cpu`
		REORGANIZE PARTITION pfuture INTO (
			PARTITION $partition VALUES LESS THAN (UNIX_TIMESTAMP('$max')),
			PARTITION pfuture VALUES LESS THAN (MAXVALUE)
		)
	");
}

Alternative Approach

Ranges are just one kind of partitioning. If you’d rather not manually curate the partitioning table, you could instead give MySQL a formula it can hash, tell it how many partitions to use, and it will handle the rest.

For example, the following will split up data (roughly) by day, splitting storage across 20 partitions:

# 86400 = 60 * 60 * 24, i.e. seconds per day.
ALTER TABLE `cpu_stats`
PARTITION BY HASH(UNIX_TIMESTAMP(`date_created`) DIV 86400)
PARTITIONS 20;

Note, the DIV operator is supported but / is not as results must be an integer.

Partition Management

While outside the scope of this article, it is also worth mentioning that many of the operations one would normally perform against a whole table can also be performed against a given partition. This includes things like analyze, optimize, truncate, and drop. The latter is particularly beneficial in systems that need to periodically purge old records, as the performance of a DROP statement versus a DELETE ... WHERE is night and day.

To see examples or more information about partition operations, hop on over to the associated reference page.

Closing Thoughts

When partitions help, they really help. Like everything else related to database design, it’ll take plenty of benchmarking to find the magic balance, but it’s worth it to help maintain that new table smell.