Randomizing Weighted Choices in Javascript

There are many circumstances in which you might wish to choose a random value from a fixed list of choices. In Javascript, the simplest way to accomplish this would be something like:

const randomIndex = Math.floor(Math.random() * arr.length);
const randomValue = arr[randomIndex];

The Math.random() function returns a random float between 0.0 and 1.0. The lower boundary is inclusive, the upper boundary is exclusive (e.g. it will never return 1.0). By multiplying this against the length of the array, then flooring the value to make sure we wind up with a proper integer, we have ourselves a random index.

This approach works perfectly well when all choices are weighted equally, but what if we want to treat some values as more likely than others?

The Setup

The first thing we'd need to do is decide how we want to format our data so that choices can have weight. There is no Absolute Best Practice here, but ideally the solution should strike a balance between runtime performance and code readability. Something like this:

const data = [
    ['Apples', 10],
    ['Bananas', 2],
    ['Carrots', 5],
    ['Dates', 1],
    ['Eggplant', 3],
    ['Figs', 1],
    ['Gourds', 1],
];

Here we have an array of arrays, the inner ones containing a value in at the top followed by a weight. This allows us to say that "Apples" should be 10x more common than, say, "Dates", without us having to type "Apples" ten separate times into a flat array. Splitting values in this manner helps keep code readable, and at scale, helps reduce code bloat.

We could just as easily have thrown together an array of objects, like { value: 'Apples', weight: 10 }, instead of an array of arrays. Objects are more expensive than arrays so it probably isn't worth going this route when you've only got a "value" and a "weight" to store, but it's an option.

At any rate, with a data structure now in hand, how would we go about selecting a random value?

Approach #1: Flatten the Array

The most obvious route might simply be to build a new flat array, then use the same code at the beginning of the article to return a random selection.

// Storage for our flat array.
let out = [];

// Loop through the master entries.
for (let i = 0; i < data.length; ++i) {
    // Push the value over and over again according to its
    // weight.
    for (let j = 0; j < data[i][1]; ++j) {
        out.push(data[i][0]);
    }
}

// And done!
return out[Math.floor(Math.random() * out.length)];

This approach is intuitive, but array allocation is expensive, and in this case, isn't actually needed at all.

Approach #2: Pre-Sum the Weights

If we think about the problem for a moment, it becomes apparent that the one piece of information we really need is the total. In our previous example, that's what out.length gave us. But we don't need an array just for its size; we can use a number for that!

// First, we loop the main dataset to count up the total weight.
// We're starting the counter at one because the upper boundary
// of Math.random() is exclusive.
let total = 0;
for (let i = 0; i < data.length; ++i) {
    total += data[i][1];
}

// Total in hand, we can now pick a random value akin to our
// random index from before.
const threshold = Math.random() * total;

// Now we just need to loop through the main data one more time
// until we discover which value would live within this
// particular threshold. We need to keep a running count of
// weights as we go, so let's just reuse the "total" variable
// since it was already declared.
total = 0;
for (let i = 0; i < data.length - 1; ++i) {
    // Add the weight to our running total.
    total += data[i][1];

    // If this value falls within the threshold, we're done!
    if (total >= threshold) {
        return data[i][0];
    }
}

// Wouldn't you know it, we needed the very last entry!
return data[data.length - 1][0];

This approach has a number of advantages over the original:

The total is stored as a number, requiring the same (insignificant) amount of memory regardless of how many choices the list contains.
We've reduced the number of loops to just two, again regardless of how many choices happen to be in the list.
Because we're bailing as soon as an answer is found, it is almost never necessary to run through the entire second loop.

Update 2020-12-30: thanks to reader @Brad for adjusting the example logic to account for zero weights.

Performance

Even with simple, small datasets like our example here, the second approach is hilariously more efficient.

	Josh Stoik 19 October 2019
Previous	Better Binary Batter: Mixing Base64 and Uint8Array
Next	Developing a Web Site Without Node/NPM in 2020