So, you think you want a random number, huh?
Welcome to the rabbit hole, my friend.
T H E P R O B L E M W I T H M O D U L O
Using remainder
%
directly with
rand()
gets you biased values. The reason has to do with the Pigeonhole Principle — named because we can visualize pigeons settling into a box of cubby holes.
Suppose RAND_MAX were 19 and (max-min+1) were 6. If I just used remainder, I would get:
0..19 → 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 (pigeons)
0..19 % 6 → 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 (cubby holes)
Notice how you are more likely (4:20 probability) to get a 0 or 1 than you are (3:20 probability) to get a 2, 3, 4, or 5. This is called
bias, and it is basically a guaranteed consequence of using remainder directly on the result of
rand()
(
since RAND_MAX is always going to be a value of the form 2ⁿ-1).
This looks like it shouldn’t matter much, but it actually has big consequences, even in games and systems where the user does not directly see the result of rolling his dice. I’m more likely to get asparagus than broccoli, and I can’t stand asparagus. Monster A is more likely to appear than Monster D, and Monster A is fiddly to defeat. The computer is more likely to roll a rock or paper than scissors, so playing paper every time increases your odds of winning. You get the idea.
The solution to this problem should be fairly obvious:
we need to make sure that each cubby hole fits the same number of pigeons.
O B S E R V A T I O N O N E
The first observation we can make is that we can simply
ignore values we don’t want:
1 2 3 4 5
|
int value;
do value = rand();
while ((value < min) or (value > max));
std::cout << value << "\n";
| |
This works because
rand()
is still just as likely to give us any random value as any other, but we are not packing uneven numbers of pigeons into their holes. We only pay attention to one set of pigeons and their holes, and enforce the idea of
one pigeon per hole.
Simply ignoring pigeons we don’t want does not change the likelihood of any pigeon coming along. It does change the likelihood of a pigeon we want coming along, so we may have to wait, but when a pigeon we want does arrive it is just as likely as any other of our desired pigeons to have arrived.
Hence, there is no bias.
You may have noticed the drawback, though, and it can be significant. If we only want, say, 6 pigeons out 32,000, we can be stuck waiting a while.
Perhaps you can see where we are going with this:
O B S E R V A T I O N T W O
We can put as many pigeons in each hole as we want
as long as each hole gets the same number of pigeons.
Again suppose that RAND_MAX is 19, and (max-min+1) is 6. That gives us
three whole divisions of [0..RAND_MAX] that we can map to a number in the domain [min..max]:
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
└───────────────┘ └───────────────┘ └───────────────┘
Do you see why?
floor(20 / 6) → 3
Notice how there are two values that do not map into a full [min..max] domain. These are the two pigeons we will ignore.
The trick, then, is first to map our [min..max] to [0..N]:
|
int N = max - min; // [0..N] ←→ [min..max]
| |
To convert back, all we need to do is add
min
.
|
int value = value_0_to_N + min;
| |
And second, we must divvy up [0..RAND_MAX] with the correct number of subranges to determine which numbers in [0..RAND_MAX] to ignore:
|
int our_rand_max = (RAND_MAX + 1) / (N+1) * (N+1); // warning!
| |
The above code has a potential problem, but it is unlikely to ever occur in any C-library implementation. Even so, let us be clean about it and avoid the potential overflow error (if RAND_MAX were the maximum
int
value):
1 2 3
|
int our_rand_max = (N == RAND_MAX)
? RAND_MAX
: RAND_MAX - (RAND_MAX % (N+1)) + 1;
| |
Yeah, it gets messy. And, it only works if N ≤ RAND_MAX. If (N > RAND_MAX) we've got new problems, but for now we will simply pretend they can’t ever happen.
Unless someone wants an answer here, but it involves modifying how we get a random value to begin with.
1 2 3 4 5 6 7 8 9 10 11
|
int random_int_in_range( int min, int max )
{
int N = max - min; // [0..N] ←→ [min..max]
int our_rand_max = (N == RAND_MAX)
? RAND_MAX
: RAND_MAX - RAND_MAX % (N+1) + 1;
int value;
do value = rand();
while (value > our_rand_max);
return value % (N+1) + min;
}
| |
Okay, that’s messy enough, but it is how to do it in C.
C + + M A K E S T H I S E A S Y
In fact, C++ makes this so easy it hurts to look at.
1 2 3 4 5 6 7 8
|
#include <random>
std::minstd_rand rng( std::random_device{}() ); // our RNG, properly seeded
int random_int_in_range( int min, int max )
{
return std::uniform_int_distribution <> ( min, max )( rng );
}
| |
That’s it!
C++ even makes it easy to get
floating-point numbers:
1 2 3 4
|
int random_float_in_range( double min, double max )
{
return std::uniform_real_distribution <> ( min, max )( rng );
}
| |
You don’t even have to use
minstd_rand
. You can use any random number generator you wish. Heck, you could just use the
random_device
as your RNG.
1 2 3 4 5 6 7 8 9
|
int random_int_in_range( int min, int max )
{
return std::uniform_int_distribution <> ( min, max )( std::random_device{}() );
}
int random_float_in_range( double min, double max )
{
return std::uniform_real_distribution <> ( min, max )( std::random_device{}() );
}
| |
There is a caveat here:
random_device
is an evil, heartless bastard child that can throw for all kinds of stupid reasons. Fortunately, if you’ve got yourself a modern compiler (no more than 4 years old), then it should behave. If not then you are better off seeding a different RNG using the clock in <chrono>. Let me know if you need to know more.
Again, if anyone wants a C-version, post and we can cover it.
Hope this helps.