I had an interesting discussion this afternoon which highlighted how important it is to understand conditional probability and how code structure can produce unexpected results.
The code contained three lists and a random number generator. The random number generator was meant to randomly chose which list to take something from with equal probability (that is 1/3 chance of selecting from each of the lists). However, somehow one of their lists was being selected far more than the others.
The code looked similar to the following:
if (random.nextInt(3) == 0) {
// select from list 1
} else if (random.nextInt(3) == 1) {
// select from list 2
} else {
// select from list 3
}
Can you spot the bug? Why would this not generate equal probabilities of selecting from each list?
I wrote some of my own code to test this code structure almost 100 million times:int count0 = 0;
int count1 = 0;
int count2 = 0;
// loop almost 100 million times
for (int i = 0; i < 99999999; i++) {
if (r.nextInt(3) == 0) {
count0++;
} else if (r.nextInt(3) == 1) {
count1++;
} else {
count2++;
}
}
Which generated the results:Count0 = 33333271
Count1 = 22221788
Count2 = 44444940
Which demonstrates this code doesn’t generate equal probabilities.
The problem is the second random number generation:
if (r.nextInt(3) == 0) {
count0++; // this has 1/3 probability (0.333333333333333)
} else if (r.nextInt(3) == 1) { // this else will run 2/3 of the time
// and resolve true 1/3 of the times it runs
count1++; // 1/3 * 2/3 this line will run... i.e. 2/9 probability (0.22222222222)
} else {
count2++; // 2/3 * 2/3 this line will run... i.e. 4/9 (0.4444444444444)
}
Restructuring the code to only generate the random number once and storing this in a temporary variable makes a huge difference:
int rInt = r.nextInt(3); // only generate a random number once.
if (rInt == 0) {
count0++; // this will run 1/3 of the time.
} else if (rInt == 1) {
count1++; // this will run 1/3 of the time.
} else {
count2++; // this will run 1/3 of the time.
}
And the results of this are:Count0 = 33333907
Count1 = 33332586
Count2 = 33333506
Which looks much better.
(Note: alternatively you could make the second random number generation be out of 2 rather than 3 but this would be slower as the random number generator will take a few CPU cycles to calculate.)