Code Structure, Random Number Generation and Conditional Probability

I had an interesting discussion this afternoon which highlighted how important it is to understand conditional probability and how code structure can produce unexpected results.

The code contained three lists and a random number generator. The random number generator was meant to randomly chose which list to take something from with equal probability (that is 1/3 chance of selecting from each of the lists). However, somehow one of their lists was being selected far more than the others.

The code looked similar to the following:

if (random.nextInt(3) == 0) {
  // select from list 1
} else if (random.nextInt(3) == 1) {
  // select from list 2
} else {
  // select from list 3
}

Can you spot the bug? Why would this not generate equal probabilities of selecting from each list?

I wrote some of my own code to test this code structure almost 100 million times:
int count0 = 0;
int count1 = 0;
int count2 = 0;

// loop almost 100 million times
for (int i = 0; i < 99999999; i++) {
  if (r.nextInt(3) == 0) {
    count0++;
  } else if (r.nextInt(3) == 1) {
    count1++;
  } else {
    count2++;
  }
}

Which generated the results:
Count0 = 33333271
Count1 = 22221788
Count2 = 44444940

Which demonstrates this code doesn’t generate equal probabilities.

The problem is the second random number generation:

if (r.nextInt(3) == 0) {
  count0++; // this has 1/3 probability (0.333333333333333)
} else if (r.nextInt(3) == 1) { // this else will run 2/3 of the time
                                // and resolve true 1/3 of the times it runs

  count1++; // 1/3 * 2/3 this line will run... i.e. 2/9 probability (0.22222222222)
} else {
  count2++; // 2/3 * 2/3 this line will run... i.e. 4/9 (0.4444444444444)
}

Restructuring the code to only generate the random number once and storing this in a temporary variable makes a huge difference:

int rInt = r.nextInt(3); // only generate a random number once.
if (rInt == 0) {
  count0++; // this will run 1/3 of the time.
} else if (rInt == 1) {
  count1++; // this will run 1/3 of the time.
} else {
  count2++; // this will run 1/3 of the time.
}

And the results of this are:
Count0 = 33333907
Count1 = 33332586
Count2 = 33333506

Which looks much better.

(Note: alternatively you could make the second random number generation be out of 2 rather than 3 but this would be slower as the random number generator will take a few CPU cycles to calculate.)