NOTE 15M: How accurately does the infinite sites model describe this sample? The basic assumption of the infinite sites model is that every mutation occurs at a new site. One consequence is that each SNP segregates for only two alleles: if three or four alleles segregate, then there must have been multiple mutations at that site. In this case, 283 segregating sites are expected out of 10,000 bp. Therefore, the chance that any one mutation occurs at a site that is already segregating is at most 283/10,000 = 0.0283. However, the chance that all 283 are at different sites is

$$\prod_{i=1}^{282}\left(1-\frac{i}{10{,}000}\right) \approx \exp\left(-\frac{283 \times 282}{2 \times 10{,}000}\right) \approx 0.018.$$

So, although the approximation is very accurate for any particular site, there are likely to be a few sites with multiple mutations. This calculation is similar to finding the chance that 20 people all have different birthdays; there is about a 40% chance that at least two of them will in fact share a birthday.
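
Both figures can be checked numerically. The following is a minimal sketch, assuming that mutations fall uniformly and independently across the 10,000 sites (and that birthdays are uniform over 365 days); the function name `prob_all_distinct` is introduced here for illustration only.

```python
import math

def prob_all_distinct(n_items, n_slots):
    """Chance that n_items, each placed uniformly at random in one of
    n_slots, all land in different slots (assumes independence)."""
    # Sum logs of (1 - i/n_slots) for i = 0 .. n_items-1 to avoid underflow.
    log_p = sum(math.log1p(-i / n_slots) for i in range(n_items))
    return math.exp(log_p)

# 283 mutations among 10,000 sites: chance that no site is hit twice.
p_sites = prob_all_distinct(283, 10_000)

# 20 people, 365 days: chance that at least two share a birthday.
p_birthday_shared = 1 - prob_all_distinct(20, 365)

print(f"all 283 mutations at different sites: {p_sites:.3f}")
print(f"some shared birthday among 20 people: {p_birthday_shared:.2f}")
```

Under these assumptions the first probability comes out at roughly 0.02 and the second at roughly 0.4, in line with the argument above.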