NOTE 13A: In fact, the common ancestor may have lived some time before the two populations were physically separated, so the true rate will be lower than this estimate. The estimate is uncertain for two reasons: we do not know exactly when the common ancestor lived, and the number of mutations that have occurred since then is a random quantity.
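Both sources of uncertainty can be illustrated with a minimal sketch. It assumes, purely for illustration, that the rate is estimated from the number of differences between two sequences divided by twice the time since their common ancestor, and that the number of differences is Poisson distributed. The function names, the `ancestor_lag` parameter, and all numbers are hypothetical and are not taken from the problem.

```python
import math


def rate_estimate(differences, split_time, ancestor_lag=0.0):
    """Per-lineage rate from pairwise differences (illustrative assumption).

    Changes accumulate along both lineages, so the total time is twice the
    age of the common ancestor. ancestor_lag is how long before the
    population split the common ancestor lived (hypothetical parameter).
    """
    total_time = 2.0 * (split_time + ancestor_lag)
    return differences / total_time


def poisson_relative_error(differences):
    """Relative standard error of a Poisson count: sqrt(k) / k."""
    return 1.0 / math.sqrt(differences)


if __name__ == "__main__":
    # Hypothetical numbers: 10 differences, split 1000 time units ago.
    naive = rate_estimate(10, 1000)
    older = rate_estimate(10, 1000, ancestor_lag=250)
    print(naive, older)                 # an older ancestor gives a lower rate
    print(poisson_relative_error(10))   # sampling error from the random count
```

With only a handful of observed mutations, the Poisson term alone gives a relative error of tens of percent, before any uncertainty in the age of the ancestor is considered.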
NOTE 13B: Any number of other scenarios could be imagined, involving more recombination events and/or more mutations. Which scenario is most likely depends on the relative plausibility of recombination or mutation. If we know the relative rates of mutation and recombination, then we can weight them appropriately. Under the hypothesis shown here, some mutations occur twice (e.g., position 12 changes between a and c, and also between j and m). That is not implausible for a mutation caused by loss or gain of a restriction site, as shown here, but would be unlikely for (say) insertion of a transposable element at a particular site. Again, if we knew the relative plausibility of different kinds of mutation, we could better estimate the genealogy.
(See Table P13.3 and Figure P13.4.)
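One simple way to weight alternative scenarios is sketched below. It assumes that events are rare and independent, so that a scenario requiring m mutations and r recombination events has a relative weight proportional to (mutation rate)^m × (recombination rate)^r. This is an illustrative approximation, not the method used in the problem; the scenario names, event counts, and rates are hypothetical.

```python
def scenario_weights(scenarios, mutation_rate, recombination_rate):
    """Normalised relative weights for alternative scenarios.

    scenarios maps a label to (number of mutations, number of recombinations).
    Assumes rare, independent events, so each scenario's weight is the
    product of the per-event rates (an illustrative approximation).
    """
    raw = {
        name: (mutation_rate ** m) * (recombination_rate ** r)
        for name, (m, r) in scenarios.items()
    }
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


if __name__ == "__main__":
    # Hypothetical scenarios: A needs 3 mutations and 1 recombination,
    # B needs 1 mutation and 2 recombinations.
    candidates = {"A": (3, 1), "B": (1, 2)}
    print(scenario_weights(candidates, mutation_rate=0.1, recombination_rate=0.1))
    # If recombination is much rarer than mutation, the ranking can reverse.
    print(scenario_weights(candidates, mutation_rate=0.1, recombination_rate=0.001))
```

The same idea extends to different kinds of mutation: a repeated restriction-site change and a repeated transposable-element insertion would simply be given different per-event rates.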
NOTE 13C: The typical distance between stop codons in random sequence is far shorter than the typical length of a protein, and so the simplest method of identifying a protein-coding gene is to look for a long open reading frame. Similarly, a duplicate copy of a gene whose function is no longer maintained by selection will soon accumulate stop codons, producing a nonfunctional pseudogene.
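The argument can be made quantitative with a short sketch. It assumes the standard genetic code (3 stop codons out of 64) and a random sequence in which all 64 codons are equally likely; real genomes depart from this because of base composition. The function names and the 300-codon protein length are illustrative choices.

```python
from fractions import Fraction

# Standard genetic code: 3 of the 64 codons are stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}
P_STOP = Fraction(len(STOP_CODONS), 64)


def mean_codons_to_stop():
    """Mean number of codons read up to and including the first stop.

    With each codon a stop independently with probability P_STOP, the
    waiting time is geometric, with mean 1 / P_STOP.
    """
    return 1 / P_STOP


def prob_open(n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return float((1 - P_STOP) ** n_codons)


if __name__ == "__main__":
    print(float(mean_codons_to_stop()))  # about 21 codons between stops
    print(prob_open(300))                # chance of a 300-codon ORF in random sequence
```

Under these assumptions a stop codon is expected roughly every 21 codons, so a stretch of several hundred codons with no stop is very unlikely to arise by chance in any one reading frame.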
NOTE 13D: Such ultraconserved segments of the genome must have been maintained by purifying selection over long periods, and so must have some conserved function; see, e.g., Bejerano et al. (2004).
NOTE 13E: We have assumed that all codons are equally likely, which is not at all the case. The fraction of synonymous mutations will vary with codon usage; see p. 543. Programs that estimate phylogenetic relationships from coding sequence must take such biases into account.
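The equal-codon calculation can be reproduced with a minimal sketch. It assumes the standard genetic code, all 61 sense codons equally frequent, and all single-nucleotide substitutions equally likely; the function name and the layout of the code table are illustrative choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(code=CODE):
    """Count single-base changes from every sense codon by their effect."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in code.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = code[mutant]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


if __name__ == "__main__":
    counts = classify_substitutions()
    total = sum(counts.values())
    print(counts, counts["synonymous"] / total)
```

Under these assumptions, 134 of the 549 possible substitutions (about 24%) are synonymous. Weighting each codon by its observed frequency instead of treating codons as equally likely would change this fraction.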
NOTE 13F: The robustness of the genetic code may be an adaptation that evolved to reduce sensitivity to mutation and to errors in translation, especially the tendency of transfer RNAs to mispair at the third codon position. However, it may also reflect the historical origin of the code: for example, U at the second position might originally have coded for a single hydrophobic amino acid, and only later was the code elaborated to include five different hydrophobic amino acids (Phe, Leu, Ile, Met, and Val in the standard code). These issues are discussed briefly on pages 105, 118, and 560; for a review, see Knight et al. (2001).