




Evolution Chapter 15 Answers
Answer 15.1

i) 
After one generation, there is a 1/3 chance that there are no copies and 2/3 chance that there are two. If there were no copies, then in the second generation, there would still be none. If there are two copies, both may die (chance 1/9), or one may divide and the other may die (chance 2 × 1/3 × 2/3 = 4/9), or both may divide, giving four copies (chance 4/9). The chance of there being no copies is therefore 1/3 + 2/3 × 1/9 = 11/27; the chance that there are two copies is 2/3 × 4/9, and the chance that there are four copies is 2/3 × 4/9. Continuing in this way, we get the probability distribution over time; however, successive generations take more and more calculation. (See Table P15.2.)

ii) 
The expected number of copies increases by 2 × 2/3 = 4/3 in every generation. After ten generations, the expected number is (4/3)^{10} = 17.76. However, this is an average over a very broad distribution.

iii) 
The probability that the mutant allele will be lost is the chance that it dies before reproducing (1/3) plus the chance that it reproduces (2/3) and that both its offspring leave no descendants (Q^{2}). Thus, Q = 1/3 + (2/3)Q^{2}. This has solution Q = 1/2. (Q = 1 is a trivial solution: If an allele is lost, it stays lost.) (This can be found using the formula for a quadratic equation or by drawing graphs of Q and 1/3 + (2/3)Q^{2} and finding where they cross.) This derivation is explained in Chapter 28.

iv) 
Now, there is probability 0.01 of zero copies, 0.97 of one, and 0.02 of two, after 1 minute. The expected number of copies is 0.97 × 1 + 0.02 × 2 = 1.01, and so after t minutes, the expected number is 1.01^{t} ~ exp(0.01t). (See Fig. P15.2.)

v) 
We now have a recursion Q = 0.01 + 0.97Q + 0.02Q^{2}, which still has solution Q = 1/2. The probability of loss is the same, whether we have reproduction in discrete generations or in continuous time.

vi) 
The overall expectation, exp(0.01t), is the sum of probability Q=1/2 of ultimate loss, times zero, plus probability P = (1 – Q) = 1/2 of not being lost, times the expected number given that the allele is not lost, , exp(0.01t) = (1/2) × 0 + (1/2) × . Therefore, = exp(0.01t)/(1/2) = 2 exp(0.01t). (This is an application of Bayes’ Theorem; see Box 28.4.)

Answer 15.2

i) 
In the first generation, the variance of allele frequency is p_{0}q_{0}/2N = (0.3 × 0.7)/(2 × 20) = 0.00525. The distribution will be approximately normal, with standard deviation _{} = 0.073. After five generations, the distribution is approximately normal, with variance five times as large (5 × 0.00525 = 0.02625, s.d. 0.162). However, this overlaps the lower boundary at p = 0, and so just assuming that the allele frequencies diffuse outward at a constant rate has become a poor approximation.
The exact formula for the variance at time t is given in Box 15.1 as p_{0}q_{0}(1 – (1 – 1/2N)^{t}) (i.e., 0.00525, 0.0250, 0.134 at t = 1, 5, 40 generations). At large times, populations fix for either Q or P, ultimately with probability 0.7, 0.3, respectively; allele frequencies in any populations that are not fixed are roughly uniformly distributed. The form of the distribution is shown in Figure 15.3A for a starting frequency of p_{0} = 0.5; it is shown for the present example in Figure P15.3.

ii) 
We need to find the expected proportion of heterozygotes formed by random mating within a population with allele frequency p, E[2pq]. This is derived in Box 15.1 as 2(p_{0}q_{0}var(p)) = 2(p_{0}q_{0}(1 –1/2N)^{t}), or 0.420, 0.410, 0.370, 0.153 at t = 0, 1, 5, 40 generations. Because the overall allele frequency over the whole collection of populations stays constant at p_{0} = 0.3, the heterozygosity of a random offspring with parents from different populations also stays constant at 2p_{0}q_{0} = 0.42. In this sense, the overall genetic diversity stays constant but shifts from variation within populations to variation between them.

Answer 15.3

i) 
For a neutral allele, the allele frequency stays constant on average (M = 0), and the variance of fluctuations in allele frequency is V = p(1 – p)/2N_{e}. So, from the equation in Chapter 28:

ii) 
If we substitute ψ = C e^{–t/2Ne} into the first equation, we find that it is a solution.

iii) 
Substituting into the second equation,
We see that this is satisfied by A_{0} = C_{0} – (C/2) exp(–t/2N_{e}), where C_{0} is some unknown constant; it can be satisfied similarly for A_{1}.

iv) 
The constraints on the sum, the mean, and the variance are
The first two equations lead to C_{0} = q_{0}, C_{1} = p_{0} (i.e., the ultimate probabilities of loss or fixation equal the initial allele frequencies). The last constraint is then
so that
Using q_{0}^{2} + 2p_{0} q_{0} + p_{0}^{2} = 1, we find that C = 6p_{0}q_{0}. For example, at t = 4N_{e}, the chance that a population will still be polymorphic is 6p_{0}q_{0} exp(–2) ~ 0.17 for p_{0} = 0.3. NOTE 15B

Answer 15.4

i) 
The additive genetic variance decreases by a factor (1 – 1/2N_{e}) per generation (see p. 418). Initially, V_{A(0)} = V_{E}, and so after 30 generations, V_{A(30)} = (1 – 1/2N_{e})^{30}, V_{E} = 0.362V_{E}, and the heritability is V_{A}/(V_{A} + V_{E}) = 0.266.

ii) 
Mutation increases additive variance by V_{m}, whereas drift reduces it by –V_{A}/2N_{e}. When mutation balances drift, V_{m} = V_{A}/2N_{e}, and so V_{A} = 2N_{e}V_{m}. Here, this is 2 × 15 × 0.005V_{e} = 0.15V_{e}. The heritability is then h^{2} = 0.15/1.15 = 0.13. NOTE 15C

iii) 
The variance of the trait mean increases by V_{A(0)}/N_{e} in the first generation. In the next generation, it increases by a further V_{A(1)}/N_{e} = (V_{A(0)}/N_{e}) (1 – 1/2N_{e}). After t generations it will equal
After 30 generations, the variance in trait mean will be 2V_{E} × 0.362 = 0.724V_{e}; its s.d. tends to 0.851 _{}; because V_{P} = 2V_{E} initially, the s.d. of the trait mean equals 0.602 _{}. In the indefinite future, the variance of the trait mean tends to a maximum of 2V_{A(0)}; its s.d. is equal to the initial phenotypic standard deviation. NOTE 15D

Answer 15.5

i) 
and so
NOTE 15E
Roughly equal amounts of drift occur during the brief bottlenecks as during the longer periods of larger size.

ii) 
From Box 15.2,
and so N_{e} = 36.

iii) 
From Box 15.2, 1/N_{e} = (2 + V)/4N. The mean offspring number is
so that the population is in steady state. The variance of offspring number, V, is
three times the variance of a Poisson distribution with the same mean. Thus, N_{e} = 4 × 100/(2 + 6) = 50. NOTE 15F

Answer 15.6

i) 
First, consider the chance that a randomly chosen gene in the daughter is IBD with a random gene in the queen. This is the chance that the daughter’s gene came from the queen times the chance that it was the same gene as we chose from the queen: f_{QD}= 1/4. Now, consider the chance that two genes, one from each daughter, are IBD. This is the chance that they both came from the father (1/4), in which case they must be the same gene, plus the chance that they both came from the queen, times the chance that they are the same gene (1/4 × 1/2 = 1/8). So, f_{DD} = 3/8.

ii) 
If the queen mates with many different males, then if two genes in the daughters both came from the father, they will have come from different fathers and will be unrelated. So, the coefficient of kinship between daughters falls to 1/8, the chance that they both came from the same gene in the queen. (See Fig. P15.4.) NOTE 15G

Answer 15.7

i) 
We use the formula in Box 15.3 for the probability of identity by descent:
There are two loops, one via the grandfather and one via the grandmother, and each contain six individuals. Measuring the probability of identity relative to the grandparents’ generation, f_{A} = 0 by definition. Thus,
(See Fig. P15.5.)

ii) 
The chance that the offspring is homozygous for the recessive disease allele is the chance that its genes are not IBD times the frequency of homozygotes in a random pair of genes, plus the chance that they are IBD times the chance that the ancestral gene was a disease allele: fp + (1 – f)p^{2} = 0.00031. This is much higher than with no inbreeding: p^{2} = 0.000025.

iii) 
Now, we just multiply by (1 + f_{A}) for both of the two loops, with f_{A} = 1/16, giving (17/16)(1/32) + (17/16)(1/32) = 17/256, not much greater. The incidence of the disease is now 0.00036.

iv) 
We need to work out the average IBD between the two genes in a randomly chosen individual. If that individual does not come from an inbred marriage, then they will not be IBD. If it does, they will be IBD with probability f = (1/16)(1 + f_{A}). The overall probability of identity, therefore, is f = (1/10)(1/16)(1 + f_{A}). But, the average probability of identity in the ancestor, f_{A}, must equal the current identity if the population is at steady state. So, f = f_{A} = 1/159. (The correction for inbreeding in the ancestor has made hardly any difference.) The incidence of the disease is fp + (1 – f)p^{2} as before, which is 0.000056. Roughly equal numbers of cases are from inbred and outbred marriages (the two terms in the expression fp + (1 – f)p^{2}). NOTE 15H

Answer 15.8

i) 
The expected time back to the MRCA of two lineages is just 2N_{e} generations.

ii) 
There are three pairs of lineages that might coalesce, so the expected time back to the first coalescence is (2N_{e}/3) generations. The expected time back to the MRCA of the sample of three is (2 + 2/3)N_{e} = (8/3)N_{e}.

iii) 
With five lineages, there are 5 × 4/2 = 10 pairs of lineages, so the expected time is 2N_{e}(2/(5 × 4)) = 2N_{e}/10. In total,
NOTE 15I NOTE 15J

iv) 
The chance that this new lineage does not coalesce with the five others, during the time T _{5} for which they exist, is exp(–5T_{5}/2N_{e}). Similarly, the chance that it also does not coalesce with four lineages while those four exist is exp(–4T_{4}/2N_{e}). So, the chance that there is now an older MRCA is the chance that the sixth lineage does not coalesce during T_{5}, T_{4}, T_{3}, or T_{2}: exp(–5T_{5}/2N_{e})exp(–4T_{4}/2N_{e})exp(–3T_{3}/2N_{e})exp(–2T_{2}/2N_{e}).
A crude approximation is that T_{5} = 2N_{e}/10, T_{4} = 2N_{e}/6, T_{3} = 2N_{e}/3, T_{2} = 2N_{e}. This gives exp(–1/2)exp(–2/3)exp(–1)exp(–2) = exp(–4.167) = 0.0155. The exact result uses the fact that the expectation of exp(–aT_{k}/(2N_{e}), when T_{k} is exponentially distributed with rate (k(k – 1)/2)/(2N_{e}), is (k(k – 1)/2)/(a + k(k – 1)/2). Hence, the probability is 1/15 = 0.067. This is somewhat higher, because this calculation has allowed for random variation in the coalescence times.

Answer 15.9

i) 
The expected time to coalescence of a randomly chosen pair is 2N_{e} generations, and so they have had on average 4N_{e} generations to diverge. Therefore, the probability that two sites differ between the pair is π = 4N_{e} μ = 0.01.

ii) 
If we assume that every mutation is at a new site (i.e., the infinite sites model), then the expected number of segregating sites is just the number of mutations that have occurred within the genealogy. This is the total length of the genealogy times the total mutation rate, μL, over the length of the sequence, L. Now, we know that the time taken for n lineages to coalesce averages E[T_{n}] = 2N_{e}/(n(n – 1)/2) generations, because there are n(n – 1)/2 pairs that can coalesce. The corresponding length of genealogy is nT_{n}. If we add up over the times for which n, n – 1,..., 2 lineages are present and multiply by the mutation rate, we then have
For n = 10, this equals 2.83 × 4N_{e}μ × L = 283 segregating sites. NOTE 15K

iii) 
If all lineages coalesced in a bottleneck T = 10^{6} generations ago, then the pairwise diversity would still be π = 2Tμ = 0.01. The number of segregating sites is the total mutation rate times the length of the genealogy, nT × μL = 500, about twice the expectation for a constantsized population. This reflects the fact that with a star genealogy, each mutation appears only once and so contributes minimally to pairwise diversity. The expectations become more different for larger samples. With n = 100 genes, for example, we expect 4N_{e}μL(1/99 + 1/98 + ··· + 1) = 518 segregating sites with a constantsized population but 5000 segregating sites with a star genealogy. NOTE 15L NOTE 15M

Answer 15.10

i) 
The time, T, to common ancestry of a pair of sequences is exponentially distributed with mean E[T] = 2N_{e}. The number of differences is Poissondistributed, with mean 2μLT, where L = 1000. Therefore, the overall mean is 2μLE[T] = 4N_{e}μL = 12. (This is just the length of sequence times the nucleotide diversity, Lπ.) The variance of the number of differences is due to variation in coalescence times (var(2μLT) = (2μL)^{2}var(T) = (2μL)^{2}(2N_{e})^{2} = (4N_{e}μL)^{2}) plus variation in the number of mutations, given T; the latter is Poissondistributed and has variance equal to its mean, 4N_{e}μL. So, the total variance is 4N_{e}μL+(4N_{e}μL)^{2} = 12 + 144 = 156; the standard deviation is 12.5, almost all variance is due to variation in coalescence time, rather than the Poisson distribution of number of mutations.

ii) 
If all lineages coalesced 2 × 10^{6} generations back, then pairwise diversity would be the same, but the only variance in number of differences would be due to the Poisson distribution: 4N_{e}μL = 12. (See Fig. P15.6.) NOTE 15N

Answer 15.11

i) 
Table P15.3 shows the frequency of the “1” allele, p, for each site (right column) and classifies the 20 individuals into 7 haplotypes (bottom row). Summing 2pq across all the sites and then dividing by the length of sequence gives (0.25 + 0.25 + ...)/12000 = 0.00047.

ii) 
There are seven haplotypes. The total number of possibilities is 2^{15} = 32,768. The most common would be expected to be all zeros and to have a frequency among haploid genomes equal to the product of the frequencies of the “0” alleles: (1 – 0.50)(1 – 0.20) … = 0.0084. In fact, this haplotype has frequency 0.10.

iii) 
The numbers of copies of the four possible twolocus genotypes for the first two rows are
Thus, 11 is at frequency 0.2, but would be expected to be at 0.5 × 0.2 = 0.1. Therefore, D is the difference between observed and expected frequencies and equals 0.1. This is the maximum possible D for these allele frequencies, so D/D_{max} = 1. The correlation measure is r = 0.5.
For the first and last rows,
Thus, D = 0.3 – 0.35 × 0.5 = 0.125, D′ = 0.125/(0.5 × 0.35) = 0.714, and r = 0.524.

iv) 
Even after simplifying by deleting all rows that are the same and all columns that are the same, the problem still takes some thought:
It is impossible to relate these 20 sequences by a genealogy. The easiest way to see this is to look at rows 3 and 6 in the reduced table (bold), which show all four combinations 00, 01, 10, 11. Thus, there is one allele that distinguishes B,F,G from A,C,D,E and another allele that distinguishes A,B,D,F from C,E,G. These groups overlap, which is impossible unless there has been recombination. In general, seeing a pair of sites that show all four possible combinations (i.e., that are not in the maximum possible linkage disequilibrium) excludes a genealogy. (We could have seen this from iii), where the first and last rows were shown to be in less than maximal LD, i.e., showed all four types.)

Answer 15.12

i) 
The total length of the genealogy is 40,000 generations, and a recombination event anywhere on this ancestry may produce a different genealogy. The rate of such events is 1/40,000, and so the average map distance will be ~0.0025 cM. (Recall that 1 cM corresponds to a recombination rate of 0.01.) More precisely, the map distance will be exponentially distributed with mean 0.0025 cM.

ii) 
The recombination will occur at some time t in the past and is equally likely to be anywhere on either branch. The recombinant lineage will wander back and may coalesce with either of the two branches at a total rate 2/(2N_{e}). (If it coalesces with the original branch, the common ancestor will be the same, but if it coalesces with the other branch, the common ancestor will be more recent.) The chance that it does not coalesce with either branch during time (20,000 – T) is exp(–(20,000 – T) × 2/(2N_{e})), in which case, the common ancestor must be earlier than 20,000 generations. It is convenient to measure time relative to 2N_{e} =20,000 generations, and so we can define T = t/(20,000). The average of this probability over a uniformly distributed time is then
A rough answer, avoiding integration, would be to just set T to its average, 1/2, giving a probability exp(–1) = 0.367.

Answer 15.13
As we trace back, lineages on the same genome recombine apart at a rate r = 10^{–4} per generation (0.01 cM means a recombination rate of 10^{–4}). Conversely, lineages on different genomes coalesce, coming together in a single individual, at a rate 1/2N_{e}. Suppose that there is a chance x that they are together. They split apart at a rate rx but come together at a net rate (1 – x)/2N_{e}. At equilibrium, these rates balance, and so rx = (1 – x)/2N_{e}. Therefore, the chance that they are together in the same genome is x = 1/(1 + 2N_{e}r). In this example, x = 1/3. NOTE 15O

References
Clayton G.A. and Robertson A. 1955. Mutation and quantitative variation. Am. Nat. 89: 151–158.
Lynch M. and Hill W.G. 1986. Phenotypic evolution by neutral mutation. Evolution 40: 915–935.


