Hmm.
Insects and bacteria would have an advantage in approximating this.
Yes. But a million individuals will suffice to give you an answer so close to that of an infinite number that it doesn't matter.
The binomial distribution is used when (a) there are two possible outcomes of a trial, (b) the probability of each outcome remains the same across all trials, and (c) all trials are independent of each other. Here, the two possible outcomes (i.e., the allele sampled in a gamete) are A and a, the probability of sampling a gamete with the A allele is p, there are n = 2N trials (i.e., gametes sampled), and k of these trials result in the A allele. The term [n! / (k! (n - k)!)] gives the number of ways that one can observe exactly k "successes" (defined here as A alleles) and n - k "failures" (defined here as a alleles). The term p^k (1 - p)^(n - k) is the exact probability of observing any given order of k "successes" and n - k "failures." Therefore, the product of these terms gives the exact probability of observing k "successes" and n - k "failures," given that one is unconcerned about their order.
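For anyone who wants to plug numbers into that formula, here is a minimal Python sketch of it. The function name binomial_pmf and the example values (N = 2, p = 0.5) are my own choices for illustration, not something from the quoted text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k A alleles among n sampled gametes."""
    ways = comb(n, k)                      # n! / (k! (n - k)!)
    one_order = p**k * (1 - p)**(n - k)    # probability of one specific ordering
    return ways * one_order

# Illustrative values (assumed): N = 2 diploid individuals, so n = 2N = 4 gametes
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

This direct form is fine for small n. For thousands of gametes, p**k underflows to zero in floating point, so a log-space version is the safer choice there.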
Using the formula for the binomial distribution, we can calculate the exact probability that k = 2pN for a range of N. Doing so yields the following results:
At first glance, these results might seem backward. According to the table, the probability that the allele frequencies will remain unchanged is higher for the smaller populations! However, that's only part of the story. In all of these cases, it's more likely that the allele frequencies will change, and it is actually the magnitude of the change that matters. To see what this means, let's focus on those populations where N = 5, N = 50, and N = 500. Figures 1 through 3 show the probabilities of allele frequencies in the next generation of each of these populations.
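The figures didn't come through in the quote, but the "magnitude of the change" point can be checked numerically. Here is a rough Python sketch that adds up the binomial probabilities for every outcome landing close to the starting frequency. The value p = 0.5, the 0.05 window, and the function names are assumptions of mine, chosen only for illustration.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """Binomial probability computed in log space (requires 0 < p < 1)."""
    log_ways = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_ways + k * log(p) + (n - k) * log(1 - p))

def prob_within(N, p, delta):
    """Chance that the next-generation frequency k/n lands within delta of p."""
    n = 2 * N
    # The small tolerance guards against floating-point noise at the window edge.
    return sum(binomial_pmf(k, n, p) for k in range(n + 1)
               if abs(k / n - p) <= delta + 1e-12)

p = 0.5  # assumed starting frequency of the A allele
for N in (5, 50, 500):
    print(N, round(prob_within(N, p, 0.05), 3))
```

The probability of staying near p climbs with N, which is the other half of the story: small populations are more likely to hit p exactly, but when they miss, they tend to miss by much more.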
Population size (N)    Pr(k | p, n = 2N)
2                      .375
5                      .246
10                     .176
50                     .080
100                    .056
500                    .025
1,000                  .018
10,000                 .006
As you see, the likely error drops off asymptotically as the population size increases.
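If you want to reproduce the table yourself, here is a short Python sketch. The quoted table doesn't say which p was used; p = 0.5 is my assumption, and it does give back the listed values. The function name prob_unchanged is also mine.

```python
from math import exp, lgamma, log

def prob_unchanged(N, p):
    """Pr(k = 2pN): chance the allele frequency is exactly p next generation."""
    n = 2 * N
    k = round(p * n)  # assumes 2pN is a whole number, and 0 < p < 1
    # Log space avoids overflow in n! and underflow in p^k for large n.
    log_ways = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_ways + k * log(p) + (n - k) * log(1 - p))

for N in (2, 5, 10, 50, 100, 500, 1000, 10000):
    print(f"{N:>6}  {prob_unchanged(N, 0.5):.3f}")
```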
Other assumptions:
...there is no migration, gene flow, admixture, mutation or selection...
No mutation? I thought mutation was how natural selection worked.
If the Hardy-Weinberg equilibrium holds, it's an indication that natural selection is not acting on that particular gene. Hence, if it doesn't hold, it indicates natural selection at work.
Where p^2 represents the frequency of the homozygous dominant genotype, q^2 represents the frequency of the homozygous recessive genotype, and 2pq is the frequency of the heterozygous genotype.
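To make those three terms concrete, here is a small Python sketch that computes the expected genotype frequencies from p and compares made-up observed counts against them with a chi-square statistic. The function names and the counts are hypothetical, and the chi-square comparison is a common way to check the fit that I'm adding, not something taken from the quoted source.

```python
def hardy_weinberg_expected(p):
    """Expected genotype frequencies p^2, 2pq, q^2 for allele frequency p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic of observed genotype counts against expectation.

    Assumes both alleles are present (0 < p < 1), otherwise an expected
    count is zero and the division fails.
    """
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # frequency of A estimated from counts
    expected = hardy_weinberg_expected(p)
    observed = {"AA": n_AA, "Aa": n_Aa, "aa": n_aa}
    return sum((observed[g] - expected[g] * total) ** 2 / (expected[g] * total)
               for g in observed)

# Made-up counts for illustration only
print(hardy_weinberg_expected(0.6))
print(hwe_chi_square(36, 48, 16))  # counts in exact p^2 : 2pq : q^2 proportions
print(hwe_chi_square(50, 0, 50))   # no heterozygotes at all
```

A statistic near zero means the counts fit the expected proportions. A large one (roughly above 3.84 at the 5% level with one degree of freedom) only says the fit is poor; it doesn't say which of the assumptions listed earlier in the thread failed.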
This doesn't really seem to estimate the ratio of helpful to counterproductive mutations. Maybe I misunderstood.
If the equation doesn't hold, the allele with a greater than expected frequency has a higher selection value than the other. In real life, most alleles aren't "good" or "bad"; one just might have a better fitness than the other, with neither actually harmful.