Lecture 20: Tests of Neutrality November 6, 2015.

Page 1: Lecture 20: Tests of Neutrality November 6, 2015.

Lecture 20 : Tests of Neutrality

November 6, 2015

Page 2: Lecture 20: Tests of Neutrality November 6, 2015.

Last Time

Mutation and selection

Infinite alleles and stepwise mutation models

Introduction to neutral theory

Page 3: Lecture 20: Tests of Neutrality November 6, 2015.

Today

Sequence data and quantification of variation

Infinite sites model

Nucleotide diversity (π)

Sequence-based tests of neutrality

Ewens-Watterson Test

Tajima’s D

Hudson-Kreitman-Aguade

Synonymous versus Nonsynonymous substitutions

McDonald-Kreitman

Page 4: Lecture 20: Tests of Neutrality November 6, 2015.

The main power of neutral theory is that it provides a theoretical expectation for genetic variation in the absence of selection.

Page 5: Lecture 20: Tests of Neutrality November 6, 2015.

Equilibrium Heterozygosity under IAM

Frequencies of individual alleles are constantly changing

Balance between loss and gain is maintained

4Neμ >> 1: mutation predominates, new mutants persist, H is high

4Neμ << 1: drift dominates, new mutants quickly eliminated, H is low
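The two regimes above follow from the equilibrium heterozygosity under the infinite alleles model, H = 4Neμ / (4Neμ + 1), which is one minus the expected homozygosity 1 / (4Neμ + 1). A minimal sketch, with the function name and the example population sizes chosen here purely for illustration:

```python
def equilibrium_heterozygosity(ne, mu):
    """Expected heterozygosity at mutation-drift equilibrium (infinite alleles model)."""
    theta = 4 * ne * mu
    return theta / (theta + 1)


# Illustrative population sizes (not from the lecture), with mu = 1e-5
for ne in (1_000, 25_000, 1_000_000):
    print(ne, round(equilibrium_heterozygosity(ne, 1e-5), 3))
```

When 4Neμ is far below 1 the result is close to 4Neμ itself (low H); when it is far above 1 the result approaches 1 (high H).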

Page 6: Lecture 20: Tests of Neutrality November 6, 2015.

Effects of Population Size on Expected Heterozygosity Under Infinite Alleles Model (μ = 10^-5)

Rapid approach to equilibrium in small populations

Higher heterozygosity with less drift
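The approach to equilibrium can be sketched with the standard recursion for homozygosity under drift and infinite-alleles mutation, F(t+1) = (1 - μ)^2 [1/(2N) + (1 - 1/(2N)) F(t)], with H = 1 - F. The function name, the starting value H = 0, and the population sizes below are assumptions made for illustration; the curves on the slide may have been drawn with different settings.

```python
def heterozygosity_trajectory(n, mu, generations, h0=0.0):
    """Iterate expected heterozygosity for a population of n diploids."""
    f = 1.0 - h0
    trajectory = [h0]
    for _ in range(generations):
        # Two copies are identical if neither mutated and they were identical
        # by descent (same parent copy, or different copies already identical).
        f = (1 - mu) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)
        trajectory.append(1.0 - f)
    return trajectory


for n in (100, 1_000, 10_000):
    traj = heterozygosity_trajectory(n, 1e-5, 50_000)
    # Fraction of the final value already reached after 1000 generations
    print(n, round(traj[1_000] / traj[-1], 3), round(traj[-1], 4))
```

Small populations reach their (low) equilibrium within a few hundred generations, while large populations climb slowly toward a higher value.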

Page 7: Lecture 20: Tests of Neutrality November 6, 2015.

Fate of Alleles in Mutation-Drift Balance

Time to fixation of a new mutation is much longer than time to loss

Generations from birth to fixation

Time between fixation events
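Standard neutral-theory results put the mean time to fixation of a new neutral mutation at roughly 4Ne generations, far longer than the typical time to loss, with successive fixations about 1/μ generations apart. The following is a small Wright-Fisher simulation sketch of the first contrast; the population size, replicate count, seed, and function names are arbitrary choices for illustration.

```python
import random


def fate_of_new_mutation(two_n, rng):
    """Follow one new neutral mutation among two_n gene copies.

    Returns (fixed, generations), where fixed is True if it reached fixation.
    """
    count, generations = 1, 0
    while 0 < count < two_n:
        p = count / two_n
        # Binomial sampling of the next generation, one gene copy at a time
        count = sum(rng.random() < p for _ in range(two_n))
        generations += 1
    return count == two_n, generations


def summarize(two_n, replicates, seed=1):
    rng = random.Random(seed)
    fix_times, loss_times = [], []
    for _ in range(replicates):
        fixed, t = fate_of_new_mutation(two_n, rng)
        (fix_times if fixed else loss_times).append(t)
    return fix_times, loss_times


fix_times, loss_times = summarize(two_n=40, replicates=4000)
print("fraction fixed:", len(fix_times) / 4000)
print("mean generations to fixation:", sum(fix_times) / len(fix_times))
print("mean generations to loss:", sum(loss_times) / len(loss_times))
```

The fraction fixed should sit near 1/(2N), and the mean time to fixation should be many times the mean time to loss.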

Page 8: Lecture 20: Tests of Neutrality November 6, 2015.

Fate of Alleles in Mutation-Drift-Selection Balance

Purifying Selection

Neutrality

Balancing Selection/Overdominance

Which case will have the most alleles on average at any given time?

What will this depend upon?

Highest HE?

Page 9: Lecture 20: Tests of Neutrality November 6, 2015.

Assume you take a sample of 100 alleles from a large (but finite) population in mutation-drift equilibrium.

[Figure: three candidate histograms, panels A, B, and C; x-axis: Number of Observations of Allele (2 to 10); y-axis: Number of Alleles (2 to 10)]

What is the expected distribution of allele frequencies in your sample under neutrality and the Infinite Alleles Model?

Page 10: Lecture 20: Tests of Neutrality November 6, 2015.

Allele Frequency Distributions

Neutral theory allows a prediction of the frequency distribution of alleles through the process of birth and demise of alleles through time

Comparison of observed to expected distribution provides evidence of departure from Infinite Alleles model

Depends on f, effective population size, and mutation rate

Hartl and Clark 2007

Black: Predicted from Neutral Theory

White: Observed (hypothetical)
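One way to generate the "predicted" bars is the expected allele frequency spectrum from Ewens sampling theory: the expected number of alleles seen exactly j times in a sample of n gene copies is (θ/j) · n(n-1)...(n-j+1) / [(θ+n-j)...(θ+n-1)], with θ = 4Neμ. This is a sketch under the assumption that the figure shows this kind of spectrum; the function name and the value θ = 2 are illustrative only.

```python
def expected_allele_spectrum(n, theta):
    """E[a_j]: expected number of alleles observed exactly j times in n gene copies."""
    spectrum = {}
    for j in range(1, n + 1):
        # n(n-1)...(n-j+1) / [(theta+n-1)(theta+n-2)...(theta+n-j)], term by term
        ratio = 1.0
        for t in range(j):
            ratio *= (n - t) / (theta + n - 1 - t)
        spectrum[j] = theta / j * ratio
    return spectrum


spectrum = expected_allele_spectrum(n=100, theta=2.0)
for j in range(1, 11):
    print(j, round(spectrum[j], 3))
```

For a sample of 100 the expectation is dominated by alleles seen only once or twice, with progressively fewer alleles at higher counts.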

Page 11: Lecture 20: Tests of Neutrality November 6, 2015.

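Ewens Sampling Formula

Population mutation rate, an index of the variability of the population:

$\theta = 4N_e\mu$

Probability that the next sampled allele is new, given i alleles already sampled:

$\frac{\theta}{\theta + i}$

Probability of sampling a new allele on the first sample:

$\frac{\theta}{\theta + 0} = 1$

Probability of observing a new allele after sampling one allele:

$\frac{\theta}{\theta + 1} = H_e$

Probability of sampling a new allele on the third and fourth samples:

$\frac{\theta}{\theta + 2}$ and $\frac{\theta}{\theta + 3}$

Expected number of different alleles (k) in a sample of 2N alleles is:

$E(k) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$

Example: Expected number of alleles in a sample of 4:

$E(k) = \sum_{i=0}^{3} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \frac{\theta}{\theta + 3}$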
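The expected number of alleles is a sum with one term θ/(θ + i) per sampled gene copy, where θ = 4Neμ. A minimal sketch; the function name and the value θ = 1 are illustrative assumptions.

```python
def expected_number_of_alleles(sample_size, theta):
    """E(k) for a sample of `sample_size` gene copies under the infinite alleles model."""
    # The i-th term is the chance that a new copy is a new allele
    # when i copies have already been sampled.
    return sum(theta / (theta + i) for i in range(sample_size))


# Sample of 4 gene copies with theta = 1: 1 + 1/2 + 1/3 + 1/4
print(round(expected_number_of_alleles(4, 1.0), 4))
```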

Page 12: Lecture 20: Tests of Neutrality November 6, 2015.

Ewens Sampling Formula

Predicts number of different alleles that should be observed in a given sample size if neutrality prevails under Infinite Alleles Model

$E(n) = \sum_{i=0}^{2N-1} \frac{\theta}{\theta + i} = 1 + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \dots + \frac{\theta}{\theta + 2N - 1}$

where E(n) is the expected number of different alleles in a sample of N diploid individuals, and $\theta = 4N_e\mu$.

Small θ, E(n) approaches 1

Large θ, E(n) approaches 2N

θ can be predicted from number of observed alleles for given sample size

Can also predict expected homozygosity (fe) under this model

$f_e = \frac{1}{4N_e\mu + 1} = \frac{1}{\theta + 1}$
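Because the expected number of alleles increases steadily with θ = 4Neμ, θ can be recovered from an observed allele count by solving the sum numerically. The sketch below uses simple bisection; the function names, bracket, and iteration count are assumptions made for illustration, not a published estimator implementation.

```python
def expected_number_of_alleles(sample_size, theta):
    return sum(theta / (theta + i) for i in range(sample_size))


def estimate_theta(k_observed, sample_size, iterations=100):
    """Find theta whose expected allele count matches k_observed (bisection)."""
    if not 1 < k_observed < sample_size:
        raise ValueError("need 1 < k_observed < sample_size")
    lo, hi = 1e-12, 1.0
    while expected_number_of_alleles(sample_size, hi) < k_observed:
        hi *= 2  # widen the bracket until it contains the solution
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if expected_number_of_alleles(sample_size, mid) < k_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_homozygosity(theta):
    return 1 / (theta + 1)


theta_hat = estimate_theta(k_observed=15, sample_size=89)
print(theta_hat, expected_homozygosity(theta_hat))
```

The usage line plugs in a sample of 89 gene copies carrying 15 alleles; the homozygosity it prints is the population-level expectation 1/(θ + 1), which need not equal a value simulated for a finite sample.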

Page 13: Lecture 20: Tests of Neutrality November 6, 2015.

Ewens-Watterson Test

Compares expected homozygosity under the neutral model to expected homozygosity under Hardy-Weinberg equilibrium using observed allele frequencies

Comparison of allele frequency distributions

fe comes from infinite allele model simulations and can be found in tables for given sample sizes and observed allele numbers

$f_{HW} = \sum_i p_i^2$
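A rough sketch of how the comparison could be simulated, assuming rejection sampling from a Hoppe urn (each new gene copy is a new allele with probability θ/(θ + i), otherwise a copy of an existing one) and keeping only draws with the observed number of alleles. Under the infinite alleles model the allele count is a sufficient statistic for θ, so the θ used here should affect only the acceptance rate. The allele counts, θ, seed, and function names are hypothetical; published tables may have been produced differently.

```python
import random


def observed_homozygosity(counts):
    """Sum of squared allele frequencies from a list of allele counts."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def hoppe_urn_sample(n, theta, rng):
    """Draw allele counts for n gene copies under the infinite alleles model."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            counts.append(1)  # new allele
        else:
            r = rng.randrange(i)  # copy an existing gene copy uniformly at random
            cumulative = 0
            for idx, c in enumerate(counts):
                cumulative += c
                if r < cumulative:
                    counts[idx] += 1
                    break
    return counts


def neutral_homozygosities(n, k, theta, reps, rng):
    """Simulated homozygosities for samples of n copies carrying exactly k alleles."""
    values = []
    while len(values) < reps:
        counts = hoppe_urn_sample(n, theta, rng)
        if len(counts) == k:
            values.append(observed_homozygosity(counts))
    return values


observed = [12, 3, 2, 2, 1]  # hypothetical allele counts: 20 copies, 5 alleles
f_hw = observed_homozygosity(observed)
sims = neutral_homozygosities(n=20, k=5, theta=1.5, reps=500, rng=random.Random(7))
print("f_HW:", f_hw)
print("mean simulated fe:", sum(sims) / len(sims))
print("fraction of simulations >= f_HW:", sum(v >= f_hw for v in sims) / len(sims))
```

An observed value far in either tail of the simulated distribution is the kind of departure the test looks for.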

Page 14: Lecture 20: Tests of Neutrality November 6, 2015.

Ewens-Watterson Test Example

Drosophila pseudoobscura collected from winery

Xanthine dehydrogenase alleles

15 alleles observed in 89 chromosomes

fHW = 0.366

Generated fe by simulation: mean 0.168

[Figure: distribution of fe from simulations; Hartl and Clark 2007]

How would you interpret this result?

Page 15: Lecture 20: Tests of Neutrality November 6, 2015.

Most Loci Look Neutral According to Ewens-Watterson Test

[Figure axis labels: Expected Homozygosity; fe]

Hartl and Clark 2007

Page 16: Lecture 20: Tests of Neutrality November 6, 2015.

DNA Sequence Polymorphisms

DNA sequence is ultimate view of standing genetic variation: no hidden alleles

Is this really true?

What about back mutation?

Signatures of past evolution are contained in DNA sequence

Neutral theory presents null model

Departures due to:

Selection

Demographic events

- Bottlenecks, founder effects

- Population admixture

Page 17: Lecture 20: Tests of Neutrality November 6, 2015.

Sequence Alignment

Necessary first step for comparing sequences within and between species

Many different algorithms

Tradeoff of speed and accuracy

Page 18: Lecture 20: Tests of Neutrality November 6, 2015.

Quantifying Divergence of Sequences

Nucleotide diversity (π) is average number of pairwise differences between sequences

$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij}$

where

N is the number of sequences in the sample,

p_i and p_j are the frequencies of sequences i and j in the sample,

and

π_ij is the proportion of sites that differ between sequences i and j

Page 19: Lecture 20: Tests of Neutrality November 6, 2015.

Sample Calculation of π

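A->B, 1 difference

A->C, 1 difference

B->C, 2 differences

[Figure: alignment of sequences A, B, and C over 35 sites, positions marked every 5 sites]

$\pi = \frac{N}{N-1} \sum_{ij} p_i p_j \pi_{ij} = \frac{3}{2}\left[(0.33)(0.33)(1/35) + (0.33)(0.33)(1/35) + (0.33)(0.33)(2/35)\right] = 0.01867$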

On average, there are 18.67 polymorphisms per kb between pairs of haplotypes in the population
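A sketch of nucleotide diversity computed directly as the mean proportion of differing sites over all unordered pairs of sequences. The three 35-site sequences are hypothetical, built only to reproduce the 1, 1, and 2 pairwise differences of the example, and the function names are illustrative. Averaged this way the three pairs give (1 + 1 + 2)/3 = 1.33 differences per pair, about 0.0381 per site; the 0.01867 in the worked calculation is roughly half of that because each pair enters the frequency-weighted sum once rather than twice (and 0.33 is rounded).

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all unordered pairs of sequences."""
    m = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * m)


def substitute(seq, pos, base):
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical 35-site sequences: A vs B and A vs C differ at one site, B vs C at two
seq_a = ("ACGT" * 9)[:35]
seq_b = substitute(seq_a, 4, "T")
seq_c = substitute(seq_a, 19, "G")

print(hamming(seq_a, seq_b), hamming(seq_a, seq_c), hamming(seq_b, seq_c))
print(round(nucleotide_diversity([seq_a, seq_b, seq_c]), 5))
```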

Page 20: Lecture 20: Tests of Neutrality November 6, 2015.

Tajima’s D Statistic

Infinite Sites Model: each new mutation affects a new site in a sequence

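Expected number of polymorphic sites in all sequences:

$E(S) = a_1 \theta$, with $a_1 = \sum_{i=1}^{n-1} \frac{1}{i}$

where n is number of different sequences compared

Expected nucleotide diversity scaled by sequence length:

$E(m\pi) = \theta$

where m is length of sequence, and $\theta = 4N_e\mu$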

Page 21: Lecture 20: Tests of Neutrality November 6, 2015.

Sample Calculation of S

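Two polymorphic sites: S = 2

[Figure: alignment of sequences A, B, and C over 35 sites, positions marked every 5 sites]

$a_1 = \sum_{i=1}^{n-1} \frac{1}{i} = \frac{1}{1} + \frac{1}{2} = 1.5$

$\theta_S = \frac{S}{a_1} = \frac{2}{1.5} = 1.33$

$\pi = 0.01867$

$\theta_\pi = m\pi = (35)(0.01867) = 0.65$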
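The count of polymorphic sites and its scaling by a_1 can be sketched as follows. The sequences are hypothetical stand-ins with two polymorphic sites among three 35-site sequences, and the function names are illustrative.

```python
def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def harmonic_a1(n):
    """a_1 = 1 + 1/2 + ... + 1/(n-1) for n sequences."""
    return sum(1 / i for i in range(1, n))


def theta_from_s(seqs):
    """Estimate of theta from the number of polymorphic sites: S / a_1."""
    return segregating_sites(seqs) / harmonic_a1(len(seqs))


# Hypothetical sequences: B and C each carry one change relative to A
seq_a = ("ACGT" * 9)[:35]
seq_b = seq_a[:4] + "T" + seq_a[5:]
seq_c = seq_a[:19] + "G" + seq_a[20:]
sample = [seq_a, seq_b, seq_c]

print(segregating_sites(sample), harmonic_a1(len(sample)), round(theta_from_s(sample), 2))
```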

Page 22: Lecture 20: Tests of Neutrality November 6, 2015.

Tajima’s D Statistic

Two different ways of estimating same parameter:

$\theta_\pi = m\pi$

$\theta_S = \frac{S}{a_1}$

Deviation of these two indicates deviation from neutral expectations

$d = \theta_\pi - \theta_S$

$D = \frac{d}{\sqrt{V(d)}}$ where V(d) is variance of d
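A sketch of the full statistic, using the variance constants from Tajima (1989) to estimate V(d); those constants are not given on the slide and are supplied here as a known published result. The function name and both toy alignments are hypothetical. With only three sequences d is always zero and the variance estimate vanishes, so the sketch requires at least four.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("no polymorphic sites")
    pairs = list(combinations(seqs, 2))
    # Mean number of pairwise differences over the whole sequence (m * pi)
    k_hat = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = k_hat - s / a1
    return d / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignments
only_rare = ["AAAA", "TAAA", "ATAA", "AATA"]   # every variant is a singleton
two_haplotypes = ["AAA", "AAA", "TTT", "TTT"]  # every variant at intermediate frequency

print(tajimas_d(only_rare), tajimas_d(two_haplotypes))
```

The singleton-only sample gives a negative D and the two-haplotype sample a positive D.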

Page 23: Lecture 20: Tests of Neutrality November 6, 2015.

Tajima’s D Expectations

D=0: Neutrality

D>0

Balancing Selection: Divergence of alleles (π) increases

OR

Bottleneck: S decreases

D<0

Purifying or Positive Selection: Divergence of alleles decreases

OR

Population expansion: Many low frequency alleles cause low average divergence

$d = \theta_\pi - \theta_S$
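These expectations can be checked from a site frequency spectrum alone. The sketch below computes D from the number of polymorphic sites at which the mutant allele appears i times in a sample of n sequences, again using Tajima's (1989) variance constants. The three spectra are hypothetical: one proportional to 1/i (the neutral expectation), one with an excess of rare variants, and one with an excess of intermediate-frequency variants.

```python
from math import sqrt


def tajimas_d_from_sfs(n, sfs):
    """Tajima's D from {i: number of sites where the mutant allele occurs i times}."""
    s = sum(sfs.values())
    pairs = n * (n - 1) / 2
    # A site with i mutant copies differs in i * (n - i) of the pairs
    k_hat = sum(i * (n - i) * count for i, count in sfs.items()) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k_hat - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


n = 10
neutral_like = {i: 12 / i for i in range(1, n)}  # expected counts, may be fractional
excess_rare = {1: 30, 2: 3, 3: 1}                # e.g. expansion or a recent sweep
excess_intermediate = {5: 20, 4: 10, 1: 4}       # e.g. balancing selection or bottleneck

for name, sfs in [("neutral", neutral_like), ("rare", excess_rare),
                  ("intermediate", excess_intermediate)]:
    print(name, round(tajimas_d_from_sfs(n, sfs), 3))
```

The sign of D reflects only the shape of the spectrum, which is why selection and demography can produce the same value.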

Page 24: Lecture 20: Tests of Neutrality November 6, 2015.

Balancing Selection

[Figure labels: Balancing selection; ‘balanced’ mutation; Neutral mutation]

Slide adapted from Yoav Gilad

Should increase nucleotide diversity (π)

Decreases polymorphic sites (S) initially.

D>0

$d = \theta_\pi - \theta_S$

Page 25: Lecture 20: Tests of Neutrality November 6, 2015.

Recent Bottleneck

Rare alleles are lost

Polymorphic sites (S) more severely affected than nucleotide diversity (π)

D>0

[Figure label: Standard neutral model]

$d = \theta_\pi - \theta_S$

Page 26: Lecture 20: Tests of Neutrality November 6, 2015.

Positive Selection and Purifying Selection

[Figure labels: sweep; recovery; Time; S; Advantageous mutation; Neutral mutation]

Slide adapted from Yoav Gilad

Should decrease both nucleotide diversity (π) and polymorphic sites (S) initially.

S recovers due to mutation

π recovers slowly: insensitive to rare alleles

D<0

$d = \theta_\pi - \theta_S$

Page 27: Lecture 20: Tests of Neutrality November 6, 2015.

Standard neutral model

Often two main haplotypes, some rare alleles

Rapid Population Growth will also result in an excess of rare alleles even for neutral loci

Slide adapted from Yoav Gilad

[Figure labels: Time; Rapid population size increase; Most alleles are rare]

$\theta = 4N_e\mu$

Most alleles are rare

Nucleotide diversity (π) depressed

Polymorphic sites (S) unchanged or even enhanced: 4Neμ is large

D<0

$d = \theta_\pi - \theta_S$

Page 28: Lecture 20: Tests of Neutrality November 6, 2015.

How do we distinguish these two forms of divergence (selection vs demography)?