Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium November 12, 2012


Page 1: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

November 12, 2012

Page 2: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Last Time

Sequence data and quantification of variation

Infinite sites model

Nucleotide diversity (π)

Sequence-based tests of neutrality

Tajima’s D

Hudson-Kreitman-Aguade

Synonymous versus Nonsynonymous substitutions

McDonald-Kreitman

Page 3: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Today

Signatures of selection based on synonymous and nonsynonymous substitutions

Multiple loci and independent segregation

Estimating linkage disequilibrium

Page 4: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Using Synonymous Substitutions to Control for Factors Other Than Selection

dN/dS or Ka/Ks Ratios

Page 5: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Types of Mutations (Polymorphisms)

Page 6: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Synonymous versus Nonsynonymous SNPs

First- and second-position SNPs often change the amino acid

Third-position SNPs are often synonymous

UCA, UCU, UCG, and UCC all code for serine

The majority of positions are nonsynonymous

Not all amino acid changes affect fitness: allozymes
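The position effect described on this slide can be checked directly against the standard genetic code. The sketch below is an illustration, not part of the lecture: the function name `synonymous_fraction_by_position` is hypothetical, every single-base change is assumed equally likely, and a change to a stop codon is counted as nonsynonymous.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous, per codon position."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only start from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                # A change to a stop codon never matches aa, so it counts as nonsynonymous.
                if CODON_TABLE[mutant] == aa:
                    synonymous[pos] += 1
    return [s / t for s, t in zip(synonymous, total)]


fractions = synonymous_fraction_by_position()
for pos, frac in enumerate(fractions, start=1):
    print(f"codon position {pos}: {frac:.1%} of single-base changes are synonymous")
```

Under these assumptions most third-position changes are synonymous, while almost all first- and second-position changes alter the amino acid.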

Page 7: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Synonymous & Nonsynonymous Substitutions

Synonymous substitution rate can be used to set the neutral expectation for the nonsynonymous rate

dS is the relative rate of synonymous substitutions per synonymous site

dN is the relative rate of nonsynonymous substitutions per nonsynonymous site

ω = dN/dS

If ω = 1, neutral evolution

If ω < 1, purifying selection

If ω > 1, positive Darwinian selection

For human genes, ω ≈ 0.1

Page 8: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Complications in Estimating dN/dS

Multiple mutations in a codon give multiple possible paths, for example from CGT (Arg) to AGA (Arg):

CGT(Arg)->AGT(Ser)->AGA(Arg)

CGT(Arg)->CGA(Arg)->AGA(Arg)

The two types of nucleotide substitution that produce SNPs, transitions and transversions, are not equally likely

http://www.mun.ca/biology/scarr/Transitions_vs_Transversions.html

Back-mutations are invisible

Complex evolutionary models using likelihood and Bayesian approaches must be used to estimate dN/dS (also called KA/KS or KN/KS depending on method), e.g. with the PAML package
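The path ambiguity in the CGT to AGA example can be made concrete by enumerating every shortest mutational path and counting the synonymous and nonsynonymous steps along each one. This is a minimal sketch: `mutational_paths` is a hypothetical helper, only shortest paths are considered, and no weighting by transition/transversion rates is attempted (likelihood methods handle that).

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code in TCAG order (DNA alphabet); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mutational_paths(start, end):
    """Return (steps, n_synonymous, n_nonsynonymous) for every shortest path."""
    diff_positions = [i for i in range(3) if start[i] != end[i]]
    paths = []
    for order in permutations(diff_positions):
        codon, steps, syn, nonsyn = start, [start], 0, 0
        for pos in order:
            # Change one differing base at a time, in the order being tried.
            mutant = codon[:pos] + end[pos] + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                syn += 1
            else:
                nonsyn += 1
            codon = mutant
            steps.append(codon)
        paths.append((steps, syn, nonsyn))
    return paths


for steps, syn, nonsyn in mutational_paths("CGT", "AGA"):
    labelled = "->".join(f"{c}({CODON_TABLE[c]})" for c in steps)
    print(f"{labelled}: {syn} synonymous, {nonsyn} nonsynonymous")
```

One path implies two nonsynonymous changes and the other two synonymous changes, so the observed codon pair alone does not determine the counts that enter dN and dS.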

Page 9: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

dN/dS ratios for 363 mouse-rat comparisons

interleukin-3: mast cells and bone marrow cells in immune system

Hartl and Clark 2007

Most genes show purifying selection (dN/dS < 1)

Some evidence of positive selection, especially in genes related to immune system

Page 10: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

McDonald-Kreitman Test

Conceptually similar to HKA test

Uses only one gene

Contrasts the ratio of nonsynonymous to synonymous divergence between species with the ratio of nonsynonymous to synonymous polymorphism within species

Gene provides internal control for evolution rates and demography
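The McDonald-Kreitman contrast amounts to a 2x2 table of counts. The sketch below is one way to evaluate such a table, under stated assumptions: the counts are hypothetical, the neutrality index is a commonly used summary that is not taken from the slides, and Fisher's exact test is one reasonable choice of significance test among several.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); values below 1 suggest excess nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell given the margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (as improbable) as the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical counts for a single gene (illustration only).
dn, ds = 7, 17   # nonsynonymous and synonymous fixed differences between species
pn, ps = 2, 42   # nonsynonymous and synonymous polymorphisms within species

ni = neutrality_index(dn, ds, pn, ps)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
print(f"neutrality index = {ni:.3f}, Fisher exact p = {p_value:.4f}")
```

With these made-up counts the nonsynonymous-to-synonymous ratio is much higher for divergence than for polymorphism, the pattern expected under positive selection.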

Page 11: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Application of McDonald-Kreitman Test

Aligned 11,624 gene sequences between human and chimp

Calculated synonymous and nonsynonymous substitutions between species (divergence) and within humans (SNPs)

Identified 304 genes showing evidence of positive selection (blue) and 814 genes showing purifying selection (red) in humans

Positive selection: defense/immunity, apoptosis, sensory perception, and transcription factors

Purifying selection: structural and housekeeping genes

Bustamante et al. 2005. Nature 437, 1153-1157

Page 12: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Genes showing purifying (red) or positive (blue) selection in the human genome based on the McDonald-Kreitman Test

Bustamante et al. 2005. Nature 437, 1153-1157

Page 13: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

How can you differentiate between effects of selection and demographic effects on sequence variation?

Will this work for organellar DNA?

Page 14: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Extending to Multiple Loci

So far, only considering dynamics of alleles at single loci

Loci occur on chromosomes, linked to other loci!

“The fitness of a single locus ripped from its interactive context is about as relevant to real problems of evolutionary genetics as the study of the psychology of individuals isolated from their social context is to an understanding of man’s sociopolitical evolution”

Richard Lewontin (quoted in Hedrick 2005)

Size of region that must be considered depends on linkage disequilibrium

Page 15: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Gametic (Linkage) Disequilibrium (LD)

Nonrandom association of alleles at different loci into gametes

Haplotype: Genotype of a group of closely linked loci

LD is a major factor in evolution

LD itself provides insights into population history

Estimation of LD is critical for ALL population genetic data

Page 16: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Nomenclature and concepts

Two loci, two alleles

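Frequency of allele i at locus 1 is p_i

Frequency of allele i at locus 2 is q_i

Locus 1 has alleles A1 and A2 with frequencies p1 and p2; locus 2 has alleles B1 and B2 with frequencies q1 and q2

$\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$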

Page 17: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Nomenclature and concepts

Genotype is written as A1B1/A2B2 (one chromosome carries A1 and B1, the other carries A2 and B2)

A1 and B1 are in coupling phase

A1 and B2 are in repulsion phase

Page 18: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Gametic Disequilibrium

Easiest to think about physically linked loci, but not necessarily the case

What Are Expected Frequencies of Gametes in a Population Under Independent Assortment?

Genotype A1B1/A2B2 -> Meiosis -> gametes:

Gamete               A1B1   A1B2   A2B1   A2B2
Expected frequency   p1q1   p1q2   p2q1   p2q2

Page 19: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

What are the expected frequencies of gametes with complete linkage?

Genotype A1B1/A2B2 (allele frequencies p1, p2 at locus A and q1, q2 at locus B) -> Meiosis -> gametes:

Gamete      A1B1   A1B2   A2B1   A2B2
Frequency   x11    x12    x21    x22

Page 20: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Linkage disequilibrium measure, D

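Independent assortment: $x_{11} = p_1 q_1$, $x_{12} = p_1 q_2$, $x_{21} = p_2 q_1$, $x_{22} = p_2 q_2$

With LD: $x_{11} = p_1 q_1 + D$, $x_{12} = p_1 q_2 - D$, $x_{21} = p_2 q_1 - D$, $x_{22} = p_2 q_2 + D$

Substituting from above table:

$D = x_{11} x_{22} - x_{12} x_{21}$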
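As a worked illustration of the definition D = x11·x22 - x12·x21, the sketch below computes D from four gamete frequencies and checks that it equals the departure of x11 from its independent-assortment expectation p1q1. The gamete frequencies and the function names are hypothetical, chosen only for the example.

```python
def allele_frequencies(x11, x12, x21, x22):
    """Marginal allele frequencies (p1, p2, q1, q2) from gamete frequencies."""
    p1 = x11 + x12  # A1 appears in gametes A1B1 and A1B2
    q1 = x11 + x21  # B1 appears in gametes A1B1 and A2B1
    return p1, 1 - p1, q1, 1 - q1


def linkage_disequilibrium(x11, x12, x21, x22):
    """D = x11*x22 - x12*x21 (coupling product minus repulsion product)."""
    return x11 * x22 - x12 * x21


# Hypothetical gamete frequencies for A1B1, A1B2, A2B1, A2B2 (must sum to 1).
x = (0.4, 0.1, 0.2, 0.3)
p1, p2, q1, q2 = allele_frequencies(*x)
D = linkage_disequilibrium(*x)

print(f"p1={p1:.2f} q1={q1:.2f} D={D:.3f}")
print(f"x11 - p1*q1 = {x[0] - p1 * q1:.3f}")  # same value as D
```

Both calculations give the same number because the four gamete frequencies sum to 1, which is why D can be read either as a cross-product difference or as a deviation from the product of allele frequencies.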

Page 21: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Problem: D is sensitive to allele frequencies

Example, if D is positive: p1=0.5, q2=0.5, Dmax=0.25

but p1=0.1, q2=0.9, Dmax=0.09

Solution: D' = D/Dmax, which ranges from -1 to 1

Dmax Calculation:

If D is positive, Dmax is lesser of p1q2 or p2q1

If D is negative, Dmax is lesser of p1q1 or p2q2

Can’t have negative gamete frequencies

Maximum D set by allele frequencies
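The Dmax rules translate into a few lines of code. This is a minimal sketch with hypothetical function names; it takes q1 as input (so q2 = 1 - q1), and returning 0 when a locus is monomorphic is an assumption made here to avoid dividing by zero.

```python
def d_max(D, p1, q1):
    """Largest |D| the allele frequencies allow, given the sign of D."""
    p2, q2 = 1 - p1, 1 - q1
    if D >= 0:
        # Positive D lowers x12 and x21, so neither may drop below zero.
        return min(p1 * q2, p2 * q1)
    # Negative D lowers x11 and x22.
    return min(p1 * q1, p2 * q2)


def d_prime(D, p1, q1):
    """D' = D / Dmax; returns 0.0 if a locus is monomorphic (assumption)."""
    dmax = d_max(D, p1, q1)
    return D / dmax if dmax > 0 else 0.0


# The two cases from the slide (q2 = 0.5 and q2 = 0.9 correspond to q1 = 0.5 and 0.1).
print(d_max(0.01, p1=0.5, q1=0.5))
print(d_max(0.01, p1=0.1, q1=0.1))

# Hypothetical values: the same D is a different fraction of its maximum.
print(d_prime(0.05, p1=0.5, q1=0.5))
print(d_prime(0.05, p1=0.1, q1=0.1))
```

The last two lines show the point of the standardization: an identical raw D can represent weak association at intermediate allele frequencies and strong association at skewed ones.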

Page 22: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

LD can also be estimated as correlation between alleles

$r = \frac{D}{\sqrt{p_1 p_2 q_1 q_2}}$

r can also be standardized to a -1 to 1 scale

It is equivalent to D’ in this case

$r' = \frac{D / \sqrt{p_1 p_2 q_1 q_2}}{D_{max} / \sqrt{p_1 p_2 q_1 q_2}} = \frac{D}{D_{max}} = D'$
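The statement that r is a correlation between alleles can be verified numerically: coding each gamete as a pair of 0/1 indicators (carries A1, carries B1) and taking the ordinary Pearson correlation gives the same value as D divided by the square root of p1·p2·q1·q2. The sample of ten gametes below is hypothetical and the helper names are illustrative.

```python
from math import sqrt


def r_from_gametes(x11, x12, x21, x22):
    """r = D / sqrt(p1 * p2 * q1 * q2), computed from gamete frequencies."""
    p1, q1 = x11 + x12, x11 + x21
    D = x11 * x22 - x12 * x21
    return D / sqrt(p1 * (1 - p1) * q1 * (1 - q1))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)


# Hypothetical sample of 10 gametes.
counts = {"A1B1": 4, "A1B2": 1, "A2B1": 2, "A2B2": 3}
total = sum(counts.values())

# Indicator coding: 1 if the gamete carries A1 (or B1), else 0.
has_a1, has_b1 = [], []
for gamete, n in counts.items():
    has_a1 += [1 if gamete.startswith("A1") else 0] * n
    has_b1 += [1 if gamete.endswith("B1") else 0] * n

r_formula = r_from_gametes(*(counts[g] / total for g in ("A1B1", "A1B2", "A2B1", "A2B2")))
r_correlation = pearson(has_a1, has_b1)
print(f"r from D: {r_formula:.4f}, r as correlation: {r_correlation:.4f}")
```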

Page 23: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Recombination

Shuffling of parental alleles during meiosis

Occurs for unlinked loci and linked loci

Rate of recombination for linked markers is partially a function of physical distance

[Figure: chromosomes of genotype A1B1/A2B2 before and after crossing over]

Page 24: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

What is the expected recombination rate for unlinked loci?

Genotype A1B1/A2B2 -> Meiosis -> gametes:

Gamete   A1B1       A1B2        A2B1        A2B2
Phase    Coupling   Repulsion   Repulsion   Coupling

$c = \frac{n_r}{n_r + n_c}$

Where n_r is number of repulsion phase gametes, and n_c is number of coupling phase gametes
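The recombination rate defined on this slide is the share of repulsion-phase gametes among all gametes produced by a coupling-phase double heterozygote (A1B1/A2B2). The short sketch below applies that ratio to two sets of hypothetical gamete counts; the function name and the counts are illustrative only.

```python
def recombination_rate(n_coupling, n_repulsion):
    """c = n_r / (n_r + n_c) for gametes from an A1B1/A2B2 parent."""
    total = n_coupling + n_repulsion
    if total == 0:
        raise ValueError("no gametes counted")
    return n_repulsion / total


# Hypothetical counts from a linked pair of loci.
linked = {"A1B1": 45, "A1B2": 5, "A2B1": 5, "A2B2": 45}
c_linked = recombination_rate(
    n_coupling=linked["A1B1"] + linked["A2B2"],
    n_repulsion=linked["A1B2"] + linked["A2B1"],
)

# Hypothetical counts where all four gametes are equally common,
# as expected when loci assort independently.
unlinked = {"A1B1": 25, "A1B2": 25, "A2B1": 25, "A2B2": 25}
c_unlinked = recombination_rate(
    n_coupling=unlinked["A1B1"] + unlinked["A2B2"],
    n_repulsion=unlinked["A1B2"] + unlinked["A2B1"],
)

print(f"linked example: c = {c_linked}, unlinked example: c = {c_unlinked}")
```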

Page 25: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

LD is partially a function of recombination rate

Expected proportions of gametes produced by various genotypes over two generations

Where c is the recombination rate and D0 is the initial amount of LD

First generation (Second generation)

Page 26: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Recombination degrades LD over time

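$D_1 = x'_{11} x'_{22} - x'_{12} x'_{21} = (x_{11} - cD_0)(x_{22} - cD_0) - (x_{12} + cD_0)(x_{21} + cD_0)$

$D_1 = (1 - c) D_0$

$D_t = (1 - c)^t D_0 \approx e^{-ct} D_0$ (the exponential form is an approximation for small c)

Where t is time (in generations) and e is base of natural log (2.718)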
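The decay of LD can be followed numerically by applying the one-generation update to the gamete frequencies and comparing the result with the closed form D_t = (1 - c)^t D_0. This is a sketch under the assumptions of the slide (random mating, no selection, drift, or mutation); the starting gamete frequencies and function names are hypothetical.

```python
from math import ceil, exp, log


def ld(x11, x12, x21, x22):
    """D = x11*x22 - x12*x21."""
    return x11 * x22 - x12 * x21


def next_generation(x11, x12, x21, x22, c):
    """Gamete frequencies after one generation of recombination at rate c."""
    shift = c * ld(x11, x12, x21, x22)
    return (x11 - shift, x12 + shift, x21 + shift, x22 - shift)


def ld_after(d0, c, t):
    """Closed form: D_t = (1 - c)**t * D_0."""
    return (1 - c) ** t * d0


def generations_to_reach(fraction, c):
    """Generations needed for D to fall to the given fraction of D_0."""
    return ceil(log(fraction) / log(1 - c))


# Hypothetical starting gamete frequencies (D_0 = 0.1).
x = (0.4, 0.1, 0.2, 0.3)
d0 = ld(*x)

# Iterating the update matches the closed form.
c, generations = 0.5, 3
for _ in range(generations):
    x = next_generation(*x, c)
print(f"iterated D_3 = {ld(*x):.4f}, closed form = {ld_after(d0, c, generations):.4f}")

# Even with independent segregation (c = 0.5), LD takes several generations to vanish.
for rate in (0.5, 0.1, 0.01):
    print(f"c = {rate}: {generations_to_reach(0.01, rate)} generations to reach 1% of D_0")

# The exponential approximation is close when c is small.
print(f"(1-c)^t = {ld_after(1.0, 0.01, 100):.4f}, e^(-ct) = {exp(-0.01 * 100):.4f}")
```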

Page 27: Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium

Effects of recombination rate on LD

Decline in LD over time with different theoretical recombination rates (c)

Even with independent segregation (c=0.5), multiple generations required to break up allelic associations

Genome-wide linkage disequilibrium can be caused by demographic factors (more later)