Evolutionary Genetics: Part 7 Recombination –Linkage ...

Post on 16-Oct-2021

Evolutionary Genetics: Part 7

Recombination – Linkage Disequilibrium

S. peruvianum

S. chilense

Winter Semester 2012-2013

Prof Aurélien Tellier, FG Populationsgenetik

Color code:

Red = Important result or definition

Purple: exercise to do

Green: some bits of maths

Population genetics: 4 evolutionary forces

[Figure: four evolutionary forces shape molecular diversity]

• random genomic processes (mutation, duplication, recombination, gene conversion)

• natural selection

• random demographic process (drift)

• random spatial process (migration)

Recombination

Recombination and crossing over

Physical map

Genetic map

Independent segregation (Mendel’s law)

Non-independent segregation

This is genetic linkage

Non-independent segregation

• Recombination rate:

$$\rho = \frac{\text{number of recombined gametes}}{\text{total number of gametes}}$$

• In general:

• The recombination rate of two loci on different chromosomes is ρ = 0.5

• The recombination rate between loci on the same chromosome is 0 < ρ < 0.5

• The recombination rate of two loci on the same chromosome increases monotonically with distance

• BUT there are recombination hotspots (and cold spots) in the genome
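The definition of ρ as a fraction of gametes can be sketched in a few lines. This is only an illustration: the function name and the example counts are hypothetical and do not come from the lecture.

```python
def recombination_fraction(recombined_gametes: int, total_gametes: int) -> float:
    """Estimate rho as (number of recombined gametes) / (total number of gametes)."""
    if total_gametes <= 0:
        raise ValueError("total_gametes must be positive")
    if not 0 <= recombined_gametes <= total_gametes:
        raise ValueError("recombined_gametes must lie between 0 and total_gametes")
    return recombined_gametes / total_gametes


# Hypothetical counts, for illustration only.
rho_hat = recombination_fraction(12, 100)
print(f"estimated rho = {rho_hat:.2f}")
```

An estimate close to 0.5 is what independent segregation would give; a clearly smaller value points to linkage.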

Non-independent segregation

Recombination and crossing-over

Genetic map length - Morgan

Model without recombination

[Figure: your two chromosomes, each carrying loci A and B. One is inherited from your mother and one from your father; without recombination, each comes from your grandfather or your grandmother.]

Model with recombination

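[Figure: your two chromosomes, each carrying loci A and B, one inherited from your mother and one from your father. Haplotypes shown: A B, A b, a B. With recombination, one part of a chromosome comes from your grandfather and the other from your grandmother.]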

Model with recombination

• So two loci on the same chromosome can come:

  • from a single parent if there is no recombination

  • from two parents if there is recombination

• With recombination, the chromosomes of your parents are mosaics of pieces of chromosomes from their parents

• We define ρ as the probability that a recombination event happens between the two loci:

P[two loci have the same parent] = 1 − ρ

Coalescence with recombination

• Take one lineage

• Tracing it back in time, recombination events can happen

• Recombination happens with probability ρ in every generation:

P[first recombination event t generations ago] = ρ(1 − ρ)^(t−1)

• This is again a geometric distribution (approximately exponential when ρ is small)

• Backward in time, two kinds of event can occur:

  • coalescence of two lineages

  • or a recombination event

• A recombination event splits one lineage into two backward in time: one carrying locus A and the other carrying locus B
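The geometric waiting time for the first recombination event can be checked numerically. The sketch below is illustrative only; the function names and the value ρ = 0.01 are assumptions, not values from the lecture.

```python
def prob_first_recombination(t: int, rho: float) -> float:
    """P[first recombination event exactly t generations ago] = rho * (1 - rho)**(t - 1)."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return rho * (1.0 - rho) ** (t - 1)


def expected_waiting_time(rho: float) -> float:
    """Mean of the geometric distribution: 1 / rho generations."""
    return 1.0 / rho


rho = 0.01  # illustrative value only
for t in (1, 10, 100):
    print(t, prob_first_recombination(t, rho))
print("mean waiting time:", expected_waiting_time(rho))
```

The probabilities summed over all t add up to 1, and the mean waiting time is 1/ρ generations.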

Coalescence with recombination

• Recombination increases the number of lineages, so it can take a while to find the MRCA

• However, as the number of lineages k increases, the rate of coalescence increases too, so an MRCA will eventually be found

Coalescence with recombination

• Along the genome, a series of adjacent sites shares one coalescent tree

• In fact, recombination slowly breaks the link between sites

• The higher the recombination rate, the more independent the loci are

• Virtually every locus has its own MRCA

• If recombination rates vary along the genome, loci differ in how many recombination events their trees contain

Coalescence without recombination

• Along the genome, there is ONLY ONE tree for all loci

• The higher the recombination rate, the more independent the loci are

• Recombination is important: without it, each chromosome would be only one data point (= one tree)

• This is the case for the Y chromosome in humans, mitochondrial DNA and chloroplast DNA, where there is no recombination (= one tree for all loci)

• Why is the absence of recombination a problem?

• Understanding evolution in the genome requires independent pieces of information about ONE evolutionary process (= different trees that come from the same evolutionary scenario)

• Information comes from the variance between loci

• If all loci are linked, which patterns reflect neutral evolution, and which reflect selection on some genes?

Coalescence with recombination

• How far along the genome do you have to go to find a recombination event?

• Define r as the per-site (bp) recombination rate

• If two sites are a distance d apart, the recombination rate between them is ρ = rd

• The coalescence rate is 1/2N; we want at least a 50% chance that a recombination event happens before coalescence

$$P[\text{recombination before coalescence}] = \frac{2rd}{2rd + 1/(2N)} = 1 - \frac{1}{4Nrd + 1} \geq 0.5$$

• This can be simplified to 4Nrd ≥ 1, or d ≥ 1/(4Nr)

• For humans, Ne = 10^4 and r = 10^-8, so we get d > 2500 bp

• In Drosophila, where Ne = 10^6, the distance is 100 times shorter
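The arithmetic behind the 2500 bp figure can be reproduced with a short sketch. The function names are illustrative. Using the same r = 10^-8 for Drosophila is an assumption made only to mirror the slide's "100 times shorter" comparison.

```python
def prob_recombination_before_coalescence(r: float, d: float, N: float) -> float:
    """2rd / (2rd + 1/(2N)): two lineages recombine at rate 2rd and coalesce at rate 1/(2N)."""
    recombination_rate = 2.0 * r * d
    coalescence_rate = 1.0 / (2.0 * N)
    return recombination_rate / (recombination_rate + coalescence_rate)


def min_distance_bp(r: float, N: float) -> float:
    """Smallest distance d with at least a 50% chance of recombination first: d = 1 / (4 N r)."""
    return 1.0 / (4.0 * N * r)


r = 1e-8  # per-site recombination rate from the slide
print("humans:    ", min_distance_bp(r, 1e4), "bp")
print("Drosophila:", min_distance_bp(r, 1e6), "bp (assuming the same r)")
print("P at d = 2500 bp, N = 1e4:", prob_recombination_before_coalescence(r, 2500, 1e4))
```

At the threshold distance the two rates are equal, so the probability is 0.5.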

Recombination and data

Linkage disequilibrium

Recombination in data: 4 gamete rule

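• There is one rule for recognizing whether recombination has happened: the four-gamete rule

• If all four gametes are observed at a pair of biallelic sites, and each site mutated only once, at least one recombination event must have happened between them

• Did recombination happen to the right or to the left of the 2nd site?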
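The four-gamete rule can be sketched as a test on pairs of sites. This is a minimal illustration that assumes biallelic sites coded as characters in equal-length haplotype strings; the function names and the toy sample are hypothetical.

```python
from itertools import combinations


def shows_four_gametes(haplotypes: list[str], i: int, j: int) -> bool:
    """True if all four allele combinations occur at sites i and j (biallelic sites assumed)."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def incompatible_site_pairs(haplotypes: list[str]) -> list[tuple[int, int]]:
    """List every pair of sites at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if shows_four_gametes(haplotypes, i, j)
    ]


# Toy sample: four haplotypes, three sites (0 = ancestral, 1 = derived).
sample = ["000", "010", "100", "110"]
print(incompatible_site_pairs(sample))
```

In the toy sample only sites 0 and 1 show all four gametes, so under the rule a recombination event is inferred between those two sites.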

Recombination in data: LD

• Linkage Disequilibrium (LD) is measured as D

• Two loci A and B with alleles A1 and A2, B1 and B2

• Gamete frequencies are: A1B1 = p11; A1B2 = p12; A2B1 = p21; A2B2 = p22

Recombination in data: LD

• The A1B1 and A2B2 gametes are called coupling gametes

• The A1B2 and A2B1 gametes are called repulsion gametes

• LD is a measure of the excess of coupling over repulsion gametes

• If D > 0, there are more coupling gametes than expected at equilibrium

• If D < 0, there are more repulsion gametes than expected
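The transcript does not preserve the formula for D. The sketch below assumes the standard definitions D = p11·p22 − p12·p21 (coupling minus repulsion) and r² = D² / (pA1·pA2·pB1·pB2). The function name and example frequencies are illustrative.

```python
def linkage_disequilibrium(p11: float, p12: float, p21: float, p22: float) -> tuple[float, float]:
    """Return (D, r2) from the four gamete frequencies, using the standard definitions."""
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")
    p_a1 = p11 + p12  # frequency of allele A1
    p_b1 = p11 + p21  # frequency of allele B1
    d = p11 * p22 - p12 * p21  # coupling product minus repulsion product
    denom = p_a1 * (1.0 - p_a1) * p_b1 * (1.0 - p_b1)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a locus is fixed
    return d, r2


# Only coupling gametes present: complete LD.
print(linkage_disequilibrium(0.5, 0.0, 0.0, 0.5))
# All four gametes equally frequent: linkage equilibrium.
print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))
```

D is positive when coupling gametes are in excess and negative when repulsion gametes are in excess, matching the sign convention on the slide.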

Recombination in data: LD

• Linkage Disequilibrium (LD) is measured as D and r²

• The change in D in a single generation is: ∆D = −ρD

• After t generations:

• D_t = (1 − ρ)^t D_0

• Once again, this is a geometric function of time

• This means that the ultimate state of the population is D = 0

• BUT there is memory of LD in time: it decays gradually rather than disappearing at once

• LD also decreases with distance from a given site in the genome, following a geometric function
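The decay D_t = (1 − ρ)^t D_0 can be tabulated directly. In the sketch below the function names and parameter values are illustrative; the half-life expression follows from solving (1 − ρ)^t = 0.5 for t.

```python
import math


def ld_after(d0: float, rho: float, t: int) -> float:
    """D_t = (1 - rho)**t * D_0."""
    return (1.0 - rho) ** t * d0


def ld_half_life(rho: float) -> float:
    """Generations needed for D to halve: t = ln(0.5) / ln(1 - rho)."""
    return math.log(0.5) / math.log(1.0 - rho)


d0 = 0.25  # illustrative starting value
for rho in (0.5, 0.1, 0.001):
    print(f"rho={rho}: D after 10 generations = {ld_after(d0, rho, 10):.6f}, "
          f"half-life = {ld_half_life(rho):.1f} generations")
```

Unlinked loci (ρ = 0.5) lose half of their LD every generation, while tightly linked loci keep it for hundreds of generations.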

Recombination in data: haplotypes

• Linkage Disequilibrium (LD) can be seen in the presence of haplotypes

• Example: (PLoS Genetics 2006)

• Do you expect long or short haplotypes under recombination?

• If genes can show different recombination rates, what does this mean for haplotypes?

• The length and frequency of haplotypes are important signatures for detecting deviations from neutral evolution!

Recombination in data

• Using DnaSP

• Using the TNFSF5 and the droso files

• Look at the haplotypes (Generate => Haplotype Data File)

• Why are haplotypes important for studying recombination? What about the information on distance between sites?

• Can you look at recombination? Measure LD, r² and also the number of recombination events detected by the four-gamete rule

• Use Analysis => Recombination

• How does LD decay with distance between sites?