Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.


Page 1: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.
Page 2: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Maximum-likelihood estimation of admixture proportions from genetic data

Jinliang Wang

Page 3: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

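[Figure: the admixture model. Ancestral population P0 splits into parental populations P1 and P2 (sizes n1, n2), which drift for ξ generations. An admixture event then forms the hybrid population Ph, with proportions p1 and p2 contributed by P1 and P2. Populations P1, Ph and P2 (sizes N1, Nh, N2) drift for a further ψ generations, after which samples S1, Sh and S2 are taken.]

t1 = ξ/2n1, t2 = ξ/2n2

T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2

Ω = {p1, t1, t2, T1, Th, T2}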

Page 4: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

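[Figure: the admixture model annotated with allele frequencies. P0 (frequencies w) splits into P1 and P2 (sizes n1, n2), which drift for ξ generations to frequencies x1 and x2. Admixture in proportions p1 and p2 forms Ph with frequencies xh. P1, Ph and P2 (sizes N1, Nh, N2) then drift for ψ generations to frequencies y1, yh, y2, and samples S1, Sh, S2 yield allele counts c1, ch, c2.]

t1 = ξ/2n1, t2 = ξ/2n2

T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2

Ω = {p1, t1, t2, T1, Th, T2}

C = (c1, ch, c2)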

Page 5: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Likelihood function

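$$
\Pr(C) = \int \Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2)\,
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)\,
\Pr(x_1, x_2 \mid w, t_1, t_2)\,
\Pr(w)\; dw\, dx_1\, dx_2\, dy_1\, dy_h\, dy_2
$$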

Page 6: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Likelihood function

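$$
\Pr(C) = \int \Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2)\,
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)\,
\Pr(x_1, x_2 \mid w, t_1, t_2)\,
\Pr(w)\; dw\, dx_1\, dx_2\, dy_1\, dy_h\, dy_2
$$

Random sampling: $\Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2)$

Admixture and genetic drift: $\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)$

Genetic drift: $\Pr(x_1, x_2 \mid w, t_1, t_2)$

Prior on w: $\Pr(w)$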

Page 7: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Allele frequencies in P0

[Figure: ancestral population P0 with allele frequencies w, which have prior Pr(w).]

Page 8: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Genetic drift after population split

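[Figure: P0 (allele frequencies w) splits into P1 and P2 (sizes n1, n2), which drift for ξ generations to allele frequencies x1 and x2.]

$$
\Pr(x_1, x_2 \mid w, t_1, t_2)
$$

t1 = ξ/2n1, t2 = ξ/2n2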

Page 9: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Genetic drift in independent populations

Page 10: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Genetic drift: the diffusion approximation

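$$
\Pr(x_1, x_2 \mid w, t_1, t_2) = \prod_{i=1}^{2} \Pr(x_i \mid w, t_i), \qquad t_i = \xi / 2n_i
$$

$$
\Pr(x_i \mid w, t_i) = w(1-w) \sum_{a=1}^{\infty} a(a+1)(2a+1)\, H(1-a, a+2, 2, w)\, H(1-a, a+2, 2, x_i)\, \exp\!\left(-\frac{a(a+1)\,\xi}{4 n_i}\right)
$$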

Crow and Kimura (1970) p. 382
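The diffusion series on this slide can be evaluated numerically. A minimal sketch follows, assuming that H denotes the Gauss hypergeometric function, which terminates as a polynomial of degree a-1 when its first argument is 1-a. The density is then w(1-w) times the sum over a of a(a+1)(2a+1) H(1-a, a+2, 2, w) H(1-a, a+2, 2, x) exp(-a(a+1)ξ/4n). The function names `hyp_poly` and `drift_density` and the truncation at 30 terms are illustrative assumptions, not part of the original method.

```python
import math

def hyp_poly(a, z):
    """H(1-a, a+2, 2, z) as a terminating hypergeometric series of degree a-1."""
    n = a - 1
    term, total = 1.0, 1.0
    for k in range(n):
        # Ratio of successive terms: (k-n)(a+2+k) / ((2+k)(k+1)) * z
        term *= (k - n) * (a + 2 + k) / ((2 + k) * (k + 1)) * z
        total += term
    return total

def drift_density(x, w, xi, n_e, terms=30):
    """Truncated series for the density of allele frequency x in (0, 1)
    after xi generations of drift in a population of size n_e,
    starting from frequency w. The truncation (terms=30) is an assumption
    that is adequate only when xi / (2 * n_e) is not very small."""
    total = 0.0
    for a in range(1, terms + 1):
        total += (a * (a + 1) * (2 * a + 1)
                  * hyp_poly(a, w) * hyp_poly(a, x)
                  * math.exp(-a * (a + 1) * xi / (4.0 * n_e)))
    return w * (1.0 - w) * total
```

When ξ/2n is large, the a = 1 term dominates and the density approaches 6 w(1-w) exp(-ξ/2n), which gives a quick sanity check. For very small ξ/2n the series converges slowly and the alternating polynomial sums lose floating-point precision, so this sketch should not be used in that regime.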

Page 11: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

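The admixture event

[Figure: at the admixture event, P1 and P2 (allele frequencies x1 and x2) contribute proportions p1 and p2 to the hybrid population Ph, giving allele frequencies xh.]

$$
x_h = p_1 x_1 + p_2 x_2
$$

$$
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)
$$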
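The mixing rule for the admixture event, xh = p1 x1 + p2 x2, is a per-allele weighted average of the parental frequencies. A small sketch, assuming p2 = 1 - p1 (consistent with only p1 appearing in the parameter set Ω) and using a hypothetical helper name `hybrid_frequencies`:

```python
def hybrid_frequencies(x1, x2, p1):
    """Allele frequencies in Ph immediately after admixture: xh = p1*x1 + p2*x2."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie in [0, 1]")
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must list the same alleles")
    p2 = 1.0 - p1  # assumption: the two admixture proportions sum to one
    return [p1 * a + p2 * b for a, b in zip(x1, x2)]

# Two alleles at one locus; P1 contributes a quarter of the hybrid gene pool.
xh = hybrid_frequencies([0.2, 0.8], [0.6, 0.4], p1=0.25)
```

Here `xh` comes out at approximately 0.5 for both alleles, since 0.25 × 0.2 + 0.75 × 0.6 = 0.5 and 0.25 × 0.8 + 0.75 × 0.4 = 0.5.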

Page 12: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

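Genetic drift since admixture event

[Figure: after the admixture event, P1, Ph and P2 (sizes N1, Nh, N2) drift for ψ generations, taking allele frequencies x1, xh, x2 to y1, yh, y2.]

T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2

$$
x_h = p_1 x_1 + p_2 x_2
$$

$$
\Pr(y_1, y_h, y_2 \mid x_1, x_2, p_1, T_1, T_h, T_2)
$$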

Page 13: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

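Random sampling

[Figure: samples S1, Sh, S2 drawn from P1, Ph, P2 (allele frequencies y1, yh, y2) give allele counts c1, ch, c2.]

C = (c1, ch, c2)

$$
\Pr(c_1, c_h, c_2 \mid y_1, y_h, y_2) = \prod_{i \in \{1, 2, h\}} \Pr(c_i \mid y_i)
$$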
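The sampling term factorises over the three populations, so its logarithm is a sum of three per-population terms. The sketch below assumes each Pr(ci | yi) is a multinomial probability of the observed allele counts given the population allele frequencies. The slide does not name the sampling distribution, so the multinomial form and the function names are assumptions.

```python
import math

def log_multinomial(counts, freqs):
    """Log-probability of allele counts under multinomial sampling (assumed form)."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log n!
    for c, y in zip(counts, freqs):
        log_p -= math.lgamma(c + 1)  # subtract log c!
        if c > 0:
            if y <= 0.0:
                return float("-inf")  # an observed allele cannot have zero frequency
            log_p += c * math.log(y)
    return log_p

def log_sampling_prob(counts_by_pop, freqs_by_pop):
    """Sum of log Pr(c_i | y_i) over populations i = 1, h, 2."""
    return sum(log_multinomial(counts_by_pop[k], freqs_by_pop[k])
               for k in ("1", "h", "2"))
```

Working in log space keeps the product over populations, and over loci in a full analysis, from underflowing.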

Page 14: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Likelihood function

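$$
\Pr(C) = \int \prod_{j \in \{1, h, 2\}} \Pr(c_j \mid y_j)
\prod_{j \in \{1, h, 2\}} \Pr(y_j \mid x_j, T_j)
\prod_{i=1}^{2} \Pr(x_i \mid w, t_i)\,
\Pr(w)\; dw\, dx_1\, dx_2\, dy_1\, dy_h\, dy_2,
\qquad x_h = p_1 x_1 + p_2 x_2
$$

Random sampling: $\prod_{j} \Pr(c_j \mid y_j)$

Admixture and genetic drift: $\prod_{j} \Pr(y_j \mid x_j, T_j)$

Genetic drift: $\prod_{i} \Pr(x_i \mid w, t_i)$

Prior on w: $\Pr(w)$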

Page 15: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

[Figure: bar chart "African-American Admixture Proportions". Vertical axis: European ancestry (ticks 0 to 30). Bars: New Orleans, New York, Pittsburgh, Maywood nr Chicago, Houston, Detroit, Baltimore, Philadelphia 2, Philadelphia 1, Charleston (South Carolina), Jamaica.]

Page 16: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Profile log-likelihoods for New York

Proportion of European ancestry

Drift before admixture event

Drift since admixture event

Page 17: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Application to canid populations: Grey wolf and coyote in North America

Page 18: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

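[Figure: two trees in which a common ancestor gives rise to grey wolf and coyote, one with a wolf-like hybrid and one with a coyote-like hybrid; and a bar chart with vertical axis labelled "Wolverine ancestry" (ticks 0 to 70) and bars for the grey wolf-like hybrid and the coyote-like hybrid.]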

Page 19: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Suitable data

Assumptions of the method given the model

Comparing the model to other scenarios

Aspects of the data used for inference

Page 20: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Suitable data

Human data

Genotypes at 10 nuclear loci, chosen because they are either African- or European-specific or highly differentiated between the two.

Canid data

10 microsatellite loci. Neither species-specific nor highly differentiated between wolves and coyotes.

Page 21: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Assumptions of the method given the model

Alleles are inherited independently across loci in the admixture event

Drift acts independently on alleles across loci

Alleles in a sampled individual are independent across loci

Page 22: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Assumptions of the method given the model

The prior distribution on w is flat, not U-shaped

Admixture occurs instantaneously

The effect of mutation on perturbing allele frequency is negligible

Page 23: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Comparing the model to other scenarios

Modern ‘pure’ populations need to be sampled

Thus the ‘structure’ of the population is assumed to be known

If we cannot sample modern ‘pure’ populations, we cannot make inferences about the admixture proportions

Page 24: Maximum-likelihood estimation of admixture proportions from genetic data Jinliang Wang.

Discussion

Aspects of the data used for inference

Inference proceeds solely on the basis of allele frequencies

Linkage disequilibrium is:

Firstly, not used for inference

Secondly, assumed to be negligible

LD might be exploited to:

Enhance inference when modern ‘pure’ populations are sampled

Relax the necessity to sample modern ‘pure’ populations at all