Download - Modelling Genetic Variation in the Brain - Warwick · 2011. 12. 2. · • Scan summaries – Vectors of ad hoc measures • Dynamic graphical tool – Explore many summaries simultaneously


Modelling Genetic Variation in the Brain

Thomas Nichols, PhD

Department of Statistics & Warwick Manufacturing Group, University of Warwick

joint with

Becky Inkster, Institute of Psychiatry, King’s College London (GSK3β & WNT pathway VBM)

Maria Vounou, Giovanni Montana, Statistics Section, Dept. of Mathematics, Imperial College (Sparse Reduced Rank Regression)


Outline

• Background
  – Structural brain imaging & VBM
  – Genetics
  – “Imaging Genetics”
• Candidate SNP VBM
• Multivariate SNP analyses


Neuroimaging Background: Structural Brain Image Data

• Morphometry
  – Quantification of shape/volume of brain structures
• Traditional Morphometric Analysis
  – Laborious hand-tracing of structures
  – Accurate, but imperfect inter-rater reliability
• Voxel Based Morphometry
  – Automated morphometry method


Voxel Based Morphometry

[Figure: T1-weighted MRI and its gray matter segmentation, both in subject space]

Original MRI → Segment → Warp to Atlas Space → Modulate → Smooth


Voxel Based Morphometry

[Figure: T1-weighted MRI, gray matter, and modulated gray matter, all in atlas space]

Modulation
• Gives units of subject GM volume in atlas space
• Allows analysis in common space while retaining individual differences

Original MRI → Segment → Warp to Atlas Space → Modulate → Smooth
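A minimal sketch of what modulation does, assuming a per-voxel Jacobian determinant J is already available from the warp. Here J is taken to be the subject volume each atlas voxel stands for, relative to the atlas voxel's own volume; the function name `modulate` and the toy numbers are illustrative, not SPM's API.

```python
import numpy as np

def modulate(warped_gm, jacobian_det):
    """Multiply warped GM (tissue fraction per atlas voxel) by the Jacobian
    determinant, so each atlas voxel holds subject GM volume
    (in units of one atlas voxel's volume)."""
    warped_gm = np.asarray(warped_gm, dtype=float)
    jacobian_det = np.asarray(jacobian_det, dtype=float)
    if np.any(jacobian_det <= 0):
        raise ValueError("Jacobian determinants must be positive (no folding).")
    return warped_gm * jacobian_det

# Toy 1D "image" of 4 atlas voxels: voxels 2-3 stand for twice as much
# subject tissue as the atlas (J = 2), voxel 4 for only half as much (J = 0.5).
warped = np.array([0.5, 1.0, 1.0, 0.2])
jac = np.array([1.0, 2.0, 2.0, 0.5])
mod = modulate(warped, jac)
print(mod, mod.sum())  # modulated image and total subject GM volume
```

Without modulation the warped image sums to 2.7 (atlas-space volume); after modulation it sums to 4.6, the subject's own GM volume, which is what lets individual differences survive the warp.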


Smoothing
• Accounts for imperfect registration of individuals to atlas
  – Even identical twins have different cortical foldings
  – Exact match impossible
• Discards fine spatial details in exchange for reduced noise
  – Generally searching for moderate-scale differences

Done!
• 3D image is n=1
  – A single (100,000-dimensional) phenotypic measurement on 1 individual

[Figure: modulated gray matter and smoothed, modulated GM, both in atlas space]

Voxel Based Morphometry: Original MRI → Segment → Warp to Atlas Space → Modulate → Smooth
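The slides do not give the kernel width, so the sketch below assumes an 8 mm FWHM Gaussian on 2 mm voxels purely for illustration. It shows the FWHM-to-σ conversion (FWHM = 2√(2 ln 2) σ ≈ 2.355 σ) and separable 3D smoothing with plain NumPy, using zero padding at the image edges; it is not SPM's smoothing routine.

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Gaussian FWHM -> standard deviation, since FWHM = 2*sqrt(2*ln 2)*sigma."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def gaussian_kernel_1d(fwhm_vox):
    """Normalised 1D Gaussian kernel (FWHM in voxels), truncated at +/- 4 sigma."""
    sigma = fwhm_to_sigma(fwhm_vox)
    half = max(1, int(np.ceil(4 * sigma)))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(img, fwhm_mm, voxel_mm):
    """Separable Gaussian smoothing of a 3D image, one axis at a time.
    Each image dimension must be at least as long as its kernel."""
    out = np.asarray(img, dtype=float)
    for axis, vox in enumerate(voxel_mm):
        k = gaussian_kernel_1d(fwhm_mm / vox)
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out

# An impulse spreads into a Gaussian blob whose total mass is preserved.
img = np.zeros((24, 24, 24))
img[12, 12, 12] = 1.0
sm = smooth(img, fwhm_mm=8.0, voxel_mm=(2.0, 2.0, 2.0))
```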


Genetics Background

• Genotype
  – The genetic constitution of an organism or cell
  – 46 chromosomes in humans
  – 23 pairs of homologous chromosomes
    • One of each pair from each parent
• Gene
  – A series of base pairs (DNA bits) which code for a trait
  – Four different possible bases, the nucleotides
    • Adenine, Thymine, Cytosine, & Guanine


Genetics Background
• Single Nucleotide Polymorphisms (SNP)
  – Locations where single base-pair differences have been found in the population
• SNP Example
  – If some of the population has sequence… AATGTGATAGCTT
  – And if the remainder has… AATGTGACAGCTT
  – We have found a SNP! (8th base: T vs. C)
• SNP data
  – Homologous chromosomes
  – For each SNP, for each individual: a 0, 1 or 2 count
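A small sketch of turning raw sequence and genotype calls into the 0/1/2 counts described above. The helper names are hypothetical; genotypes are assumed to be two-letter strings at a biallelic SNP, and the count is of the minor (less frequent) allele, with ties broken arbitrarily.

```python
from collections import Counter

def snp_positions(seq_a, seq_b):
    """1-based positions where two aligned sequences differ by one base."""
    return [i + 1 for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]

def minor_allele(genotypes):
    """Least frequent allele among all observed allele copies."""
    c = Counter(a for g in genotypes for a in g)
    return min(c, key=c.get)

def allele_counts(genotypes, allele):
    """Per-individual count (0, 1 or 2) of `allele` on the two homologous chromosomes."""
    return [g.count(allele) for g in genotypes]

def maf(counts):
    """Minor allele frequency: allele copies / (2 * number of individuals)."""
    return sum(counts) / (2 * len(counts))

genos = ["TT", "TC", "TT", "CC", "TT", "TC", "TT", "TT"]
minor = minor_allele(genos)
counts = allele_counts(genos, minor)
print(snp_positions("AATGTGATAGCTT", "AATGTGACAGCTT"), minor, counts, maf(counts))
```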


[Figure: number of base pairs per chromosome (1–22, X, Y); 3,079,843,747 base pairs in total †]

Genetics Background
• Millions of SNPs
• Thanks to correlation (linkage disequilibrium), only need ≈500k to “tag” most common variation

[Figure: number of genes per chromosome; 32,185 genes in total †]

[Figure: number of SNPs per chromosome; 20,296,765 SNPs in total *]

* From Entrez SNP database † From Wikipedia
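To make "tagging" concrete, here is a sketch that measures LD as the Pearson correlation of 0/1/2 genotype counts (a common approximation to haplotype-based r) and then picks tag SNPs greedily so every SNP has r² ≥ 0.8 with some tag. The greedy rule and the 0.8 cut-off are illustrative assumptions, not how any particular genotyping array was designed; all SNPs are assumed polymorphic.

```python
import numpy as np

def ld_r(G):
    """Pairwise LD as Pearson correlation of genotype counts (G: N x P)."""
    if np.any(G.std(axis=0) == 0):
        raise ValueError("monomorphic SNPs have undefined LD")
    return np.corrcoef(G, rowvar=False)

def greedy_tags(G, r2_min=0.8):
    """Repeatedly pick the SNP that tags the most still-untagged SNPs."""
    r2 = ld_r(G) ** 2
    untagged = set(range(G.shape[1]))
    tags = []
    while untagged:
        idx = sorted(untagged)
        best = max(idx, key=lambda j: int(np.sum(r2[j, idx] >= r2_min)))
        tags.append(best)
        untagged -= {k for k in idx if r2[best, k] >= r2_min}
    return tags

rng = np.random.default_rng(0)
g0 = rng.binomial(2, 0.3, 500)
g2 = rng.binomial(2, 0.4, 500)
G = np.column_stack([g0, g0, g2])  # SNP 1 is in perfect LD with SNP 0
print(greedy_tags(G))
```

Two tags suffice for three SNPs here because SNP 1 carries no information beyond SNP 0; on real data the same idea is what lets ≈500k SNPs stand in for millions.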


Genetics Background
• SNPs vs. genes
  – Each gene often has several variants
  – 1 or more (but not many) SNPs typically needed to identify a gene
  – SNPs may not lie directly on coding portion of gene
    • Due to linkage disequilibrium (correlation), close is good enough
    • Non-coding, regulatory region may be causal

[Figure: exons of a gene along the chromosome, with nearby SNP locations]


Imaging Genetics

• Motivation
  – Brain structure heritable
  – Objective, reproducible phenotype
• Important in psychiatry
  – Current best measures are coarse, with weak reproducibility
    » e.g. HAM-D (depression), MMSE (cognition, AD)
  – Sensitive
• Brain anatomy/function closer to disease process than other measures
  – Use to corroborate other findings
    • E.g. a large whole-genome association (WGA) study finds modest significance; use brain imaging to build confidence in the finding

Heritability (h²) of brain phenotypes (Glahn, Thompson, Blangero. Hum Brain Mapp 28:488-501, 2007):
  – Whole brain volume: 0.78
  – Total gray matter volume: 0.88
  – Total white matter volume: 0.85

[Figure: thickness of cortical GM (r²); heritability of GM thickness (h² & corrected P-value)]

Thompson et al., Nature Neuroscience 4(12):1253-1258, 2001

Thompson & Toga, Annals of Medicine 34(7-8):523-36, 2002


Imaging Genetics Menu (Jason Stein/Andy Saykin/Bertrand Thirion)
• Imaging: Candidate ROI, Many ROI, Voxelwise
• Genetics: Candidate SNP, Candidate Gene, Genome-wide SNP, Genome-wide Gene
• Example studies
  – [Joyner et al. 2009] 4 ROIs, 11 SNPs
  – [Potkin et al. 2009] 1 BOLD ROI, 317,503 SNPs
  – [Filippini et al. 2009] 29,812 voxels, 1 SNP
  – [Stein et al. 2010] 31,622 voxels, 448,293 SNPs
  – [Hibar et al. 2011] 31,622 voxels, 18,044 SNPs


Outline

• Background
  – Structural brain imaging & VBM
  – Genetics
  – “Imaging Genetics”
• Candidate SNP VBM
• Multivariate SNP analyses


GSK3β Background
• High heritability of depression (Kendler et al., 2006; Sullivan et al., 2000).
• Meta-analytical evidence from MRI studies for a role of hippocampal integrity in depression (Campbell et al., 2004).
• There is strong genetic regulation of neurodevelopment (reviewed by Wilson and Rubenstein, 2000; O’Leary et al., 2002).
• The Wnt signaling pathway is a network of proteins that plays a role in embryogenesis
• GSK3β plays a key role in the Wnt pathway


Wnt Signaling Pathways

[Figure: the Wnt signaling pathway, which regulates the development of the hippocampus]


GSK HiTDIP Study
• Major Depressive Disorder (MDD) Association Study
  – “High Throughput Human Disease Specific Targets”
  – 7,000 SNPs covering 2,000 genes with tractable targets
  – 1000 cases, 1000 controls
• Imaging Subset
  – 200 cases, 200 controls (of the 1000 & 1000) scanned with anatomical MRI protocol
  – ‘Optimized VBM’ with SPM5’s segmentation tool
  – 324 images passed QC
    • 366 subjects’ data delivered
    • 42 subjects set aside (clinical exclusion, pathologies or failed segmentation)
• Glycogen synthase kinase 3β (GSK3β)
  – Plays key role in WNT pathway, influential in development


Modelling Candidate SNPs

• Mass Univariate Modelling
  – Fit same univariate linear model at each voxel
• Quantitative Trait Multiple Regression
  – Linear model fit at each voxel
• Regressors
  – Genetic
  – Group (Case/Control)
  – Demographic / nuisance variables
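A minimal sketch of mass-univariate fitting: because every voxel shares the same design matrix, one least-squares solve against the whole N × V data matrix fits all voxels at once. The design (intercept, SNP count, group, age), the simulated data and the function name are illustrative assumptions, not the study's actual model.

```python
import numpy as np

def mass_univariate_t(Y, X, col):
    """Fit Y[:, v] = X @ beta_v + e_v at every voxel v and return the
    t-statistic for the coefficient in column `col` at each voxel.
    Y: N x V data; X: N x p design including an intercept."""
    N, p = X.shape
    beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)  # p x V, all voxels at once
    if rank < p:
        raise ValueError("design matrix is rank deficient")
    resid = Y - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (N - p)           # per-voxel error variance
    c = np.linalg.inv(X.T @ X)[col, col]
    return beta[col] / np.sqrt(sigma2 * c)

rng = np.random.default_rng(0)
N, V = 100, 500
snp = rng.binomial(2, 0.3, N).astype(float)       # additive 0/1/2 coding
group = rng.integers(0, 2, N).astype(float)       # case/control
age = rng.uniform(20, 70, N)                      # nuisance
X = np.column_stack([np.ones(N), snp, group, age])
Y = rng.standard_normal((N, V))
Y[:, :10] += 1.0 * snp[:, None]                   # SNP effect in first 10 "voxels" only
t_snp = mass_univariate_t(Y, X, col=1)
```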


SNP Models for Gray Matter Data
• Recessive
• Dominant
• Additive
• Genotypic

[Figure: one panel per model, each plotting Gray Matter Volume (Y) against SNP Count Xj = 0, 1, 2]
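The four models differ only in how the 0/1/2 count becomes regressor columns. A sketch, assuming the count is of the allele whose effect is being modelled; the function name is hypothetical.

```python
import numpy as np

def snp_regressors(counts, model):
    """Regressor column(s) for a SNP under a given genetic model.
    counts: per-subject count (0, 1, 2) of the modelled allele."""
    g = np.asarray(counts)
    if model == "additive":    # effect grows linearly with allele count
        return g[:, None].astype(float)
    if model == "dominant":    # one or two copies give the same effect
        return (g >= 1)[:, None].astype(float)
    if model == "recessive":   # effect only with two copies
        return (g == 2)[:, None].astype(float)
    if model == "genotypic":   # 2 DF: separate mean per genotype (0 is baseline)
        return np.column_stack([g == 1, g == 2]).astype(float)
    raise ValueError(f"unknown model: {model}")

for m in ("additive", "dominant", "recessive", "genotypic"):
    print(m, snp_regressors([0, 1, 2], m).tolist())
```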


Mass Univariate Modelling Genetic Effects
• Concerns about leverage/influence
  – 100’s not 1000’s of subjects
    • 100 subjects + 10% MAF → only 1 subject with rare genotype expected (100 × 0.1² = 1)!
  – Rare SNP can make a few subjects very influential
    • An ever-greater problem as sample size shrinks

[Figure: Gray Matter Volume (Y) vs. Allele Count Xj = 0, 1, 2]
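The arithmetic behind the warning, as a sketch assuming Hardy-Weinberg proportions: with N = 100 and MAF = 0.1 the expected genotype counts are 81/18/1. In a genotypic model, the lone rare homozygote then has leverage 1 (the hat-matrix diagonal), i.e. its own data point alone determines that genotype's mean.

```python
import numpy as np

def expected_genotype_counts(n, maf):
    """Hardy-Weinberg expected counts of 0, 1 and 2 minor-allele genotypes."""
    p = maf
    return np.array([n * (1 - p) ** 2, n * 2 * p * (1 - p), n * p ** 2])

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X', via a reduced QR."""
    Q, _ = np.linalg.qr(X)
    return (Q ** 2).sum(axis=1)

print(expected_genotype_counts(100, 0.1))

# Genotypic design (intercept + heterozygote + rare-homozygote indicators)
# with exactly one rare homozygote, as expected above.
g = np.array([0] * 81 + [1] * 18 + [2])
X = np.column_stack([np.ones(g.size), g == 1, g == 2]).astype(float)
h = leverages(X)
print(h[-1])  # leverage of the single rare-genotype subject
```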


Mass Univariate Modelling Genetic Effects

• Ad hoc solution
  – If expected rare genotype frequency < 10%, merge genotypes
• If MAF > 0.31 (≈√0.1)
  – 2 DF Genotypic model
    • Additive + Nonadditive Parameterization
  – Additive [ -1 0 +1 ] tested
  – Nonadditive [ -1/2 +1 -1/2 ] not tested (orthogonalize w.r.t. additive regressor *)
• If MAF < 0.31
  – Use dominant/recessive model
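A sketch of the rule above, assuming Hardy-Weinberg proportions so the rare homozygote frequency is MAF² (hence the √0.1 ≈ 0.316 cut-off). The nonadditive regressor is orthogonalized against the intercept and additive regressor; a consequence, checked in the example, is that the additive coefficient equals the one from a model without the nonadditive term, while the nonadditive term still absorbs curvature. Names and data are illustrative.

```python
import math
import numpy as np

MAF_CUTOFF = math.sqrt(0.1)  # ~0.316: rare homozygote frequency MAF**2 = 10%

def choose_model(maf):
    if maf > MAF_CUTOFF:
        return "genotypic (additive + nonadditive)"
    return "dominant/recessive"  # merge the rare homozygotes with another group

def additive_nonadditive(counts):
    g = np.asarray(counts, dtype=int)
    add = np.array([-1.0, 0.0, 1.0])[g]
    nonadd = np.array([-0.5, 1.0, -0.5])[g]
    # Remove from the nonadditive column anything explained by [1, additive]
    Z = np.column_stack([np.ones_like(add), add])
    coef, *_ = np.linalg.lstsq(Z, nonadd, rcond=None)
    return add, nonadd - Z @ coef

counts = np.array([0, 0, 0, 1, 1, 2, 0, 1, 2, 1])
add, nonadd = additive_nonadditive(counts)
y = 2.0 + 0.5 * add + np.array([0.3, -0.1, 0.2, 0.9, 0.7, -0.2, 0.0, 0.8, 0.1, 1.0])
b_full = np.linalg.lstsq(np.column_stack([np.ones(10), add, nonadd]), y, rcond=None)[0][1]
b_add = np.linalg.lstsq(np.column_stack([np.ones(10), add]), y, rcond=None)[0][1]
print(choose_model(0.4), choose_model(0.2), b_full, b_add)
```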


Mass Univariate Modelling Nuisance Effects

• Age & Gender
  – Substantial normal variation in GM with age
• Total Gray matter
  – Accounts for differences in head size
  – Discounts global changes to find localized changes
• Scanner (Pre/Post Upgrade)
  – An upgrade two-thirds of the way through the study altered image contrast
• Medication (Yes/No, for cases only)
  – Neurotrophic effects reported for some Rx


Model Diagnosis for Imaging
• Why bother?
  – Largish n, continuous data: the Central Limit Theorem should carry us
  – Type I error generally OK due to robustness of t-test/ANOVA-like models
• Sensitivity!
  – Decreased sensitivity due to inflated error variance σ²
  – Suboptimal sensitivity due to non-normality
• How!?
  – 100,000 voxels, 400 subjects
  – 100,000 QQ plots to look at all 40 million data points?

[Figure: failed GM segmentation due to data formatting error]

[Figure: warping artefacts seen in modulated GM]


Model Diagnosis for Imaging
• Model summaries
  – Images of diagnostic stats
• Scan summaries
  – Vectors of ad hoc measures
• Dynamic graphical tool
  – Explore many summaries simultaneously
  – Easily jump from summary image to plots, from plots to residual images
• End Result
  – Swiftly localize and understand problems

Model Summaries (Statistic: Assesses; Null Distribution)
  – Cook-Weisberg: Var(εi) = σ²; Chi-Squared
  – Shapiro-Wilk: ε ~ Normal; (tabulated)
  – Outlier Count: Artifacts; Binomial
  – Std. Deviation: Artifacts

Scan Summaries (Summary: Interpretation)
  – Global intensity: Whole-brain signals or artifacts
  – Outlier Count: Artifacts
  – Any preprocessing parameters, e.g. head size: Suggests cause of artifacts
  – Experimental predictors: For investigating mismodelled signal in residuals

Luo & Nichols NeuroImage 19:1014–1032, 2003 http://go.warwick.ac.uk/tenichols/software
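A sketch of the "Outlier Count" scan summary, assuming standardized residuals are already in hand (here they are simulated, with one scan given an artifact). Each scan's count of |r| > 1.96 voxels is compared with its Binomial(V, α) expectation; because neighbouring voxels in smoothed data are correlated, the binomial yardstick is only approximate. This is not the SPMd code.

```python
import numpy as np
from statistics import NormalDist

def outlier_count_summary(std_resid, alpha=0.05):
    """std_resid: N scans x V voxels of standardized residuals.
    Returns per-scan outlier counts and their z-scores under a
    Binomial(V, alpha) null (independence assumed)."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    n_vox = std_resid.shape[1]
    counts = (np.abs(std_resid) > cutoff).sum(axis=1)
    expected = n_vox * alpha
    sd = np.sqrt(n_vox * alpha * (1 - alpha))
    return counts, (counts - expected) / sd

rng = np.random.default_rng(0)
R = rng.standard_normal((20, 5000))
R[7, :500] += 4.0                 # simulated artifact in scan 7
counts, z = outlier_count_summary(R)
print(int(np.argmax(z)), counts[7])
```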


Model Diagnosis w/ SPMd

[Figure: SPMd display with Scan Summaries, Model Summaries, Model Detail and Scan Detail panels]


Outline / Motivation
• Data
  – Intro to Voxel Based Morphometry data
• Model
  – Quantitative trait regression w/ Mass Univariate Model
• Diagnosis
  – 100,000 Q-Q plots anyone?
• Inference
  – Cluster size under nonstationarity
  – Candidate screening procedure
• Results
  – GSK3β in MDD
• Future Directions


Inference On Images: Voxel-wise vs. Cluster-wise
• Voxel-wise
  – Reject Ho, point-by-point, by statistic magnitude
• Cluster-wise
  – Define contiguous blobs with arbitrary threshold uclus
  – Reject Ho for each cluster larger than kα

[Figure: statistic image plotted over space and thresholded at uclus; a cluster narrower than kα is not significant, a cluster wider than kα is significant]
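A 1D sketch of cluster-wise inference matching the figure: threshold the statistic at u_clus, find contiguous supra-threshold runs, and keep runs larger than kα. In practice kα comes from random field theory or permutation; here it is simply given, and the statistic values are made up.

```python
import numpy as np

def clusters_1d(stat, u_clus):
    """Half-open (start, stop) index pairs of contiguous runs with stat > u_clus."""
    above = np.concatenate([[False], np.asarray(stat) > u_clus, [False]])
    edges = np.flatnonzero(np.diff(above.astype(int)))  # run starts, then stops
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

def significant_clusters(stat, u_clus, k_alpha):
    """Reject Ho for each cluster whose extent is larger than k_alpha."""
    return [(s, e) for s, e in clusters_1d(stat, u_clus) if e - s > k_alpha]

stat = [0, 3, 3, 3, 0, 0, 3, 0, 3, 3, 3, 3, 3, 0]
print(clusters_1d(stat, 2.0), significant_clusters(stat, 2.0, k_alpha=3))
```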


Cluster Inference & Stationarity
• Cluster-wise preferred over voxel-wise
  – Generally more sensitive (Friston et al., NeuroImage 4:223-235, 1996)
  – Spatially-extended signals typical
• Problem w/ VBM
  – Standard cluster methods assume stationarity, i.e. constant smoothness
  – Assuming stationarity, false positive clusters will be found more often in extra-smooth regions
  – VBM noise very non-stationary
• Nonstationary cluster inference
  – Must un-warp nonstationarity
  – Reported but not implemented
    • Hayasaka et al., NeuroImage 22:676–687, 2004
  – Now available as SPM toolbox
    • http://fmri.wfubmc.edu/cms/software#NS

[Figure: VBM image of FWHM noise smoothness; nonstationary noise… …warped to stationarity]
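One way to "un-warp" nonstationarity is to measure cluster size in RESELs using the local smoothness, so a cluster counts as smaller where the noise is smoother. The sketch below follows that idea in the spirit of Hayasaka et al. (2004), assuming an isotropic local FWHM (in voxels) at every voxel; it is not the toolbox implementation.

```python
import numpy as np

def cluster_resels(cluster_mask, local_fwhm_vox):
    """Cluster extent in RESELs: sum over cluster voxels of 1 / FWHM^D,
    with FWHM the local (isotropic) smoothness in voxels."""
    mask = np.asarray(cluster_mask, dtype=bool)
    fwhm = np.asarray(local_fwhm_vox, dtype=float)
    return float(np.sum(1.0 / fwhm[mask] ** mask.ndim))

n = 40
fwhm = np.where(np.arange(n) < 20, 5.0, 2.0)     # smooth left half, rough right half
smooth_cluster = np.zeros(n, dtype=bool)
smooth_cluster[5:15] = True
rough_cluster = np.zeros(n, dtype=bool)
rough_cluster[25:35] = True
print(cluster_resels(smooth_cluster, fwhm), cluster_resels(rough_cluster, fwhm))
```

Both clusters have 10 voxels, but the one in the smooth region is only about 2 RESELs versus about 5 in the rough region, which is why a voxel-count threshold over-calls clusters in extra-smooth areas.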


Inference in Imaging Genetics: Creeping Multiple Testing Problem
• Even with just candidate analyses, can end up searching over…
  – Genes
  – SNPs within a gene
  – Space (voxels or clusters)
  – Different contrasts on GLM
    • Main effect? By clinical subgroup? Interactions?
• Can quickly lose confidence in results
  – E.g. 0.005 FWE-corrected is great… …unless it’s the 25th statistic image you’ve seen


Inference in Imaging Genetics: Multiple Testing Strategy
• Define strict primary outcome
  – For given gene, use single SNP
    • SNP with best significance in the (large) association study; otherwise
    • best nonsynonymous exonic SNP available; otherwise
    • best 5’ intronic SNP available
  – For each SNP, only consider main effect of gene
    • If fitting gene × group interaction, test for average effect
  – Any association is more likely than a disease-specific association
  – Even if disease-specific association, opposing sign of effect unlikely w/ VBM
  – 1-number summary per gene
    • Minimum nonstationary cluster FWE-corrected P-value for association (1 DF F-stat)
  – Bonferroni correction for number of genes
• Primary outcomes then have strong FWE control
  – Over brain, over genes
  – (1-α)100% confidence of no false positives anywhere
• Secondary outcomes
  – Interactions, sub-group results
  – Use same FWE-inferences, but mark as post-hoc
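A sketch of the primary-outcome rule: one number per gene (its minimum nonstationary cluster FWE-corrected P-value), then Bonferroni across genes. The gene names and P-values are hypothetical placeholders, not study results.

```python
def primary_outcome(min_cluster_p_by_gene, alpha=0.05):
    """min_cluster_p_by_gene: gene -> smallest cluster FWE-corrected P-value
    for that gene's single primary SNP. Bonferroni-correct over genes."""
    n_genes = len(min_cluster_p_by_gene)
    adjusted = {g: min(1.0, p * n_genes) for g, p in min_cluster_p_by_gene.items()}
    significant = sorted(g for g, p in adjusted.items() if p <= alpha)
    return adjusted, significant

# Hypothetical inputs, for illustration only
adjusted, significant = primary_outcome({"GENE_A": 0.002, "GENE_B": 0.03, "GENE_C": 0.4})
print(adjusted, significant)
```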



Results: Model Diagnosis, Outlier Detection with Shapiro-Wilk

[Figure: −log10 P Shapiro-Wilk image overlaid on mean smoothed modulated GM (R = right); two outliers indicated]


Results: Model Diagnosis, Characterising Outliers with Standardized Residual Images

[Figure: standardized residual images for Subject 193, Subject 194 (outlier) and Subject 195; R = right]

Note: Compare standardized residuals to ±6.128 (Bonferroni threshold for 324 images, each with 173,823 voxels, with a 2-sided test at each voxel)

[Figure: Subject 194 raw T1 (R = right): severe enlargement of inferior horn of lateral ventricle]
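Where ±6.128 comes from, assuming α = 0.05 overall and treating standardized residuals as approximately N(0, 1) (the slide states neither): split α over both tails and over all 324 × 173,823 image-voxel tests, then take the normal quantile.

```python
from statistics import NormalDist

def bonferroni_z(n_tests, alpha=0.05):
    """Two-sided Bonferroni cut-off for N(0, 1) statistics."""
    p_tail = alpha / (2 * n_tests)
    return -NormalDist().inv_cdf(p_tail)

n_tests = 324 * 173_823  # images x voxels
print(round(bonferroni_z(n_tests), 3))
```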


Results: Outlier Exploration

[Figure: Subject 194 (outlier) vs. randomly selected control]

Inferior Horn of Lateral Ventricle: in most of us, this is a pencil-lead-thick fluid-filled space

In this subject it was pencil-thick

Clinical collaborator verified it as abnormal & subject was removed


GSK3β and Structural Differences

2 SNPs in strong linkage disequilibrium, rs6438552 and rs12630592, showed significant associations with GM differences in MDD patients.

[Figure: brain regions where SNP clusters show co-localization (R, L)]

GSK3β-Gray Matter association in bilateral superior temporal gyrus (STG) and right hippocampus

Inkster, B., Nichols, T. E., Saemann, P. G., Auer, D. P., Holsboer, F., Muglia, P., & Matthews, P. M. (2009). Association of GSK3 Polymorphisms With Brain Structural Changes in Major Depressive Disorder. Archives of General Psychiatry, 66(7), 721-728.


‘AA genotype group’ associated with decreased GM concentration in right STG

rs6438552 is a putative functional SNP, i.e. it regulates the selection of splice acceptor sites in vitro.

P = 0.0004 (corrected for whole brain search and multiple SNP testing)

Inkster, B., Nichols, T. E., Saemann, P. G., Auer, D. P., Holsboer, F., Muglia, P., & Matthews, P. M. (2009). Association of GSK3 Polymorphisms With Brain Structural Changes in Major Depressive Disorder. Archives of General Psychiatry, 66(7), 721-728.


Wnt Signaling Pathways

[Figure: the Wnt signaling pathway, which regulates the development of the hippocampus; genes shown: WNT3A, FZD3, KRM1, DVL2, CTNNB1, AXIN2, TCF4, LEF1, SMAD1, PPARGC1A, EMX2, ZEB2]


WNT pathway genes

[Figure: brain maps (R = right) for ZEB2, FZD3, DVL2, AXIN2, GSK3β, SMAD1, PPARGC1A, EMX2]

Inkster, B., Nichols, T. E., Saemann, P. G., Auer, D. P., Holsboer, F., Muglia, P., & Matthews, P. M. (2010). Pathway-based approaches to imaging genetics association studies: WNT signaling, GSK3beta substrates and major depression. NeuroImage, 53(3), 908-917.


Outline

• Background
  – Structural brain imaging & VBM
  – Genetics
  – “Imaging Genetics”
• Candidate SNP VBM
• Multivariate SNP analyses


Possible Mass-Univariate Analyses
• Full cross analysis: 500,000 SNPs × 100,000 voxels ≈ 10¹⁰ tests!
  – Massive multiple testing problem!
• Candidate SNP: 1 SNP × 100,000 voxels ≈ 10⁵ tests
  – Full image result
  – Must have right SNP
• Voxel/Region QTL: 500,000 SNPs × 1 voxel/ROI ≈ 10⁶ tests
  – Whole genome association
  – Must have right ROI
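The test counts above, and the Bonferroni per-test level each implies at an overall α = 0.05 (the α is an assumption for illustration):

```python
def n_tests(n_snps, n_voxels):
    return n_snps * n_voxels

designs = {
    "Full cross analysis": (500_000, 100_000),
    "Candidate SNP": (1, 100_000),
    "Voxel/Region QTL": (500_000, 1),
}
for name, (snps, voxels) in designs.items():
    m = n_tests(snps, voxels)
    print(f"{name}: {m:.0e} tests, Bonferroni per-test alpha {0.05 / m:.0e}")
```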


Multivariate Regression

$$Y = X C + E$$

  – Y: N × NV (Images)
  – X: N × NG (Genotypes)
  – C: NG × NV (Regression Coefficients)
  – E: N × NV (Error)
  – N # subjects, NV # voxels/ROIs, NG # genes/SNPs

• Silly…
  – If N > NG, the fit is equivalent to NV univariate models fit independently
  – Much redundancy in C
    • rank{C} ≤ min(NV, NG) ≪ NV ∙ NG
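A quick numerical check of the "silly" point, on random data: the least-squares estimate of C from one multivariate fit is identical to fitting each of the NV columns of Y separately, and its rank cannot exceed min(NG, NV).

```python
import numpy as np

rng = np.random.default_rng(1)
N, NG, NV = 50, 5, 8                      # N > NG, as on the slide
X = rng.standard_normal((N, NG))
Y = rng.standard_normal((N, NV))

C_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)   # NG x NV, one multivariate fit
C_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, v], rcond=None)[0] for v in range(NV)]
)                                                  # NV separate univariate fits
print(np.allclose(C_joint, C_cols), np.linalg.matrix_rank(C_joint))
```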


Reduced Rank Regression

$$Y = X B A + E$$

  – Y: N × NV (Images)
  – X: N × NG (Genotypes)
  – B: NG × r (Genotype Coefficients)
  – A: r × NV (Image Coefficients)
  – E: N × NV (Error)
  – N # subjects, NV # voxels/ROIs, NG # genes/SNPs

• Fix rank r
• Approximate C ≈ B A, with B & A each rank r
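For Γ = I, reduced rank regression has a classical closed form: fit OLS, then project the fitted values onto their top-r right singular vectors. The sketch below assumes X has full column rank; the data are simulated with a rank-1 true C.

```python
import numpy as np

def reduced_rank_regression(X, Y, r):
    """Rank-r least-squares fit Y ~ X B A (Gamma = I).
    Returns B (NG x r) and A (r x NV)."""
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    fitted = X @ C_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    A = Vt[:r]            # top-r right singular vectors of the OLS fit
    B = C_ols @ A.T       # genotype coefficients
    return B, A

rng = np.random.default_rng(0)
N, NG, NV = 200, 6, 5
X = rng.standard_normal((N, NG))
C_true = np.outer(rng.standard_normal(NG), rng.standard_normal(NV))  # rank 1
Y = X @ C_true + 0.5 * rng.standard_normal((N, NV))

B1, A1 = reduced_rank_regression(X, Y, r=1)
B5, A5 = reduced_rank_regression(X, Y, r=NV)   # full rank recovers OLS
C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
```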


Sparse Reduced Rank Regression

$$Y = X B A + E$$

  – Y: N × NV (Images)
  – X: N × NG (Genotypes)
  – B: NG × r (Sparse Genotype Coefficients)
  – A: r × NV (Sparse Image Coefficients)
  – E: N × NV (Error)
  – N # subjects, NV # voxels/ROIs, NG # genes/SNPs

• Fix rank r
• Approximate C ≈ B A, with B & A each rank r
• Enforce sparsity

Vounou, M., Nichols, T. E., & Montana, G. (2010). Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach. NeuroImage, 53(3), 1147-59.


Sparse Reduced Rank Regression - Estimation
• RRR
  – $Y = X B A + E$
  – For fixed rank r, find A & B that minimize
    $$M = \operatorname{tr}\{(Y - XBA)\,\Gamma\,(Y - XBA)'\}$$
    for some NV × NV matrix Γ, e.g. Γ = I
• SRRR
  – For rank 1, find a & b that minimize
    $$M = \operatorname{tr}\{(Y - Xba')\,\Gamma\,(Y - Xba')'\} + \lambda_a\|a\|_1 + \lambda_b\|b\|_1$$
  – Then subtract Xba' from the data, and repeat
  – Need to specify final rank r, λa & λb
    • Can set λa & λb in terms of the number of nonzero entries, #{|a|>0} & #{|b|>0}
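A minimal sketch of the rank-1 step and deflation, under assumptions not stated on the slide: Γ = I and X'X treated as approximately the identity, so the alternating updates become soft-thresholded power iterations on X'Y. Following the last bullet, each λ is set implicitly by the number of nonzeros kept (k_b, k_a). This is an illustration, not the exact algorithm of Vounou et al. (2010); X here is continuous for simplicity.

```python
import numpy as np

def soft_threshold_top_k(v, k):
    """Soft-threshold v at its (k+1)-th largest |v|, leaving at most k nonzeros."""
    if k >= v.size:
        return v.copy()
    lam = np.sort(np.abs(v))[::-1][k]
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def srrr_rank1(X, Y, k_b, k_a, n_iter=200, tol=1e-10):
    M = X.T @ Y                                       # NG x NV cross-product
    a = np.linalg.svd(M, full_matrices=False)[2][0]   # start: leading right singular vector
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b_new = soft_threshold_top_k(M @ a, k_b)      # sparse genotype loadings
        if not b_new.any():
            break
        b_new /= np.linalg.norm(b_new)
        a_new = soft_threshold_top_k(M.T @ b_new, k_a)  # sparse image loadings
        if not a_new.any():
            break
        a_new /= np.linalg.norm(a_new)
        done = np.linalg.norm(a_new - a) < tol and np.linalg.norm(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    u = X @ b
    if not u.any():
        return b, np.zeros_like(a)
    scale = (u @ Y @ a) / (u @ u)                     # least-squares size of rank-1 term
    return b, scale * a

def srrr(X, Y, r, k_b, k_a):
    """Fit r sparse rank-1 layers, subtracting X b a' after each."""
    Y = Y.copy()
    layers = []
    for _ in range(r):
        b, a = srrr_rank1(X, Y, k_b, k_a)
        layers.append((b, a))
        Y = Y - np.outer(X @ b, a)
    return layers

rng = np.random.default_rng(0)
N, NG, NV = 200, 50, 20
X = rng.standard_normal((N, NG))
b_true = np.zeros(NG); b_true[:3] = 1.0
a_true = np.zeros(NV); a_true[:2] = 1.0
Y = np.outer(X @ b_true, a_true) + 0.5 * rng.standard_normal((N, NV))
b1, a1 = srrr(X, Y, r=1, k_b=3, k_a=2)[0]
print(np.flatnonzero(b1), np.flatnonzero(a1))
```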


Simulation: Phenotype & SNPs
• Simulated MRI data
  – ADNI T1 images through SPM5 VBM pipeline
  – NV = 111 ROIs, placed on VBM data from 189 MCI ADNI subjects
    • GSK CIC Atlas, based on Harvard-Oxford atlas
  – Estimate covariance Σ after adjusting for age & gender
  – Simulate ROI data (for arbitrary N) with covariance Σ
• Evaluate with realistic genetic population w/ FREGENE
  – Simulates sequence-level data in large population
  – Provides 10K individuals, 20Mb chromosome (~180K SNPs)
    • Chadeau-Hyam, et al. BMC Bioinformatics, 9:364, 2008
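A sketch of the phenotype simulation steps: residualize ROI data on age and gender, estimate Σ from the residuals, then draw any number of subjects from N(0, Σ) through a Cholesky factor. A toy 3-ROI covariance and simulated "observed" data stand in for the 111 ADNI ROIs; all names and numbers are illustrative.

```python
import numpy as np

def residual_covariance(Y, covariates):
    """Covariance of Y after regressing out an intercept and the covariates."""
    Z = np.column_stack([np.ones(len(Y)), covariates])
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    R = Y - Z @ beta
    return R.T @ R / (len(Y) - Z.shape[1])

def simulate_rois(Sigma, n, rng):
    """n subjects' ROI data from N(0, Sigma) via a Cholesky factor."""
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, Sigma.shape[0])) @ L.T

rng = np.random.default_rng(0)
Sigma_true = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
n_obs = 189
age = rng.uniform(55, 90, n_obs)
gender = rng.integers(0, 2, n_obs).astype(float)
Y_obs = simulate_rois(Sigma_true, n_obs, rng) + 0.01 * age[:, None] + 0.2 * gender[:, None]

Sigma_hat = residual_covariance(Y_obs, np.column_stack([age, gender]))
sim = simulate_rois(Sigma_hat, 20_000, rng)   # "arbitrary N"
```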


Simulation: Phenotype & SNPs
• FREGENE SNP simulation
  – Population of 10,000 evolved over 200,000 generations
  – 20Mb simulated
  – 37,748 SNPs with MAF > 0.05
  – Select k = 10 causative SNPs
    • From all possible having MAF = 0.2
    • Used to induce phenotypic effect
  – But then dropped from consideration
    • Represents realistic setting, where causative SNP is not seen, but effect captured through local LD
  – From population of 10,000, repeatedly sample cohorts of size N
• Simulated association in MRI data
  – Add genetic effect to Frontal and Temporal ROIs with causative SNPs
    • γ = 0.06, 0.08, or 0.1 reduction in mean GM in affected ROI
    • Calibrated to Filippini et al. (2009)
  – 10% reduction in GM in ApoE ε4/ε4 subjects relative to subjects with no ε4 alleles
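The slide does not say how γ scales with allele count. The sketch below uses a hypothetical additive coding, where each copy of the causative allele lowers the affected ROIs by γ/2 of their baseline mean, so two copies give the full γ reduction (mirroring the ε4/ε4 vs. no-ε4 calibration). It is an illustration only, not the simulation code used in the study.

```python
import numpy as np

def add_genetic_effect(roi, counts, affected, gamma):
    """HYPOTHETICAL additive effect: each allele copy lowers the affected ROIs
    by gamma/2 of their baseline mean (two copies -> full gamma reduction)."""
    roi = np.array(roi, dtype=float)              # work on a copy
    g = np.asarray(counts, dtype=float)
    baseline = roi[:, affected].mean(axis=0)
    roi[:, affected] -= (gamma / 2) * g[:, None] * baseline
    return roi

roi = np.ones((3, 4))                             # 3 subjects x 4 ROIs, unit GM
out = add_genetic_effect(roi, counts=[0, 1, 2], affected=[0, 2], gamma=0.1)
print(out)
```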


FREGENE: Evolutionary model of world population

[Figure: founding population in Africa and expansion; Out of Africa (OoA) split & bottleneck; Asian & European split; further expansions. Chadeau-Hyam, et al. BMC Bioinformatics, 9:364, 2008]


Why try so hard? Why not rand{0,1,2}^500,000?
• Linkage disequilibrium (LD)
  – SNPs not independent
  – Highly structured, heterogeneous dependence
• Population sub-structure
  – Ethnic differences & migration patterns induce systematic variation
• Multivariate analysis
  – Want realistic multivariate structure in our simulations

The Wellcome Trust Case Control Consortium, Nature 447, 661-678, 2007.


Realistic Phenotype

•  All pairwise GM correlations among NV = 111 ROIs


Realistic Genotypes

•  Correlation of first 1000 simulated SNPs


Simulation Setting: Horse shoes & Imaging Genetics
• “True positive” with missing causative SNP
  – Declare true positive if LD coefficient close enough
• LD-linked SNPs
  – Of 1990 SNPs
  – 51 linked (r > 0.8) to one or more of the 10 causative SNPs
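A sketch of the "close enough" rule and the power measure reported next: an observed SNP counts as a true positive if its LD with any (unobserved) causative SNP exceeds 0.8, and power is the fraction of simulated cohorts in which at least one selected SNP is such a linked SNP. Using |r| (so allele coding does not matter) and the toy data are assumptions.

```python
import numpy as np

def ld_linked(G_obs, G_causal, r_min=0.8):
    """Indices of observed SNPs with |r| > r_min to at least one causative SNP.
    G_obs: N x P observed genotypes; G_causal: N x k causative genotypes."""
    Zo = (G_obs - G_obs.mean(0)) / G_obs.std(0)
    Zc = (G_causal - G_causal.mean(0)) / G_causal.std(0)
    r = Zo.T @ Zc / len(G_obs)                    # P x k correlations
    return set(np.flatnonzero((np.abs(r) > r_min).any(axis=1)).tolist())

def power_at_least_one(selected_per_run, linked):
    """Fraction of simulated cohorts where >= 1 selected SNP is LD-linked."""
    hits = [bool(set(sel) & linked) for sel in selected_per_run]
    return sum(hits) / len(hits)

rng = np.random.default_rng(0)
c = rng.binomial(2, 0.2, 1000).astype(float)      # causative SNP (dropped from analysis)
indep = rng.binomial(2, 0.3, 1000).astype(float)
G_obs = np.column_stack([c, indep])               # SNP 0 in perfect LD with c
linked = ld_linked(G_obs, c[:, None])
power = power_at_least_one([[0, 1], [1], [0], []], linked)
print(linked, power)
```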


SRRR Simulation Results
• Power to detect 1 or more SNPs (NG = 1990)
• For ranks r = 1, 2, 3, SRRR dominates mass-univariate analysis
  – Better for higher r


SRRR Simulation Results
• Power to detect 1 or more SNPs (NG = 1990)
• For ranks r = 1, 2, 3, SRRR dominates mass-univariate analysis
  – Better for higher r; here r = 3, high effect size


SRRR Simulation Results

• Power to detect 1 or more ROIs
• Less difference
  – Power can be manipulated by varying λ by rank


SRRR: Multivariate vs. Mass-Univariate
• Does this NG = 1990 result generalize?
• For up to 40k SNPs
  – r = 3, medium effect size, N = 1000
  – Power 2-5 times greater
  – Absolute power still tiny


SRRR Simulation Results
• Power to detect 1 or more SNPs (NG = 1990)
• For ranks r = 1, 2, 3, SRRR dominates mass-univariate analysis
  – Better for higher r; here r = 3


Sparse Reduced Rank Regression for SNP – MRI Association
• Detailed simulation of imaging & genetic correlation structure
  – Suggests multivariate approach will outperform mass-univariate
  – Power tiny, in any event
• Much work to do
  – Haven’t addressed how to optimize phenotype
  – Haven’t tried to estimate penalty parameters λa, λb or r
• Currently investigating stability selection
  – See #316 Le Floch et al


Conclusions
• VBM
  – Powerful, “automated” anatomical analysis
  – Needs careful raw data, preprocessing & model QC
• Imaging Genetics
  – Mash-up of two large data sets, massive multiple testing problems
• Candidate SNP VBM
  – Given a SNP, just like a traditional imaging analysis
  – Multiple SNPs possible too, but need combining methods
• Multivariate Sparse Reduced Rank Regression
  – Promising, but little power unless have 1,000’s of subjects
