The Nonparanormal SKEPTIC and ...

Post on 24-Jun-2020


The Nonparanormal SKEPTIC and Its Application

Outline  

• The Nonparanormal SKEPTIC
• inferring biochemical networks

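the precision matrix

• inverse of the covariance matrix
• Θ
• if the data is multivariate normal: Θ_ij = 0 exactly when variables i and j are conditionally independent given the others (no edge between nodes i and j)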

node   1   2   3   4   5   6   7  

1   0   ~   ~   ~   0   0   0  

2   ~   0   0   0   0   0   0  

3   ~   0   0   0   ~   0   0  

4   ~   0   0   0   0   0   0  

5   0   0   ~   0   0   ~   0  

6   0   0   0   0   ~   0   ~  

7   0   0   0   0   0   ~   0  

[Figure: undirected graph on nodes 1 to 7]
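The link between the zero pattern of Θ and the graph can be checked numerically. The following is a minimal sketch in Python with NumPy, not taken from the slides: it places an arbitrary value of 0.4 at the "~" positions of the matrix above and 2.0 on the diagonal (both values are assumptions, chosen only so that Θ is positive definite), then shows that the covariance matrix is dense while its inverse recovers the edges.

```python
import numpy as np

# Edges read off the "~" entries of the 7-node matrix above (1-based labels).
edges = [(1, 2), (1, 3), (1, 4), (3, 5), (5, 6), (6, 7)]
p = 7

# Assumed values: 2.0 on the diagonal and 0.4 on each edge. They are arbitrary,
# chosen so that theta is diagonally dominant and hence positive definite.
theta = 2.0 * np.eye(p)
for i, j in edges:
    theta[i - 1, j - 1] = theta[j - 1, i - 1] = 0.4

# The covariance matrix is the inverse of the precision matrix.
sigma = np.linalg.inv(theta)

# Every pair of nodes is marginally correlated (sigma has no zero entries) ...
n_zero_cov = int(np.sum(np.abs(sigma) < 1e-12))

# ... but inverting sigma brings back the sparse pattern, i.e. the graph.
theta_back = np.linalg.inv(sigma)
recovered = sorted(
    (i + 1, j + 1)
    for i in range(p)
    for j in range(i + 1, p)
    if abs(theta_back[i, j]) > 1e-8
)

print("zero entries in covariance:", n_zero_cov)
print("edges recovered from precision matrix:", recovered)
```

Thresholding the covariance matrix would therefore not reveal the graph; the sparsity lives in its inverse.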

2  problems  

• dimension >> # observations
• data is not multivariate normal

dimension >> # observations

• log likelihood

$\max_{\Theta}\ \log\det\Theta - \operatorname{tr}(S\Theta) - (\text{terms involving the mean})$
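Why the dimension problem breaks this maximization can be seen in a small experiment. The sketch below is an illustration with simulated data and assumed sizes (p = 20, n = 200 and n = 10), not the slides' data. The terms involving the mean are dropped by working with the centered sample covariance S. With n > p the maximizer is the inverse of S; with n < p, S is rank deficient and the objective can be increased without bound.

```python
import numpy as np

def gaussian_objective(theta, S):
    """log det(theta) - tr(S theta), for a symmetric positive definite theta."""
    _, logdet = np.linalg.slogdet(theta)
    return logdet - np.trace(S @ theta)

rng = np.random.default_rng(0)
p = 20

# Case 1: many observations (n = 200 > p). S is invertible and the
# maximizer of the objective is the inverse of S.
X_many = rng.standard_normal((200, p))
S_many = np.cov(X_many, rowvar=False, bias=True)
theta_mle = np.linalg.inv(S_many)
obj_mle = gaussian_objective(theta_mle, S_many)
obj_identity = gaussian_objective(np.eye(p), S_many)

# Case 2: few observations (n = 10 < p). S is rank deficient ...
X_few = rng.standard_normal((10, p))
S_few = np.cov(X_few, rowvar=False, bias=True)
rank_few = np.linalg.matrix_rank(S_few)

# ... so the objective is unbounded: moving theta along a null direction v of
# S increases log det(theta) while tr(S theta) stays essentially unchanged.
eigvals, eigvecs = np.linalg.eigh(S_few)
v = eigvecs[:, 0]  # eigenvector of the smallest (numerically zero) eigenvalue
objs_few = [
    gaussian_objective(np.eye(p) + t * np.outer(v, v), S_few)
    for t in (0.0, 1e3, 1e6)
]

print("n > p: objective at inv(S) vs identity:", obj_mle, obj_identity)
print("n < p: rank of S:", rank_few, "of", p)
print("n < p: objective along a null direction:", objs_few)
```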

data is not multivariate normal

• trick 1 (the nonparanormal):
• trick 2 (nonparametric correlation):

Liu, H., Lafferty, J., & Wasserman, L. (2009). The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10, 2295–2328.
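The formulas for the two tricks did not survive on these slides, so the following is a sketch based on the published SKEPTIC estimator rather than on the slide content. The nonparanormal assumes the data become multivariate normal after an unknown increasing transformation of each variable. Ranks are unchanged by such transformations, so the latent correlation can be estimated from Spearman's rho as 2 sin(π/6 · ρ). The simulated data, the sample size, and the latent correlation of 0.6 are assumptions made for illustration.

```python
import numpy as np

def rank_columns(X):
    """Rank each column (0 = smallest); assumes continuous data without ties."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)

def skeptic_spearman(X):
    """Correlation matrix estimate 2 * sin(pi/6 * rho) from Spearman's rho."""
    rho = np.corrcoef(rank_columns(X), rowvar=False)
    S = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(S, 1.0)
    return S

rng = np.random.default_rng(0)
n, r = 20000, 0.6

# Latent Gaussian variables with correlation r ...
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n)
# ... observed only through increasing, non-linear transformations.
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])

S_latent = skeptic_spearman(Z)
S_observed = skeptic_spearman(X)
pearson_observed = np.corrcoef(X, rowvar=False)

print("rank-based estimate on latent data:  ", S_latent[0, 1])
print("rank-based estimate on observed data:", S_observed[0, 1])
print("Pearson correlation on observed data:", pearson_observed[0, 1])
```

The rank-based estimate is the same before and after the transformations, whereas the Pearson correlation of the transformed data is not. In the published method, a matrix estimated this way takes the place of the sample covariance S in a regularized version of the objective log det Θ − tr(SΘ).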

Pristionchus pacificus

• satellite model organism of C. elegans
• necromenic association with scarab beetles
• global distribution
  – diverse habitats
  – diverse but structured genetic background

Image courtesy of Sommer Lab

Collaboration: Ralf J. Sommer, Director, MPI, Tuebingen, Germany

data set

• ~450 strains
• 2 replicates each
• positive and negative ionization high-resolution LC-MS (metabolome)
• restriction-site-associated DNA marker SNP calls (genome)

rad seq

[Figure: RAD-seq workflow; labels: genomic DNA, restriction enzyme, adapter, sequencing, SNP calling]

Poland, J. A., Brown, P. J., Sorrells, M. E., & Jannink, J.-L. (2012). Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One, 7(2), e32253. doi:10.1371/journal.pone.0032253

snp data set

            snp_locus_1   snp_locus_2   snp_locus_3   …
sample_1
sample_2
sample_3
…

1% genomic coverage

# alleles   count
1           194
2           2947
3           1
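The allele tally above can be derived from a sample-by-locus genotype table by counting the distinct alleles seen at each locus. The sketch below uses a small hypothetical table (the sample names follow the slide; the calls themselves are made up for illustration) and treats `None` as a missing call, which is an assumption about how gaps would be encoded.

```python
from collections import Counter

# Hypothetical genotype calls: rows are samples, columns are SNP loci.
# None marks a missing call (an assumed encoding, not from the slides).
genotypes = {
    "sample_1": ["A", "C", "G", None],
    "sample_2": ["A", "T", "G", "T"],
    "sample_3": ["A", "C", "A", "T"],
    "sample_4": ["A", "G", "G", None],
}

n_loci = len(next(iter(genotypes.values())))

# Number of distinct alleles observed at each locus, ignoring missing calls.
alleles_per_locus = []
for locus in range(n_loci):
    observed = {calls[locus] for calls in genotypes.values()}
    observed.discard(None)
    alleles_per_locus.append(len(observed))

# Tally: how many loci show 1, 2, 3, ... alleles.
allele_tally = Counter(alleles_per_locus)

for n_alleles in sorted(allele_tally):
    print(n_alleles, allele_tally[n_alleles])
```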

column: http://www.waters.com/webassets/cms/category/media/snapshot/ACQUITY_Column.jpg
mass spectrometer: https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSJGwVjgNgUcS9gVvxiupz6-wrL5jrVypj09BYwFnIfvHGSfFXXdg

liquid chromatography coupled mass spectrometry (lcms)

[Figure: chromatography column, mass spectrometer, total ion chromatogram]

            peak_1 (m,rt)   peak_2 (m,rt)   peak_3 (m,rt)   …
sample_1
sample_2
sample_3
…

~2,000 features

lc-ms

xcms  

[Figure: PCA score plots; axes PC 1, PC 2, PC 3, PC 4]
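Principal component scores like those plotted here can be obtained from a sample-by-feature table with a singular value decomposition. The sketch below is a generic illustration, not the analysis behind the slides: the random 30 x 50 matrix is a hypothetical stand-in for the LC-MS feature table, and any scaling or normalization the original analysis may have used is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the LC-MS feature table: 30 samples x 50 features.
X = rng.standard_normal((30, 50))

# Center each feature, then decompose the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores are the sample coordinates on the principal components.
scores = U * s

# Share of the total variance captured by each component.
explained_ratio = s**2 / np.sum(s**2)

# The first four components, as in the PC 1 to PC 4 axes of the figure.
pc1, pc2, pc3, pc4 = (scores[:, k] for k in range(4))

print("variance explained by PC 1 to PC 4:", explained_ratio[:4])
```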

ascaroside  centric  metabolic  network  

(466.2,  5.78)  

ascaroside  centric  metabolic  network  

Start Node   End Node   Shortest Path   Shortest Path To Random Node From Start Node   Shortest Path To Random Node From End Node   Correlation
ascr#9       pasc#12    1               9.18                                           10.74                                        -0.128061447
pasc#9       pasc#12    2               13.28                                          10.88                                        -0.076858659
ascr#9       pasc#9     3               9.6                                            12.52                                        -0.626094706
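The comparison in this table, a shortest path between two metabolites set against the typical shortest path from each of them to other nodes, can be sketched with a breadth-first search. The example below runs on the small 7-node graph from the precision matrix slide, not on the ascaroside network, and it averages over all other nodes as a deterministic stand-in for sampling random nodes. Both choices are assumptions made for illustration.

```python
from collections import deque

def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def mean_distance_to_others(adjacency, source):
    """Average hop count from source to all other reachable nodes."""
    dist = shortest_path_lengths(adjacency, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others)

# Toy graph (assumption): the 7-node example from the precision matrix slide.
edges = [(1, 2), (1, 3), (1, 4), (3, 5), (5, 6), (6, 7)]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

start, end = 1, 3
path_length = shortest_path_lengths(adjacency, start)[end]
baseline_start = mean_distance_to_others(adjacency, start)
baseline_end = mean_distance_to_others(adjacency, end)

print("shortest path:", path_length)
print("mean distance from start node:", baseline_start)
print("mean distance from end node:", baseline_end)
```

A pair whose shortest path is much smaller than both baselines sits unusually close together in the network, which is the pattern the three rows of the table show.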

advantages of this method

• requires no prior knowledge
• unsupervised
• group-wise inference
• generalizable
• efficient
• functional