Analysis of Social Media (wcohen/10-802/10-16-prob-graphs.pdf)

Analysis of Social Media, MLD 10-802, LTI 11-772, William Cohen, 10-16-010

Page 1:

Analysis of Social Media, MLD 10-802, LTI 11-772

William Cohen, 10-16-010

Page 2:

Review - LDA

•  Latent Dirichlet Allocation

[Figure: plate diagram with variables α, θ, z, w, β and plates N, M]

•  Randomly initialize each z_{m,n}
•  Repeat for t=1,…
  •  For each doc m, word n
    •  Find Pr(z_{m,n}=k | other z's)
    •  Sample z_{m,n} according to that distribution

“Mixed membership”
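The sampling loop on this slide can be sketched in a few lines of Python. The slide does not spell out Pr(z_{m,n}=k | other z's); the sketch assumes the standard collapsed form, proportional to (doc-topic count + α) × (topic-word count + β) / (topic total + Vβ). The function name `gibbs_lda`, its parameters, and the default hyperparameters are illustrative assumptions, not part of the lecture.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    # Randomly initialize each z[m][n] (topic of word n in doc m).
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    for m, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[m][n]
            doc_topic[m][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(sweeps):  # Repeat for t = 1, ...
        for m, doc in enumerate(docs):  # For each doc m, word n
            for n, w in enumerate(doc):
                # Remove the current assignment so counts reflect "other z's".
                k = z[m][n]
                doc_topic[m][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized Pr(z_mn = j | other z's), assumed collapsed form.
                weights = [
                    (doc_topic[m][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                # Sample z_mn according to that distribution.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[m][n] = k
                doc_topic[m][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word
```

Each row of `doc_topic` can hold counts for several topics at once, which is the "mixed membership" idea: a document is not forced into a single cluster.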

Page 3:

Outline

•  Stochastic block models & inference question
•  Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
•  Block models and inference
•  Mixed-membership block models
•  Multinomial block models and inference w/ Gibbs
•  Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 4:

Parkkinen et al paper

Page 5:

Another mixed membership block model

Page 6:

Another mixed membership block model

z = (z_i, z_j) is a pair of block ids

n_z = #pairs z

q_{z1,i} = #links to i from block z1

q_{z1,·} = #outlinks in block z1

δ = indicator for diagonal

M = #nodes
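As a rough illustration, the count statistics defined above could be accumulated from a list of links and their block-pair assignments as follows. This is only a sketch under a literal reading of the slide's definitions; the function name `block_counts`, the input layout, and the choice to key `q` on the source-side block z1 and the destination node are assumptions, and the paper's exact bookkeeping may differ.

```python
from collections import Counter


def block_counts(links, assignments):
    """links: list of (i, j) node pairs; assignments: list of (z1, z2) block pairs."""
    n = Counter()        # n[(z1, z2)]: number of links assigned to block pair z
    q = Counter()        # q[(z1, i)]: number of links to node i from block z1
    q_total = Counter()  # q_total[z1]: number of outlinks in block z1
    for (src, dst), (z1, z2) in zip(links, assignments):
        n[(z1, z2)] += 1
        q[(z1, dst)] += 1
        q_total[z1] += 1
    return n, q, q_total
```

A Gibbs sampler for the block model would decrement these counters for one link, resample its pair z, and increment them again, in the same way the LDA sampler updates its topic counts.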

Page 7:

Another mixed membership block model

Page 8:

Another mixed membership block model

Page 9:

Outline

•  Stochastic block models & inference question
•  Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
•  Block models and inference
•  Mixed-membership block models
•  Multinomial block models and inference w/ Gibbs
•  Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 10:

Latent Space Model

•  Each node i has a latent position in Euclidean space, z(i)
•  z(i)'s drawn from a mixture of Gaussians
•  Probability of interaction between i and j depends on the distance between z(i) and z(j)
•  Inference is a little more complicated… [Handcock & Raftery, 2007]
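The generative side of the latent space model can be sketched as below. The slide only says the interaction probability depends on distance; the sketch assumes one common choice, a logistic function of an intercept minus the Euclidean distance. The names `edge_probability` and `sample_latent_space_graph`, the 2-D positions, the uniform mixture weights, and the parameters `spread` and `intercept` are all illustrative assumptions.

```python
import math
import random


def edge_probability(zi, zj, intercept=1.0):
    """Assumed link function: logistic(intercept - distance between positions)."""
    return 1.0 / (1.0 + math.exp(-(intercept - math.dist(zi, zj))))


def sample_latent_space_graph(num_nodes, centers, spread=0.5, intercept=1.0, seed=0):
    rng = random.Random(seed)
    positions = []
    for _ in range(num_nodes):
        # Draw z(i) from a mixture of Gaussians (uniform mixture weights assumed).
        cx, cy = rng.choice(centers)
        positions.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    edges = set()
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            # Nearby nodes are more likely to interact.
            if rng.random() < edge_probability(positions[i], positions[j], intercept):
                edges.add((i, j))
    return positions, edges
```

Nodes drawn from the same Gaussian component tend to sit close together and therefore link densely, which gives block-like structure without explicit block ids.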

Page 11:

Airoldi’s MMSBM

Page 12:
Page 13:
Page 14:

Outline

•  Stochastic block models & inference question
•  Review of text models
  – Mixture of multinomials & EM
  – LDA and Gibbs (or variational EM)
•  Block models and inference
•  Mixed-membership block models
•  Multinomial block models and inference w/ Gibbs
•  Bestiary of other probabilistic graph models
  – Latent-space models, exchangeable graphs, p1, ERGM

Page 15:

Exchangeable Graph Model

•  Defined by a 2^k x 2^k table q(b1,b2)
•  Draw a length-k bit string b(n) like 01101 for each node n from a uniform distribution.
•  For each pair of nodes n,m
  – Flip a coin with bias q(b(n),b(m))
  – If it's heads, connect n,m

complicated distribution:
•  Pick k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the u_i's are correlated.
•  Pass each u_i thru a sigmoid so it's in [0,1]; call that p_i
•  Pick b_i using p_i
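The generative steps above can be sketched as follows. To get every u_i with variance α and every pair with covariance β using only the standard library, the sketch adds a shared Gaussian term to independent ones, which requires 0 ≤ β ≤ α. The names `sample_bits` and `sample_graph`, the dict-of-dicts layout of the table q, and the default parameter values are illustrative assumptions.

```python
import math
import random


def sample_bits(k, alpha, beta, rng):
    """Draw a length-k bit string via correlated Gaussians and a sigmoid."""
    # shared + private noise gives Var(u_i) = alpha and Cov(u_i, u_j) = beta.
    shared = rng.gauss(0.0, 1.0)
    bits = []
    for _ in range(k):
        u = math.sqrt(beta) * shared + math.sqrt(alpha - beta) * rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-u))          # squash u_i into [0, 1]
        bits.append(1 if rng.random() < p else 0)  # pick b_i using p_i
    return tuple(bits)


def sample_graph(num_nodes, k, q, alpha=1.0, beta=0.5, seed=0):
    """q[b1][b2] is the coin bias for a pair of nodes with bit strings b1, b2."""
    rng = random.Random(seed)
    b = [sample_bits(k, alpha, beta, rng) for _ in range(num_nodes)]
    edges = []
    for n in range(num_nodes):
        for m in range(n + 1, num_nodes):
            # Flip a coin with bias q(b(n), b(m)); heads connects n and m.
            if rng.random() < q[b[n]][b[m]]:
                edges.append((n, m))
    return b, edges
```

With k bits there are 2^k possible strings, so q has 2^k x 2^k entries, one per ordered pair of strings.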

Page 16:

Exchangeable Graph Model

•  Pick k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the u_i's are correlated.
•  Pass each u_i thru a sigmoid so it's in [0,1]; call that p_i
•  Pick b_i using p_i

If α is big then u_x, u_y are really big (or small) so p_x, p_y will end up in a corner.

[Figure: plot of (p_x, p_y) with both axes running from 0 to 1]
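The corner effect can be checked numerically. The sketch below draws pairs (u_x, u_y) with variance α and covariance β, applies the sigmoid, and measures how often both p_x and p_y land within a small margin of 0 or 1. The function name `corner_fraction`, the margin of 0.1, and the sample count are illustrative assumptions.

```python
import math
import random


def corner_fraction(alpha, beta=0.0, samples=2000, margin=0.1, seed=0):
    """Fraction of (p_x, p_y) draws where both coordinates are near 0 or 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        shared = rng.gauss(0.0, 1.0)
        ps = []
        for _ in range(2):
            # Var(u) = alpha, Cov(u_x, u_y) = beta (requires 0 <= beta <= alpha).
            u = math.sqrt(beta) * shared + math.sqrt(alpha - beta) * rng.gauss(0.0, 1.0)
            ps.append(1.0 / (1.0 + math.exp(-u)))
        if all(p < margin or p > 1.0 - margin for p in ps):
            hits += 1
    return hits / samples
```

Comparing `corner_fraction(100.0)` with `corner_fraction(0.1)` shows the effect: a large α pushes most draws into the corners of the unit square, while a small α keeps them near the center.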

Page 17:

Exchangeable Graph Model

•  Pick k-dimensional vector u from a multivariate normal w/ variance α and covariance β, so the u_i's are correlated.
•  Pass each u_i thru a sigmoid so it's in [0,1]; call that p_i
•  Pick b_i using p_i

If α is big then u_x, u_y are really big (or small) so p_x, p_y will end up in a corner.

[Figure: plot of (p_x, p_y) with both axes running from 0 to 1]

Page 18:

The p1 model for a directed graph

•  Parameters, per node i:
  – θ: background edge probability
  – α_i: "expansiveness" – how extroverted is i?
  – β_i: "popularity" – how much do others want to be with i?
  – ρ_ij: "reciprocation" – how likely is i to respond to an incoming link with an outgoing one?

log Pr(i ↔ j) = λ_ij + 2θ + α_i + α_j + β_i + β_j + ρ_ij
log Pr(i ← j) = λ_ij + θ + α_j + β_i
log Pr(i → j) = λ_ij + θ + α_i + β_j
log Pr(no edge between i and j) = λ_ij

A logistic-regression-like procedure can be used to fit this to data from a graph.
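To make the role of λ_ij concrete, the sketch below computes the four dyad probabilities for one pair (i, j), assuming the standard p1 parameterization with a dyad-specific ρ_ij as named on the slide. Here λ_ij is treated as the log-normalizer that makes the four states sum to one; the function name `p1_dyad_probs` and the state labels are illustrative assumptions.

```python
import math


def p1_dyad_probs(theta, alpha_i, beta_i, alpha_j, beta_j, rho_ij):
    """Probabilities of the four states of the dyad (i, j) under the p1 model."""
    # Log-scores of each state, before adding the shared term lambda_ij.
    scores = {
        "mutual": 2 * theta + alpha_i + alpha_j + beta_i + beta_j + rho_ij,
        "i_to_j": theta + alpha_i + beta_j,
        "j_to_i": theta + alpha_j + beta_i,
        "none": 0.0,
    }
    # lambda_ij normalizes the dyad: the four probabilities must sum to 1.
    lam = -math.log(sum(math.exp(s) for s in scores.values()))
    return {state: math.exp(lam + s) for state, s in scores.items()}
```

For example, with every parameter set to zero the four states are equally likely, and raising `rho_ij` shifts probability toward the mutual state.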

Page 19:

Exponential Random Graph Model

•  Basic idea:
  – Define some features of the graph (e.g., number of edges, number of triangles, …)
  – Build a MaxEnt-style model based on these features
•  General:
  – includes Erdos-Renyi, p1, …
•  Issues
  – Partition function is intractable
  – Alternative: model conditional pseudo-likelihood of each edge (i.e., Pr(edge | rest of graph))
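The pseudo-likelihood alternative can be sketched for an undirected graph with two features, edge count and triangle count. Toggling edge (i, j) changes the edge count by 1 and the triangle count by the number of common neighbors, so Pr(edge | rest of graph) is a logistic function of the weighted change, and the partition function never appears. The function names, the adjacency-dict layout, and the weights `w_edge` and `w_tri` are illustrative assumptions.

```python
import math
from itertools import combinations


def edge_conditional(adj, i, j, w_edge, w_tri):
    """Pr(edge i-j present | rest of graph) for edge and triangle features."""
    # Adding i-j creates one triangle per common neighbor of i and j.
    common = len((adj[i] - {j}) & (adj[j] - {i}))
    return 1.0 / (1.0 + math.exp(-(w_edge + w_tri * common)))


def log_pseudo_likelihood(adj, w_edge, w_tri):
    """Sum of log conditional probabilities over all node pairs."""
    total = 0.0
    for i, j in combinations(sorted(adj), 2):
        p = edge_conditional(adj, i, j, w_edge, w_tri)
        total += math.log(p) if j in adj[i] else math.log(1.0 - p)
    return total
```

Maximizing `log_pseudo_likelihood` over the weights is an ordinary logistic regression on the change statistics of each pair.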

Page 20:

Kronecker product graphs

Page 21:

Kronecker product graphs

Page 22:

Kronecker product graphs

•  Good fit to many commonly-observed network properties
  – scale-free degree distribution
  – diameter
  – …
•  Gradient descent can be used to fit an "initiator matrix" to a real adjacency matrix
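The way an initiator matrix generates a larger graph can be sketched as follows: the initiator is Kronecker-multiplied with itself repeatedly, and each entry of the result is read as an edge probability. The sketch assumes the stochastic variant in which initiator entries are probabilities; the names `kron` and `kron_power` and the example initiator values are illustrative, and the fitting step mentioned on the slide is not shown.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_power(initiator, k):
    """k-th Kronecker power: an n x n initiator yields an n^k x n^k matrix."""
    result = initiator
    for _ in range(k - 1):
        result = kron(result, initiator)
    return result


# Hypothetical 2 x 2 initiator; entry [i][j] of the power is Pr(edge i -> j).
initiator = [[0.9, 0.5], [0.5, 0.1]]
edge_probs = kron_power(initiator, 3)  # 8 x 8 matrix of edge probabilities
```

A graph can then be sampled by flipping one coin per entry of `edge_probs`; the self-similar structure of the matrix is what produces the heavy-tailed degree distributions mentioned above.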