Principal Component Analysis of Tree Topology



Yongdai Kim, Seoul National University, June 5, 2011

Presented by J. S. Marron, SAMSI


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots


Brain Data: Mean – 2σ₁ PC1

Careful about values < 0


Interpretation: Directions Leave Positive Orthant

(pic here)


Visualize Trees

Important Note: Tendency Towards Large Flat Spots And Bursts of Nearby Branches


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots

Alternate Approaches:
• Branch Length Representation
• Tree Pruning
• Non-negative Matrix Factorization
• Bayesian Factor Model


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots

Alternate Approaches:
• Branch Length Representation
• Tree Pruning
• Non-negative Matrix Factorization
• Bayesian Factor Model

Discussed by Dan


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots

Alternate Approaches:
• Branch Length Representation
• Tree Pruning
• Non-negative Matrix Factorization
• Bayesian Factor Model

Discussed by Dan


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots

Alternate Approaches:
• Branch Length Representation
• Tree Pruning
• Non-negative Matrix Factorization
• Bayesian Factor Model

Discussed by Lingsong


Non-negative Matrix Factorization

Ideas:

Linearly Approx. Data (as in PCA)

But Stay in Positive Orthant
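As a rough sketch of this idea (not the implementation behind the talk; scikit-learn and every parameter choice below are assumptions), non-negative matrix factorization keeps both the scores and the basis non-negative, so reconstructions never leave the positive orthant:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((50, 20))          # nonnegative data matrix, e.g. sampled Dyck paths

model = NMF(n_components=3, init="nndsvda", max_iter=500)
scores = model.fit_transform(X)   # nonnegative scores, shape (50, 3)
basis = model.components_         # nonnegative basis vectors, shape (3, 20)

X_hat = scores @ basis            # approximation stays in the positive orthant
```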


Dyck Path Challenges

Data trees not like PC projections
• Branch Lengths ≥ 0
• Big flat spots

Alternate Approaches:
• Branch Length Representation
• Tree Pruning
• Non-negative Matrix Factorization
• Bayesian Factor Model


Contents

1. Introduction

2. Proposed Method

3. Bayesian Factor Model

4. PCA

5. Estimation of Projected Trees


Introduction

• Given data $T_1, \ldots, T_n$, let $v_1, \ldots, v_n$ be the branch length vectors.

• Dimension $p$ = # nodes in the support (union) tree.

• For tree $T_i$, define the tree topology vector $y_i$, a $p$-dimensional binary vector where $y_{ik} = I(v_{ik} > 0)$, $k = 1, \ldots, p$.

• Goal: a PCA method for $y_1, \ldots, y_n$.
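A minimal sketch of this topology-vector construction, assuming the branch-length vectors are stacked as rows of a NumPy array (the array contents are illustrative):

```python
import numpy as np

# Branch-length vectors v_1, ..., v_n as rows of an (n, p) array,
# where p = number of nodes in the support (union) tree.
v = np.array([[1.2, 0.0, 0.4],
              [0.8, 0.3, 0.0]])

# Tree topology vectors: y_ik = I(v_ik > 0), indicating whether
# node k is present in tree T_i.
y = (v > 0).astype(int)
print(y)   # [[1 0 1]
           #  [1 1 0]]
```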


Visualize Trees

Important Note: Tendency Towards Large Flat Spots And Bursts of Nearby Branches


Goal of Bayes Factor Model

Model Large Flat Spots as $y_i = 0$


Proposed Method

1. Gaussian Latent Variable Model

2. Est. Corr. Matrix: Bayes Factor Model

3. PCA on Est’ed Correlation Matrix

4. Interpret in Tree Space


Proposed Method

1. Gaussian Latent Variable Model

$X_i \sim N_p(\mu, \Sigma), \quad i = 1, \ldots, n$

Assume $y_{ik} = I(x_{ik} > 0,\ x_{i,pa(k)} > 0)$, $k = 1, \ldots, p$, where $X_i = (x_{i1}, \ldots, x_{ip})' \in \mathbb{R}^p$ and $pa(k)$ is the parent node of $k$.
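A hedged simulation of this latent-variable construction; the parent map pa, the sizes, and the identity covariance are illustrative assumptions, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 100
pa = [0, 0, 1, 1, 2]    # pa[k] = parent node of k; node 0 acts as its own parent (root)
mu = rng.normal(size=p)

X = rng.multivariate_normal(mu, np.eye(p), size=n)   # X_i ~ N_p(mu, Sigma), Sigma = I here

# y_ik = I(x_ik > 0, x_{i,pa(k)} > 0): node k is present only when both its
# latent coordinate and its parent's latent coordinate are positive.
y = ((X > 0) & (X[:, pa] > 0)).astype(int)
```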


Proposed Method

2. Estimation of the correlation matrix by Bayesian factor model

• Estimate $\mu$ and $\Sigma$ by the Bayesian factor model


Proposed Method

3. PCA with an estimated correlation matrix

• Apply PCA to the estimated correlation matrix $\hat{\Sigma}$


Proposed Method

4. Estimation of projected trees

• Define projected trees on PCA directions

• Estimate the projected trees by MCMC algorithm


Bayesian Factor Model

1. Model

2. Priors

3. MCMC algorithm

4. Convergence diagnostic

Bayesian Factor Model

1. Model


$X_i \mid z_{i1}, \ldots, z_{iq} \sim N_p\Big(\mu + \sum_{l=1}^{q} z_{il} W_l,\ \Psi\Big), \quad i = 1, \ldots, n,$

where $q$ is a positive integer, $W_1, \ldots, W_q$ are $p$-dimensional vectors, $z_{il} \sim N(0, \sigma_{z,l}^2)$, $l = 1, \ldots, q$, and $\Psi = \mathrm{diag}(\sigma_{\epsilon,k}^2,\ k = 1, \ldots, p)$. Let $\Sigma_z = \mathrm{diag}(\sigma_{z,l}^2,\ l = 1, \ldots, q)$; set $\sigma_{\epsilon,k}^2 = 1$ for identifiability.
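A small sketch of sampling from the factor model as reconstructed above; q, the loadings, and the factor variances are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 5, 2, 100
mu = np.zeros(p)
W = rng.normal(size=(q, p))              # rows W_l: the q loading vectors in R^p
sigma_z2 = np.array([2.0, 0.5])          # factor variances sigma^2_{z,l}

Z = rng.normal(size=(n, q)) * np.sqrt(sigma_z2)   # z_il ~ N(0, sigma^2_{z,l})
eps = rng.normal(size=(n, p))                     # unit error variance, per identifiability
X = mu + Z @ W + eps                              # X_i = mu + sum_l z_il W_l + eps_i
```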

Bayesian Factor Model

2. Priors

• $\mu_k \sim N(0, \sigma_\mu^2)$, iid

• $w_{lk} \sim N(0,1)$, iid for $l < k$; $w_{kk} \sim N(0,1)$ truncated to $(0, \infty)$; and $w_{lk} = 0$ for $l > k$, where $W_l = (w_{l1}, \ldots, w_{lp})'$

• $\sigma_{z,l}^2,\ l = 1, \ldots, q \sim \mathrm{InvGamma}(a_z, b_z)$, iid

• This prior has been proposed by Ghosh and Dunson (2009)

Bayesian Factor Model

3. MCMC algorithm

• Notation. Let $R = \{X, Z, W, \sigma_z^2, \sigma_\epsilon^2, y, \mu\}$, where $W_k = (w_{1k}, \ldots, w_{qk})$, $X = (X_1, \ldots, X_n)$, $Z = (Z_1, \ldots, Z_n)$, $\sigma_z^2 = (\sigma_{z,1}^2, \ldots, \sigma_{z,q}^2)$, and $y = (y_1, \ldots, y_n)$.

• Step 1. Generate $X_{ik} \mid R \setminus \{X_{ik}\}$:

– If $y_{ik} = 1$ and $y_{i,pa(k)} = 1$, generate $X_{ik} \sim N\big(\mu_k + \sum_{l=1}^{q} z_{il} w_{lk},\ \sigma_{\epsilon,k}^2\big)$ conditional on $X_{ik} > 0$.

– If $y_{ik} = 0$ and $y_{i,pa(k)} = 1$, generate $X_{ik} \sim N\big(\mu_k + \sum_{l=1}^{q} z_{il} w_{lk},\ \sigma_{\epsilon,k}^2\big)$ conditional on $X_{ik} \le 0$.
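One way to implement the truncated-normal draws in Step 1 is scipy.stats.truncnorm, which takes standardized bounds; the helper below is an illustrative sketch under the reconstruction above, not the authors' code:

```python
import numpy as np
from scipy.stats import truncnorm

def sample_x_ik(y_ik, mean, sd=1.0, rng=None):
    """Draw X_ik from N(mean, sd^2) truncated to (0, inf) if y_ik = 1,
    or to (-inf, 0] if y_ik = 0; the parent node is assumed present."""
    if y_ik == 1:
        a, b = (0.0 - mean) / sd, np.inf     # truncnorm expects standardized bounds
    else:
        a, b = -np.inf, (0.0 - mean) / sd
    return truncnorm.rvs(a, b, loc=mean, scale=sd, random_state=rng)

x = sample_x_ik(1, mean=0.3)   # mean would be mu_k + sum_l z_il w_lk
```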

Bayesian Factor Model

3. MCMC algorithm

• Step 2. Generate $\mu_k \mid R \setminus \{\mu_k\},\ k = 1, \ldots, p \sim N(m_k, s_k^2)$, where

$m_k = s_k^2 \sum_{i=1}^{n} y_{i,pa(k)} \Big( X_{ik} - \sum_{l=1}^{q} z_{il} w_{lk} \Big)$

and

$s_k^2 = \Big( \sum_{i=1}^{n} y_{i,pa(k)} + \frac{1}{\sigma_\mu^2} \Big)^{-1}.$
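A sketch of this conjugate normal draw for $\mu_k$ (the formulas above are themselves a hedged reconstruction, and the prior variance sigma_mu2 is an assumption):

```python
import numpy as np

def sample_mu_k(x_k, resid_k, parent_on, sigma_mu2=1.0, rng=None):
    """x_k: (n,) latent values X_ik; resid_k: (n,) values sum_l z_il w_lk;
    parent_on: (n,) indicators y_{i,pa(k)}. Unit error variance assumed."""
    if rng is None:
        rng = np.random.default_rng()
    s2_k = 1.0 / (parent_on.sum() + 1.0 / sigma_mu2)    # posterior variance s_k^2
    m_k = s2_k * np.sum(parent_on * (x_k - resid_k))    # posterior mean m_k
    return rng.normal(m_k, np.sqrt(s2_k))
```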

Bayesian Factor Model

3. MCMC algorithm

• Step 3. Generate $W_k \mid R \setminus \{W_k\},\ k = 1, \ldots, p \sim N_{\min\{k,q\}}(\hat{m}_k, \hat{\Sigma}_k)$, truncated so that $w_{kk} > 0$, where

$\hat{m}_k = \hat{\Sigma}_k Z_k' (\tilde{X}_k - \mu_k 1_n)$

and

$\hat{\Sigma}_k = \big( I_{\min\{k,q\}} + Z_k' Z_k \big)^{-1}.$

Here $\tilde{X}_k = (X_{1k}, \ldots, X_{nk})'$ and $Z_k = \mathrm{diag}\big(I(y_{i,pa(k)} = 1),\ i = 1, \ldots, n\big)\, Z\, \mathrm{diag}\big(I(l \le \min\{k,q\}),\ l = 1, \ldots, q\big)$.

Bayesian Factor Model

3. MCMC algorithm

• Step 4. Generate $Z_i \mid R \setminus \{Z_i\},\ i = 1, \ldots, n \sim N_q(\hat{m}_i, \hat{\Sigma}_i)$, where

$\hat{m}_i = \hat{\Sigma}_i W' \Delta_i (X_i - \mu)$

and

$\hat{\Sigma}_i = \big( W' \Delta_i W + \Sigma_z^{-1} \big)^{-1},$

where $\Delta_i = \mathrm{diag}\big(I(y_{i,pa(k)} = 1),\ k = 1, \ldots, p\big)$.
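A sketch of this multivariate normal draw for $Z_i$, following the reconstructed formulas (names and shapes are illustrative):

```python
import numpy as np

def sample_z_i(W, x_i, mu, delta_i, sigma_z2, rng=None):
    """W: (q, p) loadings; x_i, mu: (p,); delta_i: (p,) indicators I(y_{i,pa(k)} = 1);
    sigma_z2: (q,) factor variances."""
    if rng is None:
        rng = np.random.default_rng()
    D = np.diag(delta_i.astype(float))
    cov = np.linalg.inv(W @ D @ W.T + np.diag(1.0 / sigma_z2))  # Sigma_i
    mean = cov @ W @ D @ (x_i - mu)                             # m_i
    return rng.multivariate_normal(mean, cov)
```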

Bayesian Factor Model

3. MCMC algorithm

• Step 5. Generate

$\sigma_{z,l}^2 \mid R \setminus \{\sigma_{z,l}^2\},\ l = 1, \ldots, q \sim \mathrm{InvGamma}\Big( \frac{a_z + n}{2},\ \frac{b_z + \sum_{i=1}^{n} z_{il}^2}{2} \Big).$
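Step 5 is a standard inverse-gamma update; with SciPy it might look as follows (the hyperparameters a_z and b_z are placeholders):

```python
import numpy as np
from scipy.stats import invgamma

def sample_sigma_z2_l(z_l, a_z=1.0, b_z=1.0):
    """z_l: (n,) factor scores z_il for a fixed factor l."""
    shape = (a_z + len(z_l)) / 2.0               # (a_z + n) / 2
    scale = (b_z + np.sum(z_l ** 2)) / 2.0       # (b_z + sum_i z_il^2) / 2
    return invgamma.rvs(shape, scale=scale)
```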

Bayesian Factor Model

4. Convergence diagnostic.

• 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations

• 1,000 posterior samples, obtained at every 100th iteration

• Trace plots, ACF (autocorrelation functions), and histograms of three selected $\mu_k$'s and a selected $\mathrm{Cov}(X_k, X_{k'})$

(Note $\mathrm{Cov}(X) = W\, \mathrm{diag}(\sigma_{z,l}^2,\ l = 1, \ldots, q)\, W' + \Psi$.)

Bayesian Factor Model

4. Convergence diagnostic: three $\mu_k$'s

• 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations

• 1,000 posterior samples, obtained at every 100th iteration

• Trace plots, ACF, and histograms of the three selected $\mu_k$'s

Bayesian Factor Model

4. Convergence diagnostic: a $\mathrm{Cov}(X_k, X_{k'})$

• 100,000 iterations of the MCMC algorithm after 10,000 burn-in iterations

• 1,000 posterior samples, obtained at every 100th iteration

• Trace plots, ACF, and histograms of the three selected $\mu_k$'s (25%, 50%, 75%) and $\mathrm{Cov}(X_k, X_{k'})$, where $\mathrm{Cov}(X) = W\, \mathrm{diag}(\sigma_{z,l}^2,\ l = 1, \ldots, q)\, W' + \Psi$


PCA

• Scree plot (figure omitted)

• Apply standard PCA to the estimated correlation matrix to obtain the PCA direction vectors $\hat{W}_1, \ldots, \hat{W}_q$, with the corresponding eigenvalues $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_q \ge 0$.
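PCA on an estimated correlation matrix reduces to an eigendecomposition; a minimal sketch, assuming corr_hat is a symmetric positive semidefinite NumPy array:

```python
import numpy as np

def pca_directions(corr_hat):
    """Return eigenvalues lambda_1 >= ... >= lambda_q >= 0 (the scree plot
    heights) and the PCA direction vectors W_1, ..., W_q as columns."""
    eigvals, eigvecs = np.linalg.eigh(corr_hat)   # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    return eigvals[order], eigvecs[:, order]
```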


Visualizing Modes of Variation

• Assume that $X_i$ follows the multivariate Gaussian distribution with mean $\mu$ and covariance

$\Sigma_X = \sum_{l=1}^{q} \sigma_{z,l}^2 W_l W_l' + \Psi,$

where $\Psi = \mathrm{diag}(\psi_k^2,\ k = 1, \ldots, p)$ and $\psi_k^2 = 1 - \sum_{l=1}^{q} \sigma_{z,l}^2 w_{lk}^2.$
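A one-line check of this diagonal constraint, with illustrative loadings and factor variances (not values from the talk):

```python
import numpy as np

W = np.array([[0.6, 0.2, 0.1],        # illustrative (q, p) loadings
              [0.1, 0.5, 0.3]])
sigma_z2 = np.array([1.0, 0.8])       # illustrative factor variances

# psi_k^2 = 1 - sum_l sigma^2_{z,l} w_lk^2, so that diag(Sigma_X) = 1
psi2 = 1.0 - (sigma_z2[:, None] * W**2).sum(axis=0)
print(psi2)   # [0.632 0.76  0.918]
```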


Visualizing Modes of Variation

• With $(W_l,\ l = 1, \ldots, q)$, $(\sigma_{z,l}^2,\ l = 1, \ldots, q)$, and $\mu$ fixed at the values obtained from PCA, the posterior distributions of $X_i$ as well as $Z_i$ conditional on the $y_i$'s are obtained by the MCMC algorithm.

• Latent variables: $X_i \mid Z_i \sim N_p\big(\mu + \sum_{l=1}^{q} z_{il} W_l,\ \Psi\big)$ and $Z_i \sim N_q\big(0,\ \mathrm{diag}(\sigma_{z,l}^2,\ l = 1, \ldots, q)\big)$

• All of the other parameters, $X_i$ and $Z_i$, are estimable.


Visualizing Modes of Variation

• Obtain the posterior distribution of the projection $X_{li} = (W_l' X_i)\, W_l$, given $y_i$.

• Define the $l$-th projected tree $T_i^{(l)}$ by

$y_{ik}^{(l)} = \mathrm{median}\big\{ I\big(X_{lik} > 0,\ X_{li,pa(k)} > 0\big) \,\big|\, y_i \big\}.$
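A hedged sketch of this projected-tree estimate: for each node, take the posterior median of the presence indicator over MCMC draws of the projected latent vector (function and argument names are made up):

```python
import numpy as np

def projected_tree(x_proj_samples, pa):
    """x_proj_samples: (n_mcmc, p) posterior draws of the l-th projection of X_i;
    pa: parent map pa[k]. Returns the node vector y_i^(l) of the projected tree."""
    present = (x_proj_samples > 0) & (x_proj_samples[:, pa] > 0)  # indicator per draw
    return (np.median(present.astype(float), axis=0) >= 0.5).astype(int)
```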


Center Point, μ


Approximately μ + 0.5 PC1


Approximately μ + 1.0 PC1


Approximately μ + 1.5 PC1


Approximately μ + 2.0 PC1


Center Point, μ


Approximately μ - 0.5 PC1


Approximately μ - 1.0 PC1


Approximately μ - 1.5 PC1


Approximately μ - 2.0 PC1


Visualizing Modes of Variation

• Hard to Interpret
• Scaling Issues?
• Promising and Intuitive
• Work in Progress …

Future goals:

• Improved Notion of PCA
• Tune Bayes Approach for Better Interpretation
• Integrate with Non-Negative Matrix Factorization
• …