Towards Identifying Lateral Gene Transfer Events

L. Addario-Berry, M. Hallett, J. Lagergren

Presented By: Jeff Mathew

Roadmap

Key terms
τ-transfer problem
H-moves and I-moves algorithm
Tree generation for simulation
Experimental results
Conclusions and future work

Lateral transfer scenario

LGT (lateral gene transfer) = HGT (horizontal gene transfer)
The root of the scenario tree must correspond to the root of the gene tree.
The scenario tree is connected and respects the direction of evolution implied by the arcs of T and S.

α-activity

An α-active scenario for a gene tree and species tree allows at most α copies of a gene to exist simultaneously in the genome of an ancestral taxon.
The authors focus on 1-active scenarios, though intractability results were proved earlier for α ≥ 1.

τ-transfer problem

Input: species tree S, gene tree T, integer τ
Output: a τ* lateral transfer scenario for S and T, τ* ≤ τ
Intractability result
◦ The decision version of the α-active, τ-transfer problem (does there exist an α-active scenario with cost ≤ τ?) is NP-complete.
τ is the number of lateral transfer events needed to explain the difference between S and T.

Algorithm

Two-phase approach
Phase 1
◦ While H-fat or I-fat vertices remain, perform an H-fat move or an I-fat move.
At the end of phase 1, we are guaranteed that the scenario is 1-active. What about cycles?
Phase 2
◦ Remove the minimum number of LGT events from each candidate to make it acyclic.
Running time: O(2^{4τ} n^2)
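The phase 2 idea (dropping as few LGT events as possible until a candidate is acyclic) can be sketched by brute force. This is only an illustrative sketch, not the authors' procedure: it assumes a candidate scenario can be modelled as a plain directed graph whose arcs are tree arcs plus LGT arcs, and the names `is_acyclic` and `min_lgt_removal` are hypothetical.

```python
from itertools import combinations

def is_acyclic(nodes, arcs):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    out = {v: [] for v in nodes}
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # Every vertex is removed exactly when no cycle exists.
    return seen == len(nodes)

def min_lgt_removal(nodes, tree_arcs, lgt_arcs):
    """Smallest set of LGT arcs whose removal leaves the graph acyclic.

    Assumption: only LGT arcs may be dropped; tree arcs always stay.
    Tries subsets in order of increasing size, so the first hit is minimum.
    """
    for k in range(len(lgt_arcs) + 1):
        for removed in combinations(range(len(lgt_arcs)), k):
            kept = [a for i, a in enumerate(lgt_arcs) if i not in removed]
            if is_acyclic(nodes, list(tree_arcs) + kept):
                return [lgt_arcs[i] for i in removed]
    return None  # reached only if the tree arcs alone contain a cycle
```

The search is exponential in the number of LGT arcs, which is tolerable only because that number is bounded by the parameter τ in this setting.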

Simulating species trees

Create a random species tree S on n leaves, with Θ(log n) expected depth.
S is supposed to reflect the actual evolutionary relationships between taxa.
◦ S is ultrametric; therefore, edge weights correspond to time.
◦ Randomly assign weights to every edge such that every root-to-leaf path has weighted sum 1.
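One way to obtain such a tree is sketched below. The slides do not say how the random tree or the weights were drawn, so the uniform random split and the uniform node times here are assumptions, and `random_ultrametric_tree` is a hypothetical name. Each node gets a time in [0, 1] (root 0, leaves 1) and an edge's weight is the time difference between its endpoints, so the weights along any root-to-leaf path telescope to 1.

```python
import random

def random_ultrametric_tree(n, seed=None):
    """Random binary tree on n leaves with root-to-leaf weight sums of 1.

    Returns (edges, times): edges is a list of (parent, child, weight)
    triples; times maps each integer node id to its time in [0, 1].
    """
    rng = random.Random(seed)
    edges, times = [], {}
    counter = [0]  # next unused node id

    def build(size, parent, parent_time):
        node = counter[0]
        counter[0] += 1
        if size == 1:
            t = 1.0                            # every leaf sits at time 1
        elif parent is None:
            t = 0.0                            # the root sits at time 0
        else:
            t = rng.uniform(parent_time, 1.0)  # between the parent and the present
        times[node] = t
        if parent is not None:
            edges.append((parent, node, t - parent_time))
        if size > 1:
            left = rng.randint(1, size - 1)    # assumed: uniform split of the leaves
            build(left, node, t)
            build(size - left, node, t)

    build(n, None, 0.0)
    return edges, times
```

For example, `random_ultrametric_tree(8, seed=1)` yields 14 weighted edges over 15 nodes. The recursion depth equals the tree depth, so very unbalanced draws on large n could hit Python's recursion limit.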

Simulating gene trees

Begin with the generated ultrametric species tree.
Lateral transfer events occur according to a Poisson process with mean rate λ.
Moving from the root to the leaves, for each vertex x0 with children x1 and x2, examine both edges:
◦ If the Poisson process provides a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time.
◦ Else add a speciation event for x1.
◦ Repeat the analysis for (x0, x2).
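The transfer-placement step can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' simulator: the input is a list of (parent, child, weight) edges plus a node-to-time map, several transfers may fall on one edge, an edge never targets itself, and the name `sample_transfers` is hypothetical. Exponentially distributed gaps with rate λ along each edge give a Poisson process with that rate.

```python
import random

def sample_transfers(edges, times, lam, seed=None):
    """Place lateral transfer events on the edges of a timed species tree.

    edges: list of (parent, child, weight); times: node id -> time.
    Returns a time-sorted list of (time, edge, target), where edge and
    target are (parent, child) pairs and target is alive at that time.
    """
    if lam <= 0:
        return []                              # no transfers at rate zero
    rng = random.Random(seed)
    events = []
    for parent, child, _ in edges:
        t = times[parent]
        while True:
            t += rng.expovariate(lam)          # waiting time to the next event
            if t >= times[child]:
                break                          # edge ended: speciation (or leaf)
            # Edges alive at time t, excluding the current edge (an assumption).
            alive = [(p, c) for p, c, _ in edges
                     if times[p] <= t < times[c] and (p, c) != (parent, child)]
            if alive:
                events.append((t, (parent, child), rng.choice(alive)))
    events.sort()
    return events
```

Turning these events into a gene tree (regrafting the subtree below each transfer onto its target) is a separate step that this sketch leaves out.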

Degenerate cases

Simulation can result in plausible biological events that are not detectable by the algorithm.
Useless transfers: LGTs that don’t change the gene tree.
Transfer-loss events: one child of a node is an LGT event; the other child is a loss event.

Results

Ω = number of repetitions
τ = true number of LGT events
τ' = cost of the minimum-cost LGT scenario found by the algorithm
λ = mean rate of LGTs from the Poisson process

Finding the saturation point

The saturation point is the point at which the average τ' stops increasing.
Random trees from a large pool were chosen as gene trees and species trees.
◦ Trials suggest that the saturation point is slightly above n/2, i.e., when τ > n/2, the algorithm stops detecting new LGT events.
Thus, if τ' > n/2, the correspondence between T and S via LGT events is not very meaningful.

Conclusions

Empirically verified the feasibility of the τ-transfer algorithm.
Degenerate events, such as transfer-loss events that result in over-estimates of transfers, occur with low probability.
Achieved near-optimal scenarios when λ is low enough not to cause saturation.
The cycle-elimination phase of the algorithm is very rarely needed in practice, implying an O(2^{2τ} n^2) running time.
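To make the gap between the two running-time bounds concrete, here is a worked comparison. It reads the two bounds on the slides as 2^{4τ} n^2 and 2^{2τ} n^2, and the value τ = 5 is chosen only for illustration.

```latex
\[
\frac{2^{4\tau}\, n^2}{2^{2\tau}\, n^2} = 2^{2\tau},
\qquad
\tau = 5:\quad 2^{4\tau} = 2^{20} = 1\,048\,576,
\quad 2^{2\tau} = 2^{10} = 1\,024 .
\]
```

Under that reading, skipping cycle elimination reduces the exponential factor by 2^{2τ}, about a thousandfold at τ = 5, independent of n.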

Future work and open problems

Use weighted gene trees and species trees.
◦ Species trees are nearly ultrametric, while gene trees are not.
Do fast algorithms exist when the input is a set of gene trees with no species tree?
Tractability on larger phylogenies.
Can we consider gene duplications, lateral gene transfers, and other events simultaneously?
Can we use probabilistic models that assign likelihoods to various events and optimize over such models in a tractable manner?
