Towards Identifying Lateral Gene Transfer Events

L. Addario-Berry, M. Hallett, J. Lagergren
Presented by: Jeff Mathew

Roadmap
Key terms
The α-active, τ-transfer problem
H-moves and I-moves algorithm
Tree generation for simulation
Experimental results
Conclusions and future work

Lateral transfer scenario
LGT = HGT (lateral gene transfer = horizontal gene transfer).
The root of the scenario tree must correspond to the root of the gene tree.
The scenario tree is connected and respects the direction of evolution implied by the arcs of T and S.
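
The presenter's clarifications for this slide state that the mixed graph formed by S and the lateral transfer edges cannot contain directed cycles, and that no vertex may have two outgoing lateral transfers. A minimal sketch of checking those two conditions follows. It assumes a simplified representation that is not taken from the paper: every arc, tree or transfer, is a directed (tail, head) pair between vertex labels, and the function name is hypothetical.

```python
from collections import defaultdict

def is_valid_scenario(tree_arcs, transfer_arcs):
    """Check the two structural constraints on an assumed arc-list encoding."""
    # Constraint 1: at most one outgoing lateral transfer per vertex.
    out_transfers = defaultdict(int)
    for tail, _head in transfer_arcs:
        out_transfers[tail] += 1
        if out_transfers[tail] > 1:
            return False

    # Constraint 2: tree arcs plus transfer arcs contain no directed cycle.
    # Kahn's algorithm: repeatedly remove vertices with in-degree zero.
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for tail, head in list(tree_arcs) + list(transfer_arcs):
        successors[tail].append(head)
        indegree[head] += 1
        nodes.update((tail, head))
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    # Every vertex is removed exactly when the graph is acyclic.
    return removed == len(nodes)
```

For example, with tree arcs root→a, root→b, a→c, a→d, a transfer c→b passes the check, while a transfer d→root closes the directed cycle root→a→d→root and is rejected.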

α-activity
An α-active scenario for a gene tree and species tree allows at most α copies of a gene to simultaneously exist in the genome of an ancestral taxon.
The authors focus on 1-active scenarios, though intractability results have been proved earlier for 1-active scenarios.
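
As an illustration of the activity level, the sketch below computes the largest number of gene copies present at the same time in a single ancestral lineage. The interval encoding (half-open [start, end) lifetimes of each copy) and the function name are assumptions made for this example, not the paper's definitions.

```python
def activity_level(copy_intervals):
    """Maximum number of overlapping half-open intervals [start, end)."""
    events = []
    for start, end in copy_intervals:
        events.append((start, 1))   # a copy appears
        events.append((end, -1))    # a copy disappears
    # Sort by time; at equal times, disappearances (-1) come before
    # appearances (+1), so touching intervals do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = best = 0
    for _time, delta in events:
        current += delta
        best = max(best, current)
    return best
```

Under this encoding, a lineage is consistent with a 1-active scenario when the function returns at most 1, and two copies with overlapping lifetimes give 2.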

The α-active, τ-transfer problem
Input: species tree S, gene tree T, integer τ.
Output: an α-active lateral transfer scenario for S and T with cost at most τ.
Intractability result
The decision version of the α-active, τ-transfer problem (does there exist an α-active scenario with cost at most τ?) is NP-complete.
τ is the number of lateral transfer events needed to explain the difference between S and T.

Algorithm
2-phase approach
Phase 1
While H-fat or I-fat vertices remain, perform an H-fat move or an I-fat move.
At the end of phase 1, the scenario is guaranteed to be 1-active. What about cycles?
Phase 2
Remove the minimum number of LGT events from each candidate to make it acyclic.
Running time: O(2^{4τ} n^2), where τ is the number of lateral transfers.

Simulating species trees
Create a random species tree S on n leaves, with Θ(log n) expected depth.
S is supposed to reflect the actual evolutionary relationships between taxa.
S is ultrametric; therefore, edge weights correspond to time.
Randomly assign weights to every edge such that every root-to-leaf path has weighted sum 1.
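
One way to obtain a tree with these properties is sketched below. The slide does not say how the authors build the topology or draw the weights, so both choices here are assumptions: the topology comes from repeatedly splitting a uniformly chosen leaf, and each internal vertex gets a random time between its parent's time and 1, with all leaves at time 1. Edge weights are then differences of times, so every root-to-leaf path sums to 1. The function names are hypothetical.

```python
import random

def random_ultrametric_tree(n, rng):
    """Random binary tree on n >= 2 leaves; returns (children, time)."""
    children = {0: []}          # vertex 0 is the root
    leaves = [0]
    next_id = 1
    while len(leaves) < n:
        # Split a uniformly chosen current leaf into two new leaves.
        v = leaves.pop(rng.randrange(len(leaves)))
        a, b = next_id, next_id + 1
        next_id += 2
        children[v] = [a, b]
        children[a], children[b] = [], []
        leaves += [a, b]

    # Root at time 0, leaves at time 1, internal vertices in between.
    time = {0: 0.0}
    stack = [0]
    while stack:
        v = stack.pop()
        for c in children[v]:
            time[c] = 1.0 if not children[c] else rng.uniform(time[v], 1.0)
            stack.append(c)
    return children, time

def edge_weights(children, time):
    """Weight of edge (v, c) is the time elapsed between v and c."""
    return {(v, c): time[c] - time[v] for v in children for c in children[v]}
```

This weighting is only the simplest one that satisfies the path-sum constraint; it makes no attempt to even out the arc weights.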

Simulating gene trees
Begin with the generated ultrametric species tree.
Lateral transfer events occur according to a Poisson process with mean rate λ.
Moving from root to leaves, for each vertex x0 with children x1 and x2, examine both edges:
If the Poisson process provides a lateral transfer event along (x0, x1), add it and point it to a randomly chosen edge alive at that point in time.
Otherwise, add a speciation event for x1.
Repeat the analysis for (x0, x2).
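
The transfer-placement step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulator: the mean rate is taken to be per unit of time along each edge, vertices carry a `time` value as in an ultrametric tree, and the sketch only records (time, donor edge, recipient edge) triples. It does not build the gene tree or abort events on the recipient edge. All names are hypothetical.

```python
import random

def transfer_times(t_start, t_end, rate, rng):
    """Event times of a homogeneous Poisson process on [t_start, t_end)."""
    times = []
    if rate <= 0:
        return times
    t = t_start + rng.expovariate(rate)   # exponential inter-arrival gap
    while t < t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def simulate_transfers(edges, time, rate, rng):
    """Return (t, donor_edge, recipient_edge) triples sorted by time."""
    events = []
    for donor in edges:
        u, v = donor
        for t in transfer_times(time[u], time[v], rate, rng):
            # Candidate recipients: other edges that exist at time t.
            alive = [e for e in edges
                     if e != donor and time[e[0]] <= t < time[e[1]]]
            if alive:   # skip the event if no other edge is alive
                events.append((t, donor, rng.choice(alive)))
    return sorted(events)
```

Drawing exponential gaps with `random.expovariate` is the standard way to sample a homogeneous Poisson process, so the number of events on an edge grows linearly with the rate and the edge length.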

Degenerate cases
Simulation can result in plausible biological events that are not detectable by the algorithm.
Useless transfers: LGTs that don't change the gene tree.
Transfer-loss events: one child of a node is an LGT event; the other child is a loss event.

Results
Quantities plotted: number of repetitions; true number of LGT events (τ); minimum-cost LGT scenario found by the algorithm (τ'); mean rate of LGTs from the Poisson process (λ).

Finding the saturation point
The saturation point is the point at which the average cost τ' found by the algorithm stops increasing.
Random trees from a large pool were chosen as gene trees and species trees.
Trials suggest that the saturation point is slightly above n/2; i.e., when τ > n/2, the algorithm stops detecting new LGT events.
Thus, if τ > n/2, the correspondence between T and S via LGT events is not very meaningful.
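
Reading a saturation point off a series of averages can be sketched as below. The input format (a list of average detected costs, ordered by increasing number of true transfers), the tolerance parameter, and the function name are assumptions for illustration. The slide does not describe how the authors located the point.

```python
def saturation_index(averages, tol=0.0):
    """Index of the first value after which the average stops increasing.

    Returns None if every step increases the average by more than tol.
    """
    for i in range(1, len(averages)):
        if averages[i] - averages[i - 1] <= tol:
            return i - 1
    return None
```

With noisy averages from a finite number of repetitions, a small positive `tol` avoids mistaking random fluctuation for continued growth.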


Conclusions
Empirically verified the feasibility of the transfer algorithm.
Degenerate events, such as transfer-loss events that result in overestimates of transfers, occur with low probability.
Achieved near-optimal scenarios when λ is low enough not to cause saturation.
The cycle elimination phase of the algorithm is extremely rarely needed in practice, implying an O(2^{2τ} n^2) running time.

Future work and open problems
Use weighted gene trees and species trees.
Species trees are nearly ultrametric while gene trees are not.
Do fast algorithms exist when the input is a set of gene trees with no species tree?
Tractability on larger phylogenies.
Can we consider gene duplication, lateral gene transfers, and other events simultaneously?
Can we use probabilistic models that assign likelihoods to various events and optimize over such models in a tractable manner?

Lateral gene transfers are the same thing as horizontal gene transfers.

Clarifications: S is still a tree; we have a mixed graph with S and lateral transfer edges. The mixed graph cannot contain directed cycles. There cannot be 2 outgoing lateral transfers from the same vertex.

An α-active scenario for a gene tree and species tree allows at most α copies of a gene to simultaneously exist in the genome of an ancestral taxon.
Figures 1 (ii) and (iii) give an intuitive graphical explanation of activity level. At any point during the evolution represented by the shaded region in these diagrams, there exist two copies of the gene in the genome of these ancestral organisms. This scenario is said to be 2-active. We justify our focus on 1-activity here by at least 2 observations: it is computationally the most feasible level, and earlier experimental papers make the assumption that higher activity levels are caused only by gene duplication events.

We experimented with several alternative approaches for generating ultrametric species trees and found that this produces trees that minimize the difference between arc weights under the L-norm. This is important during the gene tree creation phase, as it tends to distribute the lateral transfer events more evenly throughout the species tree.

The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous Poisson process. λ varies between 0 and 1 and is proportional to the number of LGT events in the tree.
Suppose that one arc is the tail and another is the head of the lateral transfer. If events are already scheduled at later times along the head arc, then all such events are aborted. The gene lineages associated with these events are lost. This corresponds to the foreign gene knocking out the resident gene in the genome of the organism. Such a protocol is necessary if we are to guarantee the 1-activity constraint of the scenario.

Useless transfer: At the point of evolution marked by X, there is a lateral transfer between two arcs in the species tree that share a common parent. Clearly, the gene tree is not changed by such transfers. In other words, the root of the species tree has children ABC and DEF and the root of the gene tree has children ABC and DEF, even though a transfer has occurred at X. In the subtree labeled Y of the species tree, we show an example of two useless transfers that together do not cause the gene tree to disagree with the species tree. This subtree of the species tree has the ancestor AB and C as siblings. In the gene tree, A remains closer to B due to the later lateral transfer, and C remains a sibling of the ancestor AB via the earlier lateral transfer.
Transfer-loss event: At the point marked Z in the diagram, a lateral transfer occurs from taxon n to taxon 2. Between point Z and Z, this lineage is lost. Note that one child of the vertex of the gene tree at point Z is a transfer event and one child is a loss event; we term this a transfer-loss event. Let T be a gene tree and S be a species tree, and let τ be the true number of lateral transfer events that occurred during the period of evolution (the true number of lateral transfer events generated by our simulation of evolution). Let τ' be the minimum cost scenario found by our algorithm. When a transfer-loss event occurs, it may be the case that τ' < τ, τ' = τ, or τ' > τ. We term these helpful, harmless, and harmful, respectively. The example in the figure shows that a single harmful transfer-loss event can cause the algorithm to require a number of lateral transfers linear in n to explain the disagreement between the gene and species tree. It is easy to verify that the minimum cost scenario for this particular example requires n − 2 transfer events. It is equally easy to create examples of helpful and harmless transfer-loss events.

In Figure (a), we can see that the number of transfers in a simulation rises linearly as a function of λ for a fixed n. This is consistent with a Poisson process and our species tree generation routine. As the mean rate parameter λ grows, the number of transfers will eventually become sufficiently large that no further transfers will be detected by the algorithm. This is trivially the case if τ exceeds n − 2 for a species tree with n leaves. Figure (b) plots the estimated number of transfers τ'. As τ' is consistently less than τ, we may conclude that the majority of transfers are not harmful transfer-loss events. When λ = 0.6, the average value for τ is approximately 11 and τ' is 8.6 with variance 2.11.
Note that for λ > 0.6, one can see slight saturation occurring.

The last 2 figures together reaffirm the intuition that harmful transfer-loss events are very rare and, when they do occur, their effect is negated by the occurrence of useless transfers and the effects of saturation. These experiments allow us to conclude that harmful transfer-loss events are rare, or, if they do occur, there exist alternative scenarios with approximately the same overall cost and/or the number of helpful transfer-loss events and useless transfers is sufficiently high. We also note that over some 10,000 trials, we did not find a scenario that required cycle elimination (the second phase of the algorithm). Although it is possible to construct an example where the algorithm will require this phase (see Figure 12 in the appendix), it appears that these scenarios are extremely rare or the rate of useless transfers and helpful transfer-loss events is sufficiently high that a scenario with cost τ' is created.

Shows the number of scenarios with certain costs, where the cost is the minimum cost plus k, for k = 0 to 3.
Number of scenarios with the minimum cost: 2.09
Number of scenarios with the minimum cost + 1: ~13
Number of scenarios with the minimum cost + 2: ~32
Number of scenarios with the minimum cost + 3: 51.32
This is another interesting finding, since one would expect the number of minimum-cost scenarios to increase exponentially.