Estimating fitness landscapes, John Pinney (j.pinney@imperial.ac.uk)

Page 1: Estimating fitness landscapes John Pinney j.pinney@imperial.ac.uk.

Estimating fitness landscapes

John Pinney
j.pinney@imperial.ac.uk

Page 2:

Genotype network

Page 3:

Genotype network

0 = ‘Wild type’

Page 4:

Genotype network

0

Δ1

Page 5:

Genotype network

0

Δ1

Δ2

Δ3

Δ4

Δ5

Page 6:

Genotype network

0

Δ1Δ2Δ3Δ4Δ5

Δ1Δ2Δ3Δ4

Δ1Δ2Δ3

Δ1Δ2

Δ1

Page 7:

Genotype network

+ Fitness values at every node

= Fitness landscape
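
The construction on these slides can be sketched in a few lines. This is a minimal illustration, assuming each genotype is the set of mutations it carries (the empty set being the wild type) and that neighbouring genotypes differ by exactly one mutation. The names `genotype_network`, `effect` and `landscape` are hypothetical, and the fitness values are a made-up additive toy model, not data.

```python
from itertools import combinations


def genotype_network(mutations):
    """Return (nodes, edges) over every combination of the given mutations.

    Each node is a frozenset of mutations; frozenset() is the wild type.
    An edge joins two genotypes that differ by exactly one mutation.
    """
    nodes = [frozenset(c)
             for k in range(len(mutations) + 1)
             for c in combinations(mutations, k)]
    edges = [(g, g | {m}) for g in nodes for m in mutations if m not in g]
    return nodes, edges


mutations = ["Δ1", "Δ2", "Δ3"]
nodes, edges = genotype_network(mutations)

# Hypothetical per-mutation effects (illustrative numbers only).
effect = {"Δ1": 0.10, "Δ2": -0.05, "Δ3": 0.20}

# Fitness landscape = genotype network + a fitness value at every node.
landscape = {g: 1.0 + sum(effect[m] for m in g) for g in nodes}
```

With n mutations the network has 2^n nodes, which is why measured fitness values can only ever cover a small part of it.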

Page 8:

With an accurate fitness landscape we could predict:

mutational trajectories, e.g. under drug treatment.

rates of emergence of drug resistance.

optimal drug combinations to prevent emergence of drug resistance.
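
To make the first prediction concrete, here is a minimal sketch of reading a mutational trajectory off a known landscape. It assumes a simple greedy rule (always move to the fittest single-mutation neighbour, mutations are only gained, stop at a local peak), which is one possible model of adaptation rather than a method taken from these slides. The function name and the fitness values are hypothetical.

```python
def greedy_trajectory(landscape, start, mutations):
    """Follow the fittest one-mutation neighbour until none improves fitness."""
    path = [start]
    current = start
    while True:
        # Neighbours reachable by gaining one mutation (no reversions here).
        neighbours = [current | {m} for m in mutations if m not in current]
        best = max(neighbours, key=lambda g: landscape[g], default=None)
        if best is None or landscape[best] <= landscape[current]:
            return path
        path.append(best)
        current = best


# Toy landscape under a hypothetical drug; the numbers are illustrative only.
mutations = ["Δ1", "Δ2"]
landscape = {
    frozenset(): 1.0,
    frozenset({"Δ1"}): 1.2,
    frozenset({"Δ2"}): 1.1,
    frozenset({"Δ1", "Δ2"}): 1.5,
}

path = greedy_trajectory(landscape, frozenset(), mutations)
```

Here the walk goes from the wild type to Δ1 and then to Δ1Δ2. A stochastic walk weighted by fitness gain would be the natural next step for estimating rates of emergence.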

Page 9:

At best, fitness data will be available for only a relatively small number of genotypes.

Page 10:

How can we estimate unobserved values?

How can we tell if these estimates are good enough for real applications of fitness landscapes?

Page 11:

How can we estimate unobserved values?

Specific mutations are expected to contribute to fitness in different ways.

=> Machine learning using mutations as features.

Page 12:

HIV-1 drug resistance database

http://hivdb.stanford.edu/

A great resource for exploring genotype-phenotype relationships.

Includes a large amount of sequence data from clinical and lab studies from the early 1990s onwards.

Page 15:

In vitro data

Viruses with known sequence are assayed to assess their ability to reproduce in vitro in the presence of various drugs.

Most of these isolates were obtained from patients who may have been untreated or on any number of drug regimes.

=> some biases in sequence coverage

Genotypes are described using mutations relative to a particular consensus sequence (e.g. subtype B).

Page 16:

Summary of PhenoSense results for a variety of protease inhibitors (PIs).

Page 17:

Machine learning from in vitro data

Using mutations relative to the consensus sequence as indicator variables, we can apply standard machine learning techniques to predict fitness under a given condition from the sequence.

Given the large number of uninformative features, LASSO and other techniques that include feature selection tend to do well.
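
As a rough illustration of the idea, the sketch below fits a LASSO model to 0/1 mutation indicators using plain coordinate descent. It is a toy under stated assumptions: the function names, the penalty `lam`, and the synthetic response (only Δ1 affects the measured value) are all hypothetical, and a real analysis would use an established library and cross-validation to choose the penalty.

```python
from itertools import product


def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - b - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre features and response so the intercept can be recovered afterwards.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    intercept = y_mean - sum(w[j] * x_mean[j] for j in range(p))
    return intercept, w


features = ["Δ1", "Δ2", "Δ3"]
# One row per genotype: 1.0 if the mutation is present, else 0.0.
X = [list(map(float, bits)) for bits in product([0, 1], repeat=len(features))]
# Synthetic response in which only Δ1 matters (illustrative, not real data).
y = [1.0 + 2.0 * row[0] for row in X]

intercept, w = lasso(X, y, lam=0.1)
```

On this toy data the coefficients for Δ2 and Δ3 are set exactly to zero while Δ1 keeps a (shrunken) positive coefficient, which is the feature-selection behaviour the slide refers to.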

Page 18:

From Rhee et al. (2006):

Least-squares regression was used to obtain a coefficient for the contribution of each mutation to resistance against a selection of PI drugs.

Page 19:

From Hinkley et al. (2011):

Using generalised kernel ridge regression.

A model using only main effects (ME) was tested against a model that also incorporates epistasis (MEEP): inter-genic, intra-genic, or both.
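
The difference between the two kinds of model can be seen in the features alone. The sketch below is not the kernel method used by Hinkley et al.; it only illustrates, with explicit features, what "main effects" and "pairwise epistasis" mean for 0/1 mutation indicators. The function name, the `scope` argument and the gene labels are hypothetical.

```python
from itertools import combinations


def epistatic_terms(x, genes, scope="both"):
    """Return one product term x[i]*x[j] per pair of mutations in scope.

    x     : 0/1 indicators, one per mutation
    genes : gene label for each mutation, same order as x
    scope : "intra" (same gene), "inter" (different genes) or "both"
    """
    terms = []
    for i, j in combinations(range(len(x)), 2):
        same_gene = genes[i] == genes[j]
        if (scope == "both"
                or (scope == "intra" and same_gene)
                or (scope == "inter" and not same_gene)):
            terms.append(x[i] * x[j])
    return terms


# Three mutations: two in protease (PR), one in reverse transcriptase (RT).
genes = ["PR", "PR", "RT"]
x = [1, 1, 0]

me_features = list(x)                                       # main effects only
meep_features = list(x) + epistatic_terms(x, genes, "both")  # plus pair terms
```

A pair term is 1 only when both mutations are present, so its coefficient captures any departure from the sum of the two individual effects.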

Page 20:

From Hinkley et al. (2011):

These authors found an ~18% improvement in predictive power from including epistasis between mutations within the same gene, e.g. the HIV protease shown.

Page 21:

In vivo data

A drug resistance fitness landscape in vitro may not be the same as that experienced by the virus when exposed to the patient’s immune system.

Another approach is to learn fitness landscapes by comparing the sequences of drug-naïve viruses against those obtained from patients on a specific drug regime.

Page 22:

Machine learning from in vivo data

Deforche et al. (2008) apply a Bayesian network.

Probability of a set of mutations (A1, A2, ..., An)

Fitness of a set of mutations (A1, A2, ..., An)

A phylogenetic guide tree is used to take sequence sampling bias into account.
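
As a reminder of what a Bayesian network encodes in general, the joint probability factorises over the network graph. This is the standard textbook factorisation, not a formula recovered from the slide. Each A_i is treated here as a binary variable for the presence of a mutation, and pa(A_i), a symbol assumed for this note, denotes the parents of A_i in the graph.

```latex
P(A_1, A_2, \ldots, A_n) = \prod_{i=1}^{n} P\bigl(A_i \mid \mathrm{pa}(A_i)\bigr)
```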

Page 24:

Predicting and validating mutational trajectories

Page 25:

Where next?