Less is more Approaches to biologist-driven analysis and next-generation sequencing data Paul Gordon...



Less is more

Approaches to biologist-driven analysis

and next-generation sequencing data

Paul Gordon

Genome Canada Bioinformatics Platform

University of Calgary


What am I doing here?

• Next Generation Sequencing

• Next Generation Web

• Future challenges

Genome Canada Bioinformatics Platform


Better tech: less DNA, more sequence

44μm

70nm


PhytoMetaSyn


Sprockets: Hierarchical Gene Models from ESTs

Developed in collaboration with BASF Plant Sciences


Genozymes


Hydrocarbon Metagenomics


Exploring gene expression patterns

CAVEman

• Java 3D-based, world-first complete 3D human body atlas (adult male)

– 2,335 organs, hierarchical organization following Terminologia Anatomica

• Numerous applications involving mapping of genetic and disease data

• More information: http://cave.ucalgary.ca/caveman

Patient MRI stack mapped onto atlas and registered by landmarks

Pharmacokinetics visualization (absorption-distribution-metabolism-excretion of Aspirin)


Basic Research

• Archaeal UV-light response

• Large-scale human genome organization

• ING-protein interactions (cancer and ageing-related proteins)


Research Applications

• Kidney transplants: improved rejection diagnostics in Edmonton

• Mad cow disease/chronic wasting disease: live diagnostics

• Desulf.: mechanisms of oil pipeline corrosion and its prevention


DNA Diagnostics Discovery for Mad Cow

Preinoculation

Preclinical

Clinical

Controls

Control animal #6

Ball toy

Photo: S. Czub, CFIA Lethbridge


Next-gen

Motif finding (elk dataset)

61 blood samples

107 million base pairs

432 billion pairwise alignments (657431²)

1082019 25mers or smaller

Uninfected 152317

Infected 3 universal

Infected 132417

Thousands of animal coverage/timepoint combos (CPU intensive)

Decypher hardware accelerator
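
The figures on this slide can be sanity-checked with a few lines of Python. The flattened value in parentheses reads as 657,431 squared, which matches the quoted 432 billion if every sequence is compared against every sequence; treating 657,431 as a sequence count is an assumption. The second function is only a toy, set-based stand-in for the idea of motifs that are "universal" to infected animals: it is not the alignment-based method run on the Decypher accelerator, and the sample data and function names are hypothetical.

```python
from typing import Iterable, List, Set


def all_vs_all_alignments(n_sequences: int) -> int:
    """Ordered pairwise comparisons when every sequence meets every sequence."""
    return n_sequences ** 2


def kmers(read: str, k: int) -> Set[str]:
    """All substrings of length k in one read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of the k-mers found in all reads of one sample."""
    found: Set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def universal_kmers(infected: List[List[str]],
                    uninfected: List[List[str]],
                    k: int) -> Set[str]:
    """k-mers seen in every infected sample and in no uninfected sample."""
    if not infected:
        return set()
    shared = set.intersection(*(sample_kmers(s, k) for s in infected))
    background: Set[str] = set()
    for sample in uninfected:
        background |= sample_kmers(sample, k)
    return shared - background


# Hypothetical toy reads; the real dataset had 61 blood samples.
infected_samples = [["ACGTAC", "TTGACG"], ["GGACGT"]]
uninfected_samples = [["TTGACT"]]

billions = all_vs_all_alignments(657431) // 10**9
candidates = universal_kmers(infected_samples, uninfected_samples, k=3)
print(billions, sorted(candidates))
```

The quadratic growth in comparisons is what makes this kind of analysis CPU intensive: doubling the number of sequences quadruples the work.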



Motif Results


↑ EVI1

↑ PLZF

Retrovirus

PrPsc(+?)

↓ PLZF-controlled genes

Infectious agent

Circulating Nucleic Acids

Endogenous Retrovirus? Consistent with protein-only evidence…

Neurovirulent? (e.g. M.L. Labat 1999)

Possible mode of action?

Virus particles? ~25nm

PrP Amyloid fibres

Vacuole

Manuelidis et al., PNAS 2007

Protected promoters (Motifs A & B)

Feedback

PrP

Integration

Nucleoprotein complexes

Cell death

CNA Export

Carp et al., EMBO J. 2006
Leblanc et al., EMBO J. 2006
Stengel et al., Biochem. Biophys. Res. Commun. 2006
Lee et al., Biochem. Biophys. Res. Commun. 2006
Etc.

Activation


Better tech: less input, more results

Better tech: less DNA, more sequence

Generate Manuscript

Now


Where are we at?

Bioinformatics

Web

Emerging Technologies

Life Sciences

Semantic Web

Source: Gartner Inc.


How software works…

Functions/Rules

Parameters/Input

Results/Output

(article, allele,…)

(Gene name, DNA sequence, QTL…)


The problem with the Web

Once you label me, you negate me.

Søren Kierkegaard

1998 Now


Bluejay

http://bluejay.ucalgary.ca

Comparative genomics

BioMoby linking

Waypoints

Gene expression integration


The task at hand (biologist)

Sequencer Data File (Binary)

ACCGT…

Known proteins

BLAST report (related proteins)

(computer scientist)


DNASequence

NCBI_gi

Sequence_Alignment
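
One way to picture how labels like these can drive analysis is to treat each one as a data type that a service either consumes or produces, so that a tool can chain services together on the biologist's behalf. The sketch below illustrates that idea only. Reading the three labels as input/output types in the style of the BioMoby linking mentioned earlier is an assumption, and the service names and the registry are hypothetical.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Service(NamedTuple):
    """A service described only by the data type it takes and returns."""
    name: str
    input_type: str
    output_type: str


# Hypothetical registry; the type names are the labels from the slide.
SERVICES = [
    Service("fetch_sequence_by_gi", "NCBI_gi", "DNASequence"),
    Service("run_blast", "DNASequence", "Sequence_Alignment"),
]


def find_workflow(services: List[Service],
                  start_type: str,
                  goal_type: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of services
    that turns start_type data into goal_type data."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for service in services:
            if service.input_type == current and service.output_type not in seen:
                seen.add(service.output_type)
                queue.append((service.output_type, path + [service.name]))
    return None  # no chain of registered services connects the two types


workflow = find_workflow(SERVICES, "NCBI_gi", "Sequence_Alignment")
print(workflow)
```

Because the search uses only the declared types, the user needs to know what data they hold and what result they want, not which programs sit in between.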


Audience

Self-perception of computer skills (scale endpoints: "God", "Amoeba")

• Taverna self-starters

• Willing to take training

• Capable but fearful


The need for shoehorns

• The current vision of the Semantic Web intends to create a new structure starting up with no reference to its vast, functioning, but more primitive predecessor … things just don’t happen like that


All the Web as Workflows

Seahawk

Proxied Web page

Drag ‘n’ drop

Seahawk prompting


What’s Ahead?

The more a man learns, the more he realizes how little he knows


Semantic Web


http://www.uniprot.org/tissues/229 http://purl.uniprot.org/po/0009009


Take home messages

As tech improves, we can ask better questions

We will need shoehorns to access existing resources for the foreseeable future