Download - USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD - …on-demand.gputechconf.com/gtc/2015/presentation/S5821... · 2015-03-18 · Computational speed up Cluster processing • We produce

Transcript

USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD

Ido Machol

Aiden Lab

Baylor College of Medicine

Rice University

GTC 2015


THE HUMAN GENOME

IS LONG!

…CGTTTACGAAAATCGCAAAACTTTCGATACCCATAGGCTACTGATCATACGACCGTTTACGAAAATCGAAACCTTTCCGATCTAGGCTAC…

3 BILLION Letters

2 METERS

Nucleus Cell

6 μm


[Figure: length scales from 10 bp through 100 bp, 1 Kb, 10 Kb, 100 Kb, 1 Mb and 10 Mb up to 100 Mb]


SAME GENOME, DIFFERENT

FUNCTIONS


PART I:

TECHNOLOGY


MICROSCOPY &

FLUORESCENT IN SITU HYBRIDIZATION

FISH


CONTACT MAPPING

Exploring structure via proximity


FACEBOOK CONTACT MAP

Times in the Same Photo

• Always (same person)

• 4-11 (lives nearby)

• 0-3 (lives far away)

Homer


Simpsons' Contact Map

# of Pictures Together (color scale 0 to 16)

2 0 1 2 1 0 1 0 0
0 3 2 1 0 0 0 0 0
1 2 16 6 5 4 11 1 1
2 1 6 8 6 3 4 0 0
1 0 5 6 8 4 5 1 0
0 0 4 3 4 5 5 0 0
1 0 11 4 5 5 11 1 1
0 0 1 0 1 0 1 2 1
0 0 1 0 0 0 1 1 1
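
To make the photo analogy concrete, here is a minimal Python sketch of how such a contact map could be tallied from a photo album. The function name `contact_map`, the three-person album, and the names other than Homer are hypothetical illustrations, not data from the talk. The diagonal counts how many photos each person appears in, matching the "same person" entries on the slide.

```python
from itertools import combinations


def contact_map(photos, people):
    """Count, for every pair of people, how many photos they share."""
    index = {name: i for i, name in enumerate(people)}
    n = len(people)
    counts = [[0] * n for _ in range(n)]
    for photo in photos:
        present = sorted(set(photo), key=index.__getitem__)
        for name in present:
            # Diagonal: a person is always in the same photo as themselves.
            counts[index[name]][index[name]] += 1
        for a, b in combinations(present, 2):
            # Off-diagonal: the map is symmetric, so update both halves.
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    return counts


# Hypothetical album (assumption for illustration only).
people = ["Homer", "Marge", "Bart"]
photos = [["Homer", "Marge"], ["Homer", "Marge", "Bart"], ["Bart"]]
cmap = contact_map(photos, people)
for name, row in zip(people, cmap):
    print(name, row)
```

Pairs that appear together often get high counts, which is the same proximity signal the genome contact maps in the rest of the talk rely on.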


Hi-C

3D Genome Sequencing


Hi-C: genome-wide Chromosome

Conformation Capture

Erez Lieberman-Aiden, Nynke van Berkum

et al. Science 2009


Computational Challenge I

Alignment, calculate contacts

…CTGCCTCCTCGCGG CCGCGTGGTGGCAG…

DNA Reference

Sequence

Align to reference genome



Alignment is not trivial

Reference: …CTGCC_TCCTCGCGG…

Deletion: …CTGC__TCCTCGCGG…

Substitution: …CTGAA_TCCTCGCGG…

Insertion: …CTGCCCTCCTCGCGG…
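
The three kinds of difference on this slide are exactly the operations counted by edit distance. The sketch below is a textbook dynamic-programming edit distance in Python, offered only as an illustration of why alignment must tolerate such differences; it is not the aligner used in the talk, and the function name `edit_distance` is an assumption. The sequences are the slide's examples with the gap markers removed.

```python
def edit_distance(read, ref):
    """Minimum number of substitutions, insertions and deletions
    needed to turn `ref` into `read` (classic Levenshtein distance)."""
    prev = list(range(len(ref) + 1))
    for i, r in enumerate(read, start=1):
        curr = [i]
        for j, g in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # read has an extra letter (insertion)
                curr[j - 1] + 1,         # read is missing a letter (deletion)
                prev[j - 1] + (r != g),  # match, or substitution if letters differ
            ))
        prev = curr
    return prev[-1]


reference = "CTGCCTCCTCGCGG"
examples = {
    "deletion": "CTGCTCCTCGCGG",
    "substitution": "CTGAATCCTCGCGG",
    "insertion": "CTGCCCTCCTCGCGG",
}
for kind, read in examples.items():
    print(kind, edit_distance(read, reference))
```

A production aligner indexes the reference instead of comparing against every position, but the scoring idea is the same.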


Computational HW and SW setup


8 x Power8 Servers

2 Sockets x 12 cores x 8 threads = 192 virtual cores each

Total of 1,536 virtual cores in cluster.

• 4 X 256GB RAM

• 2 X 1024GB RAM

• 2 X 256GB RAM with NVIDIA K40 Tesla

Model 8247-22L and 8247-42L

Byte order: Bi-endian

Rice RSCG PowerOmics

hardware


GPUs: Tesla K40

• Stream Processors: 2880

• Core Clock: 745 MHz

• Boost Clock(s): 810 MHz, 875 MHz

• Memory Clock: 6 GHz GDDR5

• VRAM: 12 GB

• Single Precision: 4.29 TFLOPS

• Double Precision: 1.43 TFLOPS (1/3 of single precision)


Storage

• IBM GPFS Storage Server (Model 24)

• 4 X JBOD

• Total of 361 TB fast scratch disk space

• (Up to 1.4 petabytes)

• FlashSystem 840 20TB Flash


Interconnect:

• 56 Gigabit 36-port FDR IB switch

• Mellanox Next gen Connect-IB FDR Host Channel Adapters

• 10-Gigabit Ethernet

• Internet 2

Interconnect


Rice RSCG PowerOmics

software

Cluster management

• IBM Platform LSF, PPM, PAC, PowerKVM 2.1.0

Operating system

• Ubuntu 14.04 (little-endian) + Red Hat Enterprise Linux 7.0

Storage

• Mellanox OFED 2.4-1

• GPFS 4.1

Scientific

• BioBuilds 2014.11


Challenge: Alignment of billions of contacts

• High Resolution Map: 13 billion reads forming 5 billion contacts in the map

• IBM Power8 Cluster: 675 read alignments/second/CPU core

• 192 cores

• About 27 hours
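
The wall-clock figure follows from the read count and the per-core rate. A minimal Python sketch of that arithmetic, assuming perfect scaling across cores and no I/O overhead (both simplifications), with `alignment_hours` as a hypothetical helper name:

```python
def alignment_hours(reads, reads_per_sec_per_core, cores):
    """Idealized wall-clock hours to align `reads`, assuming every
    core sustains the same rate and the work splits evenly."""
    seconds = reads / (reads_per_sec_per_core * cores)
    return seconds / 3600


# Figures quoted on the slide.
hours = alignment_hours(13_000_000_000, 675, 192)
print(f"{hours:.1f} hours")
```

Under these assumptions the estimate comes out a little under 28 hours, in line with the slide's rounded figure.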

…CTGCCTCCTCGCGG…


Hi-C GENERATES GENOME-WIDE CONTACT MAPS

[Figure: contact maps shown for the Genome and for Chromosome 8, with loci A and B marked on both axes. Color scale: 0 to 700 Reads/250 kb²]
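
A contact map at a given resolution is a two-dimensional histogram: each aligned contact is a pair of genomic positions, and each pixel counts the contacts falling into one pair of bins. The Python sketch below illustrates this for a single chromosome at the 250 kb bin size shown on the color scale. The function name `bin_contacts` and the sample coordinates are hypothetical.

```python
from collections import Counter

BIN_SIZE = 250_000  # 250 kb bins, as on the slide's color scale


def bin_contacts(contacts, bin_size=BIN_SIZE):
    """Turn (position_a, position_b) contacts into pixel counts.

    Pixels are keyed by (low_bin, high_bin) so that each contact is
    stored once in the upper triangle of the symmetric map."""
    pixels = Counter()
    for pos_a, pos_b in contacts:
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        pixels[(i, j)] += 1
    return pixels


# Hypothetical contacts on one chromosome (base-pair coordinates).
contacts = [(100_000, 260_000), (120_000, 300_000), (900_000, 20_000), (10, 20)]
pixels = bin_contacts(contacts)
for pixel, count in sorted(pixels.items()):
    print(pixel, count)
```

A sparse dictionary is used because most pixels of a genome-wide map are empty or nearly so.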


PART II:

BIOLOGY

Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Erez Lieberman-Aiden, Nynke van Berkum et al. Science, 2009


Genomic analysis of compartments

The two compartments correlate strongly with open and closed chromatin

[Figure: Chromosome 14 contact maps at 1 Mb² pixels and 100 kb² pixels, with a Genes track]


The whole genome is plaid

[Figure panels: chromosomes 1 to 22 and X]


A TOUR OF THE NUCLEUS


Organization observed at three distinct scales

• NUCLEAR SCALE: 100 Mb

• CHROMOSOME SCALE: 10 Mb

• MEGABASE SCALE: 1 Mb


A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Suhas Rao*, Miriam Huntley*, Neva Durand, Elena Stamenova, Ivan Bochkov, James Robinson, Adrian Sanborn, Ido Machol, Arina Omer, Eric Lander, Erez Lieberman Aiden. Cell, 2014


5 billion contacts

30 million contacts

More Contacts, Higher Resolution
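
One rough way to see why more contacts permit finer pixels is to ask how many contacts land in an average pixel at a given bin size. The Python sketch below does this back-of-the-envelope calculation using the 3 billion letter genome length from earlier in the talk. It assumes contacts are spread uniformly over the map, which real Hi-C data are not (contacts concentrate near the diagonal), so treat the numbers as an illustration only; `mean_contacts_per_pixel` is a hypothetical helper.

```python
GENOME_LENGTH = 3_000_000_000  # "3 BILLION Letters"


def mean_contacts_per_pixel(total_contacts, bin_size, genome_length=GENOME_LENGTH):
    """Average contacts per pixel if contacts were spread uniformly.

    The map is symmetric, so only the upper triangle (including the
    diagonal) is counted as distinct pixels."""
    bins = genome_length // bin_size
    pixels = bins * (bins + 1) // 2
    return total_contacts / pixels


for label, total in [("30 million", 30_000_000), ("5 billion", 5_000_000_000)]:
    for bin_size in (1_000_000, 10_000):
        mean = mean_contacts_per_pixel(total, bin_size)
        print(f"{label} contacts, {bin_size:>9,} bp bins: {mean:.4f} per pixel")
```

Shrinking the bin size by 100x multiplies the pixel count by roughly 10,000x, so a much deeper data set is needed to keep each pixel populated.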


Detection of Chromatin Loops Genome-wide via Hi-C

[Figure: contact map around loci A and B, with axis positions A-2ε, A-ε, A, A+ε, A+2ε and B-2ε, B-ε, B, B+ε, B+2ε]


Into the loops

L3 L2 L1

L1 L2 L3


Computational Challenge III

Loop calling

Which one shows a loop?


3D Map Features


Computational Challenge III

Loop calling

• Apply 4 filters for each pixel.

• 20-gigapixel image.

• Millions of parallel filters.

NVIDIA Tesla GPU 200x faster than previous CPU implementation – from 3 weeks to 3 hours.
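
The slide does not spell out the four filters, so the following Python sketch shows only the general shape of a per-pixel local filter: a pixel is flagged when it is enriched relative to the mean of a surrounding ring. The single ring filter, the function names `ring_mean` and `call_peaks`, and the thresholds are all assumptions for illustration, not the lab's implementation. The point to notice is that each pixel's test reads only its own neighborhood, so every pixel can be evaluated independently, which is the property that makes this kind of workload a natural fit for a GPU.

```python
def ring_mean(matrix, row, col, inner, outer):
    """Mean of pixels whose Chebyshev distance d from (row, col)
    satisfies inner < d <= outer, clipped at the matrix edges."""
    total, count = 0.0, 0
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r in range(max(0, row - outer), min(n_rows, row + outer + 1)):
        for c in range(max(0, col - outer), min(n_cols, col + outer + 1)):
            d = max(abs(r - row), abs(c - col))
            if inner < d <= outer:
                total += matrix[r][c]
                count += 1
    return total / count if count else 0.0


def call_peaks(matrix, inner=1, outer=3, min_ratio=2.0):
    """Return pixels at least `min_ratio` times brighter than their ring.

    Each iteration depends only on the input matrix, never on other
    results, so the loop body could run as one GPU thread per pixel."""
    peaks = []
    for r in range(len(matrix)):
        for c in range(len(matrix[0])):
            background = ring_mean(matrix, r, c, inner, outer)
            if background > 0 and matrix[r][c] >= min_ratio * background:
                peaks.append((r, c))
    return peaks


# Hypothetical 9x9 map: flat background with one enriched pixel.
grid = [[1.0] * 9 for _ in range(9)]
grid[4][4] = 10.0
peaks = call_peaks(grid)
print(peaks)
```

Applying several such filters with different neighborhood shapes would multiply the work per pixel without changing this independence.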


10,000 Loops in the Human Genome


Loops turn genes on and off

Lung fibroblast cell Lymphoblastoid cell


SUMMARY OF COMPUTATIONAL EFFORTS


Sequence alignment

proportions

Genome data production and analysis

• In about 36 months we produced the sequence equivalent of more than 2200x coverage of the human genome.

• For reference, the Human Genome Project produced 12.6x coverage, over the span of 4 years.

Storage

• We currently have 25 TB of raw sequenced data.

• We sequence 1 TB each month.

• After processing the raw sequenced data, we store 3 TB of raw and processed data.


Computational speed up

Cluster processing

• We produce 1 Billion reads per month.

• Power8 is capable of processing alignments at 675 reads/second per CPU core.

• 50% faster than the cluster system we were using before.

• At this speed, we consume about 17 “CPU days” per month.

• Spread over 192 cores of the Power8 cluster, the jobs complete processing in about 2 hours.

GPU processing

• Using NVIDIA Tesla K40, we run our loop calling algorithm over a 20-gigapixel map 200x faster than the CPU implementation.

• Instead of 3 weeks we get the work done in only 3 hours.
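
The monthly figures above can be cross-checked with a few lines of Python. This is a sanity check on the quoted numbers under the assumption of ideal scaling across cores, not a benchmark. All variable names are illustrative.

```python
READS_PER_MONTH = 1_000_000_000
READS_PER_SEC_PER_CORE = 675
CORES = 192

# Total single-core compute time for one month of reads.
cpu_seconds = READS_PER_MONTH / READS_PER_SEC_PER_CORE
cpu_days = cpu_seconds / 86_400

# Idealized wall-clock time when the work is split evenly over the cores.
wall_hours = cpu_seconds / CORES / 3600

# Speed-up implied by the rounded durations "3 weeks" and "3 hours".
gpu_speedup = (3 * 7 * 24) / 3

print(f"CPU days per month: {cpu_days:.1f}")
print(f"Wall-clock hours on {CORES} cores: {wall_hours:.1f}")
print(f"Speed-up implied by 3 weeks vs 3 hours: {gpu_speedup:.0f}x")
```

The first two results match the slide's "about 17 CPU days" and "about 2 hours". The rounded durations imply a speed-up of 168x, the same order as the quoted 200x.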


aidenlab.org/juicebox


Aiden Lab

Erez Lieberman Aiden

Suhas Rao

Miriam Huntley

Neva C Durand

Elena Stamenova

Adrian Sanborn

Arina Omer

Ivan Bochkov

Olga Dudchenko

Robert Nnake

Su-Chen Huang

Muhammad Shamim

Chris Lui

Sarah Nyquist

Sanjit Batra

Ashok Cutkosky

Najeeb Tarazi

Jian Li

Broad Institute

Eric Lander

Jim Robinson

GREETINGS FROM

ANOTHER DIMENSION