USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD Ido Machol Aiden Lab Baylor College of Medicine Rice University GTC 2015

Page 1: USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD - …on-demand.gputechconf.com/gtc/2015/presentation/S5821... · 2015-03-18 · Computational speed up Cluster processing • We produce

USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD

Ido Machol

Aiden Lab

Baylor College of Medicine

Rice University

GTC 2015

Page 2:
Page 3:

THE HUMAN GENOME

IS LONG!

…CGTTTACGAAAATCGCAAAACTTTCGATACCCATAGGCTACTGATCATACGACCGTTTACGAAAATCGAAACCTTTCCGATCTAGGCTAC…

3 BILLION Letters

2 METERS

Nucleus Cell

6 μm

Page 4:

10 bp, 100 bp, 1 Kb, 10 Kb, 100 Kb, 1 Mb, 10 Mb, 100 Mb

Page 5:

SAME GENOME, DIFFERENT

FUNCTIONS

Page 6:

PART I:

TECHNOLOGY

Page 7:

MICROSCOPY &

FLUORESCENT IN SITU HYBRIDIZATION

FISH

Page 8:

CONTACT MAPPING

Exploring structure via proximity

Page 9:
Page 10:
Page 11:

FACEBOOK CONTACT MAP

Homer

Times in the Same Photo:

• Always (same person)

• 4-11 (lives nearby)

• 0-3 (lives far away)

Page 12:

Simpsons' Contact Map

# of Pictures Together (color scale 0 to 16)

 2  0  1  2  1  0  1  0  0
 0  3  2  1  0  0  0  0  0
 1  2 16  6  5  4 11  1  1
 2  1  6  8  6  3  4  0  0
 1  0  5  6  8  4  5  1  0
 0  0  4  3  4  5  5  0  0
 1  0 11  4  5  5 11  1  1
 0  0  1  0  1  0  1  2  1
 0  0  1  0  0  0  1  1  1
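
The photo-counting analogy can be sketched in a few lines of Python. This is only an illustration, assuming each photo is given as a set of names. The function name `contact_map` and the sample photos are made up here and are not the data behind the slide.

```python
from itertools import combinations_with_replacement


def contact_map(photos, people):
    """Count how many photos each pair of people shares.

    photos: list of sets of names (hypothetical input format).
    people: ordered list of names that defines the rows and columns.
    """
    index = {name: i for i, name in enumerate(people)}
    n = len(people)
    counts = [[0] * n for _ in range(n)]
    for photo in photos:
        present = sorted(index[p] for p in photo if p in index)
        # Pairs (i, i) fill the diagonal: how often a person appears at all.
        for i, j in combinations_with_replacement(present, 2):
            counts[i][j] += 1
            if i != j:
                counts[j][i] += 1  # mirror the count to keep the map symmetric
    return counts


# Made-up example data, not taken from the slide.
people = ["Homer", "Marge", "Bart"]
photos = [{"Homer", "Marge"}, {"Homer", "Bart"}, {"Homer", "Marge", "Bart"}]
matrix = contact_map(photos, people)
for name, row in zip(people, matrix):
    print(f"{name:>6}", row)
```

The result is symmetric by construction, and the diagonal is always the largest entry in its row, which matches the "Always (same person)" case in the analogy.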

Page 13:

Hi-C

3D Genome Sequencing

Page 14:

Hi-C: genome-wide Chromosome Conformation Capture

Erez Lieberman-Aiden, Nynke van Berkum et al. Science 2009

Page 15:

Computational Challenge I

Alignment, calculate contacts

…CTGCCTCCTCGCGG CCGCGTGGTGGCAG…

DNA Reference Sequence

Align to reference genome

Page 16:

Alignment is not trivial

Reference: …CTGCC_TCCTCGCGG…

Substitution: …CTGAA_TCCTCGCGG…

Deletion: …CTGC__TCCTCGCGG…

Insertion: …CTGCCCTCCTCGCGG…
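
One way to see why these three cases make alignment non-trivial is to score a read against the reference with an edit distance. The sketch below is a textbook Levenshtein computation, not the aligner used on the cluster. The sequences are the slide's examples with the `_` gap placeholders removed, and the names `edit_distance`, `REFERENCE` and `variants` are chosen here for illustration.

```python
def edit_distance(ref, read):
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(read) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, q in enumerate(read, start=1):
            curr.append(min(
                prev[j] + 1,             # reference base missing from the read
                curr[j - 1] + 1,         # extra base present in the read
                prev[j - 1] + (r != q),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


REFERENCE = "CTGCCTCCTCGCGG"
variants = {
    "substitution": "CTGAATCCTCGCGG",
    "deletion": "CTGCTCCTCGCGG",
    "insertion": "CTGCCCTCCTCGCGG",
}
distances = {name: edit_distance(REFERENCE, seq) for name, seq in variants.items()}
for name, dist in distances.items():
    print(name, dist)
```

A production aligner has to do this approximately and at scale, against a 3-billion-letter reference rather than a 14-letter one.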

Page 17:

Computational HW and SW setup

Page 18:

Rice RSCG PowerOmics hardware

8 x Power8 Servers

2 Sockets x 12 cores x 8 threads = 192 virtual cores each

Total of 1,536 virtual cores in cluster.

• 4 X 256GB RAM

• 2 X 1024GB RAM

• 2 X 256GB RAM with NVIDIA Tesla K40

Model 8247-22L and 8247-42L

Byte order: Bi-endian

Page 19:

GPUs

Tesla K40

• Stream Processors: 2880

• Core Clock: 745MHz

• Boost Clock(s): 810MHz, 875MHz

• Memory Clock: 6GHz GDDR5

• VRAM: 12GB

• Single Precision: 4.29 TFLOPS

• Double Precision: 1.43 TFLOPS (1/3)

Page 20:

Storage

• IBM GPFS Storage Server (Model 24)

• 4 X JBOD

• Total of 361 TB fast scratch disk space

• (Up to 1.4 petabytes)

• FlashSystem 840 20TB Flash

Page 21:

Interconnect

• 56 Gigabit 36-port FDR IB switch

• Mellanox next-gen Connect-IB FDR Host Channel Adapters

• 10-Gigabit Ethernet

• Internet2

Page 22:

Rice RSCG PowerOmics software

Cluster management

• IBM Platform LSF, PPM, PAC, PowerKVM 2.1.0

Operating system

• Ubuntu 14.04 (little-endian) + Red Hat Enterprise Linux 7.0

Storage

• Mellanox OFED 2.4-1

• GPFS 4.1

Scientific

• BioBuilds 2014.11

Page 23:

Challenge: Alignment of billions of contacts

High Resolution Map: 13 billion reads forming 5 billion contacts in the map

IBM Power8 Cluster: 675 read alignments / second / CPU core

192 cores

About 27 hours

…CTGCCTCCTCGCGG…
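
The wall-clock figure follows from the read count, the per-core rate and the core count. A back-of-the-envelope check, assuming perfect scaling across cores and no I/O overhead (the function name `alignment_hours` is made up for this sketch):

```python
def alignment_hours(reads, reads_per_second_per_core, cores):
    """Wall-clock hours to align `reads`, assuming ideal scaling across cores."""
    seconds = reads / (reads_per_second_per_core * cores)
    return seconds / 3600


# Figures taken from the slide: 13 billion reads, 675 reads/s/core, 192 cores.
hours = alignment_hours(13e9, 675, 192)
print(f"{hours:.1f} hours")
```

The idealized estimate lands just under 28 hours, in line with the slide's rounded figure.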

Page 24:

Hi-C GENERATES GENOME-WIDE CONTACT MAPS

Chromosome

Genome

Page 25:

Hi-C GENERATES GENOME-WIDE CONTACT MAPS

Genome

Page 26:

Hi-C GENERATES GENOME-WIDE CONTACT MAPS

Genome

Chromosome 8

Color scale: 0 to 700 Reads/(250 kb)²

Page 27:

Hi-C GENERATES GENOME-WIDE CONTACT MAPS

Labels: A (both axes)

Color scale: 0 to 700 Reads/(250 kb)²

Page 28:

Hi-C GENERATES GENOME-WIDE CONTACT MAPS

Labels: A, B (both axes)

Color scale: 0 to 700 Reads/(250 kb)²
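
A contact map at this resolution is a 2D histogram: each aligned read pair increments the pixel for the two 250 kb bins it falls in. A minimal sketch, assuming contacts arrive as pairs of base-pair positions on a single chromosome. The function `bin_contacts` and the sample positions are illustrative only.

```python
from collections import Counter

BIN_SIZE = 250_000  # 250 kb pixels, matching the color-scale unit on the slide


def bin_contacts(contacts, bin_size=BIN_SIZE):
    """Count contacts per (bin_i, bin_j) pixel, stored with bin_i <= bin_j."""
    pixels = Counter()
    for pos_a, pos_b in contacts:
        # Sort the two bin indices so (a, b) and (b, a) land in the same pixel.
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        pixels[(i, j)] += 1
    return pixels


# Made-up positions (in base pairs) for illustration.
contacts = [(100_000, 600_000), (620_000, 120_000), (300_000, 310_000), (10, 240_000)]
pixels = bin_contacts(contacts)
for (i, j), reads in sorted(pixels.items()):
    print(f"bins ({i}, {j}): {reads} reads")
```

Only the upper triangle is stored because the map is symmetric; a sparse counter also avoids allocating the many empty pixels of a genome-wide map.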

Page 29:

PART II:

BIOLOGY

Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Erez Lieberman-Aiden, Nynke van Berkum et al. Science 2009

Page 30:

Genomic analysis of compartments

Genes

Chromosome 14

(1 Mb)² pixels

(100 kb)² pixels

The two compartments correlate strongly with open and closed chromatin

Page 31:

The whole genome is plaid

1 2 3 4 5 6 7 8

9 10 11 12 13 14 15 16

17 18 19 20 21 22 X

Page 32:

A TOUR OF THE NUCLEUS

Page 33:

Organization observed at three distinct scales

NUCLEAR SCALE: 100 Mb

CHROMOSOME SCALE: 10 Mb

MEGABASE SCALE: 1 Mb

Page 34:
Page 35:

Page 36:

A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping Suhas Rao*, Miriam Huntley*, Neva Durand, Elena Stamenova, Ivan Bochkov, James Robinson, Adrian Sanborn, Ido Machol, Arina Omer, Eric Lander, Erez Lieberman Aiden Cell 2014

Page 37:

5 billion contacts

30 million contacts

More Contacts, Higher Resolution

Page 38:

Detection of Chromatin Loops Genome-wide via Hi-C

[Figure: contact-map pixels around loci A and B; axis labels A-2ε, A-ε, A, A+ε, A+2ε and B-2ε, B-ε, B, B+ε, B+2ε]

Page 39:

Into the loops

L3 L2 L1

L1 L2 L3

Page 40:

Computational Challenge III

Loop calling

Which one shows a loop?

Page 41:

3D Map Features

Page 42:

Computational Challenge III

Loop calling

• Apply 4 filters for each pixel.

• 20-gigapixel image.

• Millions of parallel filters.

NVIDIA Tesla GPU: 200x faster than the previous CPU implementation, from 3 weeks to 3 hours.
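
The slide does not say what the four filters are. As a rough illustration of per-pixel filtering, the sketch below applies a single, assumed filter: it flags a pixel as a candidate loop when its count is at least `fold` times the mean of a surrounding ring. The functions `ring_mean` and `call_peaks`, the ring shape and the threshold are all hypothetical choices, not the lab's algorithm.

```python
def ring_mean(matrix, row, col, inner, outer):
    """Mean of pixels within `outer` of (row, col) but farther than `inner`."""
    total, count = 0.0, 0
    n_rows, n_cols = len(matrix), len(matrix[0])
    for r in range(max(0, row - outer), min(n_rows, row + outer + 1)):
        for c in range(max(0, col - outer), min(n_cols, col + outer + 1)):
            # Chebyshev distance keeps the ring square, like the pixel grid.
            if max(abs(r - row), abs(c - col)) > inner:
                total += matrix[r][c]
                count += 1
    return total / count if count else 0.0


def call_peaks(matrix, inner=0, outer=2, fold=2.0):
    """Return pixels whose value is at least `fold` times the local ring mean."""
    peaks = []
    for r in range(len(matrix)):
        for c in range(len(matrix[0])):
            background = ring_mean(matrix, r, c, inner, outer)
            if background > 0 and matrix[r][c] >= fold * background:
                peaks.append((r, c))
    return peaks


# Toy 5x5 map with one enriched pixel in the middle.
grid = [[1] * 5 for _ in range(5)]
grid[2][2] = 10
peaks = call_peaks(grid)
print(peaks)
```

Each pixel's test reads only its own neighborhood and writes only its own result, so the work is independent per pixel. That independence is what makes this kind of computation a natural fit for one GPU thread per pixel.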

Page 43:

10,000 Loops in the Human Genome

Page 44:

Loops turn genes on and off

Lung fibroblast cell Lymphoblastoid cell

Page 45:

SUMMARY OF COMPUTATIONAL EFFORTS

Page 46:

Sequence alignment proportions

Genome data production and analysis

• In about 36 months we produced the sequence equivalent of more than 2200x coverage of the human genome.

• For reference, the Human Genome Project produced 12.6x coverage, over the span of 4 years.

Storage

• We currently have 25 TB of raw sequenced data.

• We sequence 1 TB each month.

• After processing the raw sequenced data, we store 3 TB of raw and processed data.

Page 47:

Computational speed up

Cluster processing

• We produce 1 Billion reads per month.

• Power8 is capable of processing alignments at 675 reads/second per CPU core.

• 50% faster than the cluster system we were using before.

• At this speed, we consume about 17 “CPU days” per month.

• With the Power8 cluster having over 192 cores, the jobs complete processing in about 2 hours.

GPU processing

• Using an NVIDIA Tesla K40, we run our loop calling algorithm over a 20-gigapixel map 200x faster than the CPU implementation.

• Instead of 3 weeks we get the work done in only 3 hours.
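
The summary figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated on this slide and assumes ideal scaling across 192 cores; the variable names are chosen here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Cluster processing: figures from the slide.
reads_per_month = 1_000_000_000
reads_per_second_per_core = 675
cores = 192

cpu_days = reads_per_month / reads_per_second_per_core / SECONDS_PER_DAY
cluster_hours = cpu_days * 24 / cores  # assumes perfect scaling across cores

# GPU processing: 3 weeks on CPU, stated speedup of 200x.
cpu_loop_calling_hours = 3 * 7 * 24
gpu_hours_at_200x = cpu_loop_calling_hours / 200
speedup_if_exactly_3_hours = cpu_loop_calling_hours / 3

print(f"CPU days per month: {cpu_days:.1f}")
print(f"Cluster wall-clock hours: {cluster_hours:.1f}")
print(f"GPU hours at 200x: {gpu_hours_at_200x:.2f}")
print(f"Speedup implied by 3 weeks -> 3 hours: {speedup_if_exactly_3_hours:.0f}x")
```

Both the "200x" and the "3 weeks to 3 hours" statements are rounded: a 200x speedup of 3 weeks gives about 2.5 hours, while exactly 3 hours would correspond to 168x.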

Page 48:

aidenlab.org/juicebox

Page 49:

Aiden Lab

Erez Lieberman Aiden

Suhas Rao

Miriam Huntley

Neva C Durand

Elena Stamenova

Adrian Sanborn

Arina Omer

Ivan Bochkov

Olga Dudchenko

Robert Nnake

Su-Chen Huang

Muhammad Shamim

Chris Lui

Sarah Nyquist

Sanjit Batra

Ashok Cutkosky

Najeeb Tarazi

Jian Li

Broad Institute

Eric Lander

Jim Robinson

GREETINGS FROM

ANOTHER DIMENSION