[2C5]Map-D: A GPU Database for Interactive Big Data Analytics

map-D: data refined | www.map-d.com | @datarefined | Todd Mostak | [email protected] | 1830 Sansome St., San Francisco, CA 94104 | #mapd


DEVIEW 2014 [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Transcript of [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Page 1: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

map-D: data refined

www.map-d.com | @datarefined | #mapd

Todd Mostak | [email protected]

1830 Sansome St., San Francisco, CA 94104

Page 2: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

map-D? super-fast database built into GPU memory

Do? world’s fastest, real-time big data analytics, interactive visualization

Demo? Twitter analytics platform, 1 billion+ tweets, milliseconds

Page 3: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

The importance of interactivity

People have struggled for a long time to build interactive visualizations of big data that can deliver insight

Interactivity means:

•  Hypothesis testing can occur at “speed of thought”

How interactive is interactive enough?

•  According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”

•  Some types of latency are more detrimental than others:

•  For example, linking and brushing are more sensitive to latency than zooming

Page 4: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Strategies for interactivity

•  Sampling:

•  Ex. BlinkDB

•  Issues:

•  Need statistically robust method for sampling

•  Sampling can miss “long-tail” phenomena

•  Pre-computation

•  Ex. imMens (datacubing)

•  Issues:

•  Can only show what the curator thought was relevant

•  Can only store a certain number of binned attributes

•  Must be curated!
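
The two sets of issues listed for sampling and pre-computation can be illustrated with a small sketch. Everything here is hypothetical: the row layout, the "rare" topic, and the `cube_count` helper are made up for illustration, and the sample is a deterministic every-100th-row sample so the result is reproducible (a random 1% sample would miss the three rare rows with high probability rather than with certainty).

```python
from collections import Counter

# Hypothetical data set: 100,000 tweets, three of which mention a rare topic.
rows = [{"topic": "common", "hour": i % 24, "lang": "en"} for i in range(100_000)]
for i in (7, 1234, 56789):
    rows[i]["topic"] = "rare"

# Sampling: keep every 100th row (a deterministic 1% sample).
sample = rows[::100]
full_rare = sum(r["topic"] == "rare" for r in rows)
sample_rare = sum(r["topic"] == "rare" for r in sample)
print(full_rare, sample_rare)  # the long-tail rows are absent from the sample

# Pre-computation: the curator binned only by hour, so the cube can
# answer hour queries and nothing else.
cube = Counter(r["hour"] for r in rows)

def cube_count(attribute, value):
    """Answer a count query from the pre-computed cube, if it was binned."""
    if attribute != "hour":
        raise KeyError(f"attribute {attribute!r} was not pre-binned")
    return cube[value]
```

A query such as `cube_count("lang", "en")` fails even though the raw rows contain the answer, which is the "only what the curator thought was relevant" limitation.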

•  At the same time, Map-D also rendered HD data visualizations and sent them to Tweetmap’s interactive analytics GUI

Live demo: www.mapd.csail.mit.edu

SC13 video and write up:

Page 5: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

The Arrival of In-Memory Systems

•  Traditional RDBMS used to be too slow to serve as a back-end for interactive visualizations.

•  Queries of over a billion records could take minutes if not hours

•  But in-memory systems can execute such queries in a fraction of the time.

•  Both full DBMS and “pseudo”-DBMS solutions

•  But still often too slow

Page 6: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Enter Map-D

Page 7: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

the technology

Page 8: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Core Innovation

SQL-enabled column store database built into the memory architecture on GPUs and CPUs

Code developed from scratch to take advantage of:

•  Memory and computational bandwidth of multiple GPUs

•  Heterogeneous architectures (CPUs and GPUs)

•  Fast RDMA between GPUs on different nodes

•  GPU graphics pipeline

Double-level buffer pool across GPU and CPU memory

Shared scans – multiple queries of the same data can share memory bandwidth

System can scan data at > 2TB/sec per node, with > 10TB/sec per node logical throughput with shared scans
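
A minimal sketch of the shared-scan idea, not Map-D's actual implementation: several queries are evaluated during one pass over a column, so each value is read from memory once no matter how many queries need it. The column, the predicates, and both function names are hypothetical. The `logical_throughput` helper assumes that "logical throughput" means physical scan rate multiplied by the number of queries sharing the scan; under that assumption the quoted > 2TB/sec and > 10TB/sec figures would correspond to about five queries sharing each scan.

```python
def shared_scan(column, predicates):
    """Evaluate every predicate during a single pass over the column."""
    counts = [0] * len(predicates)
    for value in column:                      # each value is read once
        for i, predicate in enumerate(predicates):
            if predicate(value):
                counts[i] += 1
    return counts

def logical_throughput(physical_tb_per_s, queries_sharing):
    """Assumed definition: bytes served to queries per second."""
    return physical_tb_per_s * queries_sharing

# Hypothetical column and three concurrent COUNT(*) ... WHERE queries.
column = list(range(1_000))
predicates = [lambda v: v < 100, lambda v: v % 2 == 0, lambda v: v >= 990]

print(shared_scan(column, predicates))   # [100, 500, 10]
print(logical_throughput(2, 5))          # 10
```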

Page 9: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

The Hardware

[Diagram: two nodes, Node 0 and Node 1, connected through a Switch. Labels inside each node: CPU 0, CPU 1, QPI, PCI, RAID Controller, S1 to S4, GPU 0 to GPU 3, IB.]

Page 10: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

The Two-Level Buffer Pool

[Diagram: three tiers labeled GPU Memory, CPU Memory, SSD]
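
One way to picture a two-level buffer pool over these three tiers is sketched below. This is an assumption-laden illustration, not Map-D's design: the class name, the LRU eviction policy, the slot counts, and the dict standing in for the SSD are all hypothetical. Chunks evicted from the GPU tier fall back to the CPU tier, and chunks evicted from the CPU tier are simply dropped because a copy remains on SSD.

```python
from collections import OrderedDict

class TwoLevelBufferPool:
    """LRU cache in 'GPU memory' backed by an LRU cache in 'CPU memory'."""

    def __init__(self, gpu_slots, cpu_slots, storage):
        self.gpu = OrderedDict()      # fastest, smallest tier
        self.cpu = OrderedDict()      # larger, slower tier
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots
        self.storage = storage        # dict standing in for the SSD
        self.served_from = {"gpu": 0, "cpu": 0, "ssd": 0}

    def get(self, chunk_id):
        if chunk_id in self.gpu:                 # GPU hit
            self.gpu.move_to_end(chunk_id)
            self.served_from["gpu"] += 1
            return self.gpu[chunk_id]
        if chunk_id in self.cpu:                 # CPU hit: promote to GPU
            data = self.cpu.pop(chunk_id)
            self.served_from["cpu"] += 1
        else:                                    # miss in both: read SSD
            data = self.storage[chunk_id]
            self.served_from["ssd"] += 1
        self._put_gpu(chunk_id, data)
        return data

    def _put_gpu(self, chunk_id, data):
        self.gpu[chunk_id] = data
        if len(self.gpu) > self.gpu_slots:       # demote the LRU chunk
            old_id, old_data = self.gpu.popitem(last=False)
            self._put_cpu(old_id, old_data)

    def _put_cpu(self, chunk_id, data):
        self.cpu[chunk_id] = data
        if len(self.cpu) > self.cpu_slots:       # drop; SSD still has it
            self.cpu.popitem(last=False)

pool = TwoLevelBufferPool(gpu_slots=1, cpu_slots=2,
                          storage={"a": [1], "b": [2], "c": [3]})
for chunk in ("a", "b", "a", "a"):
    pool.get(chunk)
print(pool.served_from)   # {'gpu': 1, 'cpu': 1, 'ssd': 2}
```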

Page 11: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Multiple GPUs, with data partitioned between them

[Diagram: Node 1, Node 2 and Node 3, each running the same operation on its own partition: Filter text ILIKE ‘rain’]

Shared Nothing Processing
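
The shared-nothing filter in the diagram can be sketched as follows. The function names, the round-robin partitioning, and the sample tweets are hypothetical, and the sequential loop over partitions stands in for nodes that would run in parallel. The filter is treated as a case-insensitive substring match (as ILIKE '%rain%' would be); the slide shows the pattern without wildcards, so that reading is an assumption.

```python
def partition(rows, n_nodes):
    """Round-robin rows across nodes; each node owns its slice exclusively."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def node_filter(local_rows, needle):
    """Runs on one node and touches only that node's rows."""
    needle = needle.lower()
    return [text for text in local_rows if needle in text.lower()]

def shared_nothing_filter(rows, needle, n_nodes=3):
    """Apply the same filter on every partition, then gather the results."""
    partials = [node_filter(part, needle) for part in partition(rows, n_nodes)]
    return [text for part in partials for text in part]

tweets = ["Rain again", "sunny day", "more RAIN", "snow", "training run", "clear"]
matches = shared_nothing_filter(tweets, "rain")
print(matches)   # ['Rain again', 'training run', 'more RAIN']
```

No node needs to see another node's rows to answer the filter, so the only cross-node step is gathering the partial results.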

Page 12: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

the product

Page 13: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Complex Analytics

•  GPU in-memory SQL database

Visualization

•  Image processing

•  OpenGL

•  H.264/VP8 streaming

•  GPU pipeline

•  Machine learning

•  Graph analytics

•  Scale to cluster of GPU nodes

•  SQL compiler

•  Shared scans

•  User defined functions

•  Hybrid GPU/CPU execution

•  OpenCL and CUDA

License

•  Simple # of GPUs

•  Mobile/server versions

Product: GPU-powered end-to-end big data analytics and visualization platform

Page 14: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

Map-D code

Map-D hardware architecture

Scale labels: Small Data, Large Data, Big Data

•  Single GPU: 12GB memory. Map-D code integrated into GPU memory

•  Single CPU: 768GB memory. Map-D code integrated into CPU memory

•  NVIDIA Tegra mobile chip: 4GB memory. Map-D code integrated into chip memory

•  8 cards = 4U box

•  4 sockets = 4U box

•  Map-D code runs on GPU + CPU memory

•  36U rack: ~400GB GPU, ~12TB CPU

•  Next Gen Flash: 40TB, 100GB/s

•  Mobile Map-D running small datasets

•  Native App

•  Web-based service
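
The rack totals quoted on this slide can be reproduced from the per-device figures under one assumption that the slide does not state: that the 36U rack holds four 4U GPU boxes and four 4U CPU boxes, leaving 4U for other equipment. The box counts below are that assumption; the remaining constants are taken from the slide.

```python
# Figures taken from the slide.
GPU_MEMORY_GB = 12        # single GPU
GPUS_PER_BOX = 8          # 8 cards = 4U box
CPU_MEMORY_GB = 768       # single CPU
SOCKETS_PER_BOX = 4       # 4 sockets = 4U box
BOX_HEIGHT_U = 4
RACK_HEIGHT_U = 36

# Assumption (not on the slide): how the rack is populated.
GPU_BOXES = 4
CPU_BOXES = 4

gpu_total_gb = GPU_BOXES * GPUS_PER_BOX * GPU_MEMORY_GB       # 384 GB, i.e. ~400GB
cpu_total_gb = CPU_BOXES * SOCKETS_PER_BOX * CPU_MEMORY_GB    # 12288 GB, i.e. ~12TB
used_u = (GPU_BOXES + CPU_BOXES) * BOX_HEIGHT_U               # 32U of the 36U rack

print(gpu_total_gb, cpu_total_gb, used_u)   # 384 12288 32
```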

Page 15: [2C5]Map-D: A GPU Database for Interactive Big Data Analytics

map-D

www.map-d.com

@datarefined

[email protected]