map-D - NVIDIA… · map-D: data refined

Page 1

www.map-d.com @datarefined #mapd

Todd Mostak (todd@map-d.com), Steven Stewart (steve@map-d.com)

245 First St. Suite 1832 Cambridge, MA 02148


Page 2

map-D? A super-fast database built into GPU memory

Do? World’s fastest real-time big data analytics and interactive visualization

Demo? Twitter analytics platform: 1 billion+ tweets, queried in milliseconds

Page 3

The importance of interactivity

People have struggled for a long time to build interactive visualizations of big data that can deliver insight

Interactivity means:

•  Hypothesis testing can occur at “speed of thought”

How interactive is interactive enough?

•  According to a study by Jeffrey Heer and Zhicheng Liu, “an injected delay of half a second per operation adversely affects user performance in exploratory data analysis.”

•  Some types of latency are more detrimental than others:

•  For example, linking and brushing are more sensitive to latency than zooming

Page 4

Strategies for interactivity

•  Sampling:

•  Ex. BlinkDB

•  Issues:

•  Need statistically robust method for sampling

•  Sampling can miss “long-tail” phenomena

•  Pre-computation

•  Ex. imMens (datacubing)

•  Issues:

•  Can only show what the curator thought was relevant

•  Can only store a certain number of binned attributes

•  Must be curated!

•  At the same time, Map-D also rendered HD data visualizations and sent them to Tweetmap’s interactive analytics GUI

Live demo: www.mapd.csail.mit.edu SC13 video and write up:
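The long-tail issue listed under sampling can be illustrated with a small sketch. The dataset, the term names, and the 0.1% sampling rate below are all made up for illustration; this is not BlinkDB's method, only plain uniform sampling compared against a full scan.

```python
import random

def make_dataset(n_rows, rare_terms, seed=0):
    """Build a synthetic column: one very common term plus a few rare ones.
    All names and sizes are hypothetical."""
    rng = random.Random(seed)
    rows = ["common"] * n_rows
    # Overwrite a handful of positions with rare terms (the "long tail").
    positions = rng.sample(range(n_rows), len(rare_terms))
    for pos, term in zip(positions, rare_terms):
        rows[pos] = term
    return rows

def distinct_in_sample(rows, fraction, seed=1):
    """Distinct values seen in a uniform random sample of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return set(rng.sample(rows, k))

rows = make_dataset(100_000, ["rare_a", "rare_b", "rare_c"])
full = set(rows)                           # exact answer from a full scan
sampled = distinct_in_sample(rows, 0.001)  # 0.1% uniform sample (100 rows)
missed = full - sampled
print(f"full scan sees {len(full)} distinct terms; sample missed {sorted(missed)}")
```

With a sample of k rows out of n, a term that occurs exactly once is missed with probability 1 - k/n, which is 0.999 in this setup. A full scan has no such blind spot, which is the case for making full scans fast rather than avoiding them.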

Page 5

The Arrival of In-Memory Systems

•  Traditional RDBMS used to be too slow to serve as a back-end for interactive visualizations.

•  Queries over a billion records could take minutes if not hours

•  But in-memory systems can execute such queries in a fraction of the time.

•  Both full DBMS and “pseudo”-DBMS solutions

•  But still often too slow for interactive use

Page 6

Enter Map-D

Page 7

the technology

Page 8

Core Innovation

SQL-enabled column store database built into the memory architecture on GPUs and CPUs

Code developed from scratch to take advantage of:

•  Memory bandwidth

•  Massive parallelism across multiple GPUs

•  Systems with both GPU and CPU memory

•  Near-linear scaling to clusters of GPU nodes

System can process > 2TB/sec per node, with > 10TB/sec per node logical throughput with shared scans

Double-level buffer pool across GPU and CPU memory
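One way to picture a double-level buffer pool is a small fast tier backed by a larger slow tier. The sketch below is a toy model under stated assumptions: the class name, the chunk-count capacities, the LRU eviction policy, and the `load_from_disk` callback are all hypothetical and are not taken from Map-D's code.

```python
from collections import OrderedDict

class TwoLevelBufferPool:
    """Toy two-tier cache: a small 'GPU' tier backed by a larger 'CPU' tier.
    Capacities are counted in chunks; LRU eviction is an assumption."""

    def __init__(self, gpu_capacity, cpu_capacity, load_from_disk):
        self.gpu = OrderedDict()              # hottest chunks
        self.cpu = OrderedDict()              # chunks demoted from the GPU tier
        self.gpu_capacity = gpu_capacity
        self.cpu_capacity = cpu_capacity
        self.load_from_disk = load_from_disk  # hypothetical loader callback

    def get(self, chunk_id):
        if chunk_id in self.gpu:              # hit in the fast tier
            self.gpu.move_to_end(chunk_id)
            return self.gpu[chunk_id]
        if chunk_id in self.cpu:              # hit in the slow tier: promote
            data = self.cpu.pop(chunk_id)
        else:                                 # miss in both tiers
            data = self.load_from_disk(chunk_id)
        self._put_gpu(chunk_id, data)
        return data

    def _put_gpu(self, chunk_id, data):
        self.gpu[chunk_id] = data
        if len(self.gpu) > self.gpu_capacity:
            old_id, old_data = self.gpu.popitem(last=False)  # evict LRU chunk
            self.cpu[old_id] = old_data                      # demote to CPU tier
            if len(self.cpu) > self.cpu_capacity:
                self.cpu.popitem(last=False)                 # drop from memory

# Usage: record which chunk ids actually reach the (fake) disk loader.
loads = []

def fake_loader(chunk_id):
    loads.append(chunk_id)
    return f"data-{chunk_id}"

pool = TwoLevelBufferPool(gpu_capacity=2, cpu_capacity=2, load_from_disk=fake_loader)
for cid in [1, 2, 3, 1]:
    pool.get(cid)
```

In this run the second request for chunk 1 is served from the CPU tier and promoted back to the GPU tier, so the loader is called only three times.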

Shared scans – multiple queries of the same data can share memory bandwidth
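The idea behind shared scans can be sketched in a few lines: instead of reading the column once per query, read it once and evaluate every pending query against each value. The function names, the column, and the predicates below are made up for illustration; a real GPU implementation would look very different.

```python
def run_separately(column, predicates):
    """One full pass over the column per query."""
    values_read = 0
    counts = []
    for pred in predicates:
        count = 0
        for value in column:
            values_read += 1
            if pred(value):
                count += 1
        counts.append(count)
    return counts, values_read

def run_shared(column, predicates):
    """One pass over the column, evaluating every query on each value."""
    values_read = 0
    counts = [0] * len(predicates)
    for value in column:
        values_read += 1
        for i, pred in enumerate(predicates):
            if pred(value):
                counts[i] += 1
    return counts, values_read

column = list(range(1000))
predicates = [
    lambda v: v % 2 == 0,
    lambda v: v < 100,
    lambda v: v >= 990,
    lambda v: v % 10 == 3,
    lambda v: v == 42,
]
sep_counts, sep_reads = run_separately(column, predicates)
shared_counts, shared_reads = run_shared(column, predicates)
print(sep_reads, shared_reads)  # 5000 values read separately vs 1000 shared
```

Both functions return the same counts, but the shared version reads each value once. With five queries sharing a scan, the logical work is five times the physical reads, which is the same ratio as the 2TB/sec and 10TB/sec figures above, although the slides do not say how those figures were measured.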

Page 9

Multiple GPUs, with data partitioned between them

[Diagram: Node 1, Node 2, and Node 3 each apply the filter text ILIKE 'rain' to their own partition of the data]

Shared Nothing Processing
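The shared-nothing filter on this slide can be sketched as follows. Threads stand in for nodes, the round-robin partitioning is an assumption, and the filter is treated as a case-insensitive substring match (as if written ILIKE '%rain%'), which is also an assumption since the slide shows no wildcards. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_nodes):
    """Round-robin rows across nodes (the partitioning scheme is an assumption)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def local_filter(rows, needle):
    """Runs on one node and touches only that node's rows.
    Case-insensitive substring match, an assumed reading of the slide's filter."""
    needle = needle.lower()
    return [r for r in rows if needle in r.lower()]

def shared_nothing_filter(rows, needle, n_nodes=3):
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Each worker filters its own partition; no data is exchanged.
        partials = list(pool.map(lambda p: local_filter(p, needle), parts))
    # Coordinator step: concatenate the per-node results.
    return [r for part in partials for r in part]

tweets = ["Rain again today", "sunny", "Training day", "RAIN!", "snow", "no clouds"]
matches = shared_nothing_filter(tweets, "rain")
print(matches)
```

Because each node needs only its own partition to evaluate the filter, the only communication is the final merge, which is what lets this pattern scale out with the number of nodes.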

Page 10

the product

Page 11

Product: GPU-powered end-to-end big data analytics and visualization platform

Complex analytics

•  Machine learning

•  Graph analytics

GPU in-memory SQL database

•  Scale to cluster of GPU nodes

•  SQL compiler

•  Shared scans

•  User-defined functions

•  Hybrid GPU/CPU execution

•  OpenCL and CUDA

Visualization

•  Image processing

•  OpenGL

•  H.264/VP8 streaming

•  GPU pipeline

License

•  Simple # of GPUs

•  Mobile/server versions

Page 12

Map-D hardware architecture

[Diagram labels: Map-D code; Small Data, Large Data, Big Data]

•  NVIDIA Tegra mobile chip: 4GB memory; Map-D code integrated into chip memory; mobile Map-D running small datasets

•  Single GPU: 12GB memory; Map-D code integrated into GPU memory; 8 cards = 4U box

•  Single CPU: 768GB memory; Map-D code integrated into CPU memory; 4 sockets = 4U box

•  Map-D code runs on GPU + CPU memory; 36U rack: ~400GB GPU, ~12TB CPU

•  Next Gen Flash: 40TB, 100GB/s

•  Native App; Web-based service

Page 13

map-D

www.map-d.com

@datarefined
