QCon London 2016 - Patterns of reliable in-stream processing @ Scale

PATTERNS OF RELIABLE IN-STREAM PROCESSING @ SCALE Alexey Kharlamov

Transcript of QCon London 2016 - Patterns of reliable in-stream processing @ Scale

PATTERNS OF RELIABLE IN-STREAM PROCESSING @ SCALE Alexey Kharlamov

50+ Billion events/day

20+ Terabytes/day

Up to 1.5M events/second

10+ Data Centers

Why did we decide to go real-time?

300 servers

50 servers

λ

LAMBDA Architecture

Realtime View

Historical View

KAFKA CLUSTER

Query Layer

HADOOP

Could we lower costs?

λ

κ

Aggregated Data

KAFKA CLUSTER

Query Layer

KAPPA Architecture

κ

Log

Checkpoint N - 1

Transaction N

Processing failure

Rollback
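The checkpoint and rollback sequence on this slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the function names, the in-memory list standing in for the log, and the simulated failure are all assumptions made for the example. State and offset are committed together only when a whole transaction succeeds, so a failure mid-transaction discards the partial work and re-reads from the last checkpoint.

```python
import copy


def run_with_checkpoints(log, process, batch_size, fail_at=None):
    """Consume `log` in transactions of `batch_size` events.

    The checkpoint holds the offset and the state together. A failed
    transaction is discarded and retried from the checkpointed offset.
    `fail_at` (hypothetical) injects one simulated failure at that offset.
    """
    checkpoint = {"offset": 0, "state": {}}
    pending_failures = {fail_at} if fail_at is not None else set()
    while checkpoint["offset"] < len(log):
        # Work on a copy so a failed transaction leaves the checkpoint intact.
        state = copy.deepcopy(checkpoint["state"])
        start = checkpoint["offset"]
        end = min(start + batch_size, len(log))
        try:
            for offset in range(start, end):
                if offset in pending_failures:
                    pending_failures.discard(offset)  # fail once, succeed on retry
                    raise RuntimeError(f"simulated failure at offset {offset}")
                process(state, log[offset])
        except RuntimeError:
            continue  # rollback: drop `state`, restart from the checkpoint
        checkpoint = {"offset": end, "state": state}  # commit transaction
    return checkpoint["state"]


def count_by_key(state, event):
    # Counting is not idempotent, which is why the rollback is needed.
    state[event] = state.get(event, 0) + 1


log = ["a", "b", "a", "c", "a", "b"]
clean = run_with_checkpoints(log, count_by_key, batch_size=2)
recovered = run_with_checkpoints(log, count_by_key, batch_size=2, fail_at=3)
```

In this sketch the run that fails at offset 3 ends with the same counts as the clean run, because the half-applied transaction never reaches the checkpoint.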

κ

Log

Session 1

Session 2

Session 3

TX N-1 TX N TX N+1

Event Event Event Event Event Event Event Event Event Event

Session 1

κ

50 servers

50 servers

Aggregated Data

KAFKA CLUSTER

Query Layer

STATE STORAGE

All data transformations are idempotent and state is volatile.

κ

κ

Input Topic

State S1

State S2

Output

S1 = F1(Input)

S2 = F2(S1)

Output = F3(S2)
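One way to read the chain above: every state is a pure function of what precedes it, so lost state can be rebuilt by replaying the input topic, and replaying events twice does no harm. The sketch below is only an illustration under that reading. The concrete functions `f1`, `f2`, `f3` and the sample events are hypothetical and are not taken from the talk.

```python
def f1(events):
    """S1: latest value per key (a keyed upsert, so replays are harmless)."""
    s1 = {}
    for key, value in events:
        s1[key] = value
    return s1


def f2(s1):
    """S2: the set of keys currently holding each value."""
    s2 = {}
    for key, value in s1.items():
        s2.setdefault(value, set()).add(key)
    return s2


def f3(s2):
    """Output: number of keys per value."""
    return {value: len(keys) for value, keys in s2.items()}


input_topic = [("u1", "uk"), ("u2", "us"), ("u1", "us"), ("u3", "uk")]
output = f3(f2(f1(input_topic)))

# Replaying the tail of the topic (as after a rollback) gives the same result.
replayed = f3(f2(f1(input_topic + input_topic[2:])))
```

Because each step overwrites by key rather than incrementing, the duplicated events in `replayed` leave the output unchanged.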

κ

Log

Checkpoint N - 1

State loss position

Rollback

Are we all set?

κ

Oops! Something wrong with time!

κ

Event Time

Log

κ

Actually… More like that

κ

Log

In reality…

κ

Log

In reality…

κ

Log

Logical time is cumulative maximum of observed event timestamps
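The definition on this slide can be written down directly as a running maximum. The timestamps below are made-up sample values with two out-of-order events, used only to show that logical time never moves backwards.

```python
from itertools import accumulate

# Event timestamps in arrival order; 103 and 108 arrive out of order.
event_timestamps = [100, 105, 103, 110, 108, 120]

# Logical time after each event: cumulative maximum of what has been seen.
logical_time = list(accumulate(event_timestamps, max))
# -> [100, 105, 105, 110, 110, 120]
```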

κ

In reality…

κ

Log

Session Window

In reality…

κ

Log

Session Window
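A session window driven by logical time could look something like the sketch below. It assumes a session for a key closes once logical time has passed that key's last event by more than a fixed gap. The `sessionize` function, the `gap` parameter, and the tuple layout are hypothetical choices for illustration, not details given in the slides.

```python
def sessionize(events, gap):
    """Group (key, timestamp) events into per-key sessions.

    Logical time is the running maximum of observed timestamps. A session
    is closed when logical time exceeds its last event by more than `gap`.
    Returns (key, first_ts, last_ts, event_count) tuples in closing order.
    """
    logical_time = float("-inf")
    open_sessions = {}  # key -> [first_ts, last_ts, count]
    closed = []
    for key, ts in events:
        logical_time = max(logical_time, ts)
        # Close every session that has been idle for longer than `gap`.
        idle = [k for k, s in open_sessions.items() if logical_time - s[1] > gap]
        for k in idle:
            closed.append((k, *open_sessions.pop(k)))
        if key in open_sessions:
            s = open_sessions[key]
            s[0], s[1], s[2] = min(s[0], ts), max(s[1], ts), s[2] + 1
        else:
            # An event arriving after its session was closed starts a new one.
            open_sessions[key] = [ts, ts, 1]
    # End of input: flush whatever is still open.
    closed.extend((k, *s) for k, s in open_sessions.items())
    return closed


events = [("a", 1), ("b", 2), ("a", 3), ("b", 20), ("a", 21)]
sessions = sessionize(events, gap=5)
```

Note that a sudden jump in logical time closes every idle session at once, which is why a single source running far ahead can cut sessions short.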

Log

Log

Log

Log

Log

C1 C2 C3 C4 C5

Rogue Broker

κ

Rogue Broker

κ

Log

LOST DATA

Limit logical time progress across consumers
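One possible reading of this slide is that a consumer should not be allowed to run far ahead of the slowest one, so that a single partition delivering much later timestamps cannot push logical time forward and make the other partitions' events look late. The sketch below illustrates that idea only. The `read_with_progress_limit` function and the `max_lead` parameter are hypothetical, and the sketch assumes timestamps are roughly ordered within each partition.

```python
def read_with_progress_limit(partitions, max_lead):
    """Interleave reads so that no partition runs more than `max_lead` ahead.

    `partitions` is a list of timestamp lists, each in log order.
    Returns the (partition_index, timestamp) pairs in the order they are read.
    """
    positions = [0] * len(partitions)
    order = []
    while True:
        # Next unread timestamp of every partition that still has data.
        heads = {i: p[positions[i]]
                 for i, p in enumerate(partitions) if positions[i] < len(p)}
        if not heads:
            return order
        # The slowest partition sets the limit, so it can always make progress.
        limit = min(heads.values()) + max_lead
        for i, ts in heads.items():
            if ts <= limit:
                order.append((i, ts))
                positions[i] += 1


# Partition 1 jumps far ahead after its first event.
partitions = [[1, 2, 3, 4], [1, 50, 51], [2, 3]]
order = read_with_progress_limit(partitions, max_lead=5)

# How far behind logical time (running maximum) each event was when read.
logical_time = float("-inf")
lateness = []
for _, ts in order:
    logical_time = max(logical_time, ts)
    lateness.append(logical_time - ts)
```

With the limit in place, the events at 50 and 51 are held back until the other partitions have caught up, so no event in this example is read behind logical time.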

κ

Logical time progress control

κ

TX N TX N TX N TX N TX N+1 TX N+1 TX N+1 TX N

Progress limit

TX N TX N TX N TX N TX N+1 TX N+1 TX N+1 TX N

Progress limit

Logical time progress control

κ

Global synchronization

Assume data distributed evenly

Zipf distribution
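The contrast between these two slides can be made concrete with a little arithmetic. The sketch below assumes key popularity follows a Zipf law with exponent 1 and that keys are spread over partitions by something equivalent to `hash(key) % n_partitions`. The key count, partition count, and function names are illustrative assumptions, not figures from the talk.

```python
def zipf_weights(n_keys, s=1.0):
    """Share of traffic for each key rank under a Zipf law with exponent `s`."""
    raw = [1.0 / rank ** s for rank in range(1, n_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def partition_loads(weights, n_partitions):
    """Share of traffic per partition when keys are assigned by modulo."""
    loads = [0.0] * n_partitions
    for index, weight in enumerate(weights):
        # `index % n_partitions` stands in for hash(key) % n_partitions.
        loads[index % n_partitions] += weight
    return loads


weights = zipf_weights(1000)
loads = partition_loads(weights, 50)

even_share = 1 / 50
# How many times the even share the busiest partition has to carry.
skew = max(loads) / even_share
```

Under these assumptions the partition that receives the most popular key carries several times the even share, so capacity planned for an even spread leaves that partition overloaded.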

Hardware matters!

Questions?

THANK YOU! @[email protected]

Logical Time (watermark)

κ

Log