PATTERNS OF RELIABLE IN-STREAM PROCESSING @ SCALE
Alexey Kharlamov
50+ billion events/day
Up to 1.5M events/second
Why did we decide to go real-time?
300 servers
50 servers
λ
LAMBDA Architecture
Realtime View
Historical View
KAFKA CLUSTER
Query Layer
HADOOP
Could we lower costs?
λ
κ
Aggregated Data
KAFKA CLUSTER
Query Layer
KAPPA Architecture
κ
Log
Checkpoint N - 1
Transaction N
Processing failure
Rollback
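The checkpoint and rollback flow on this slide can be sketched in a few lines. This is a minimal sketch under assumptions, not the implementation from the talk: `process_with_rollback`, `count_by_key` and the in-memory `log` list are hypothetical names, and a Python list stands in for a Kafka topic.

```python
import copy

def process_with_rollback(log, state, checkpoint_offset, batch_size, apply):
    """Process one transaction (a batch of log entries) starting at the
    last checkpoint. On failure, discard the partial work and return the
    checkpointed state and offset so the batch can be replayed."""
    snapshot = copy.deepcopy(state)          # state as of checkpoint N - 1
    working = copy.deepcopy(state)
    end = min(checkpoint_offset + batch_size, len(log))
    try:
        for event in log[checkpoint_offset:end]:
            apply(working, event)            # may raise on a processing failure
    except Exception:
        return snapshot, checkpoint_offset   # rollback: nothing is committed
    return working, end                      # commit: this becomes checkpoint N


def count_by_key(state, event):
    # Hypothetical aggregation: count events per key; None simulates a bad event.
    if event is None:
        raise ValueError("processing failure")
    state[event] = state.get(event, 0) + 1


log = ["a", "b", None, "a"]
state, offset = process_with_rollback(log, {}, 0, 2, count_by_key)
# The first transaction commits and the checkpoint moves to offset 2.
state2, offset2 = process_with_rollback(log, state, offset, 2, count_by_key)
# The second transaction fails on None, so state and offset stay at the checkpoint.
```

Working on a copy keeps the checkpointed state untouched until the whole transaction succeeds, which is what makes the rollback a simple discard.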
κ
Log
Session 1
Session 2
Session 3
TX N-1 TX N TX N+1
Event (repeated ten times along the log)
Session 1
κ
50 servers
50 servers
Aggregated Data
KAFKA CLUSTER
Query Layer
STATE STORAGE
All data transformations are idempotent and state is volatile.
κ
κ
Input Topic
State S1
State S2
Output
S1=F1(Input)
S2=F2(S1)
Output=F3(S2)
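The chain S1 = F1(Input), S2 = F2(S1), Output = F3(S2) can be illustrated as below. The slide gives only the composition, so the bodies of `f1`, `f2` and `f3` and the `(event_id, key, value)` event shape are assumptions made for illustration. The point of the sketch is that every stage is derived only from the one before it, so replaying part of the input after a rollback yields the same output.

```python
def f1(input_events):
    """S1: value per (key, event_id); reapplying the same event is a no-op."""
    s1 = {}
    for event_id, key, value in input_events:
        s1[(key, event_id)] = value
    return s1

def f2(s1):
    """S2: per-key totals derived only from S1."""
    s2 = {}
    for (key, _event_id), value in s1.items():
        s2[key] = s2.get(key, 0) + value
    return s2

def f3(s2):
    """Output: sorted rows derived only from S2."""
    return sorted(s2.items())

events = [(1, "ads", 5), (2, "ads", 7), (3, "web", 1)]
replayed = events + events[1:]   # a replay after rollback delivers duplicates

first = f3(f2(f1(events)))
again = f3(f2(f1(replayed)))     # same result despite the duplicates
```

Because the state can always be recomputed from the input topic, it can be treated as volatile: losing S1 or S2 costs a replay, not data.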
κ
Log
Checkpoint N - 1
State loss position
Rollback
Oops! Something wrong with time!
κ
Actually… More like that
κ
Log
Logical time is the cumulative maximum of observed event timestamps
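The definition above translates directly into a running maximum. A minimal sketch, with `logical_time` as a hypothetical helper name and integer timestamps chosen for illustration:

```python
from itertools import accumulate

def logical_time(event_timestamps):
    """Logical time after each event: the running maximum of the
    timestamps observed so far, so it never moves backwards."""
    return list(accumulate(event_timestamps, max))

timestamps = [10, 12, 11, 15, 13, 15, 20]   # events arrive slightly out of order
clock = logical_time(timestamps)
# clock == [10, 12, 12, 15, 15, 15, 20]
```

Out-of-order events (11 after 12, 13 after 15) leave the clock unchanged, while a single event with a timestamp far in the future moves it forward for everyone.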
κ
In reality…
κ
Log
Session Window
In reality…
κ
Log
Session Window
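A session window driven by logical time might look like the sketch below. The slides do not spell out the closing rule, so this assumes a session is closed once logical time has moved more than `gap` past the session's last event, and that events older than that are counted as late. `sessionize` and the `(timestamp, user)` event shape are hypothetical.

```python
def sessionize(events, gap):
    """Group (timestamp, user) events into per-user sessions."""
    open_sessions = {}   # user -> [first_ts, last_ts, event_count]
    closed, late = [], 0
    logical_time = float("-inf")
    for ts, user in events:
        logical_time = max(logical_time, ts)
        # Close every session that logical time has left behind.
        expired = [u for u, s in open_sessions.items() if logical_time - s[1] > gap]
        for u in expired:
            closed.append((u, *open_sessions.pop(u)))
        if logical_time - ts > gap:
            late += 1            # too old to join any open session
            continue
        s = open_sessions.setdefault(user, [ts, ts, 0])
        s[0], s[1], s[2] = min(s[0], ts), max(s[1], ts), s[2] + 1
    return closed, open_sessions, late

events = [(1, "u1"), (3, "u1"), (2, "u1"), (100, "u2"), (4, "u1")]
closed, open_sessions, late = sessionize(events, gap=10)
# The event at t=100 closes u1's session, so the later (4, "u1") is counted as late.
```

Small reordering inside the window is absorbed, but one timestamp far ahead closes sessions that still have events in flight.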
Log
Log
Log
Log
Log
C1 C2 C3 C4 C5
Rogue Broker
κ
Rogue Broker
κ
Log
LOST DATA
Limit logical time progress across consumers
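One way to limit logical time progress is to pause any consumer that runs too far ahead of the slowest one. This is a sketch under assumptions: `may_advance`, the `max_lead` parameter and the sample times for C1 to C5 are hypothetical, and how consumers share their logical times is left out.

```python
def may_advance(consumer_times, consumer, max_lead):
    """A consumer may read further only while its logical time is within
    max_lead of the slowest consumer's logical time."""
    slowest = min(consumer_times.values())
    return consumer_times[consumer] - slowest <= max_lead

# C5 reads from a broker that is far ahead of the others.
times = {"C1": 100, "C2": 101, "C3": 99, "C4": 100, "C5": 460}
paused = [c for c in times if not may_advance(times, c, max_lead=60)]
# paused == ["C5"]
```

Pausing C5 keeps one fast partition from pushing logical time past sessions that the other consumers are still filling.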
κ
Logical time progress control
κ
TX N   TX N   TX N   TX N   TX N + 1   TX N + 1   TX N + 1   TX N
Progress limit
TX N   TX N   TX N   TX N   TX N + 1   TX N + 1   TX N + 1   TX N
Progress limit
Logical time progress control
κ
Global synchronization
Assume data is distributed evenly
THANK YOU!
@[email protected]
Logical Time (watermark)
κ
Log