λ architecture applied at Exponea.

Analytics for the fastest growing companies

Transcript of λ architecture applied at Exponea.

Page 1: λ architecture applied at Exponea.

Analytics for the fastest growing companies

Page 2: λ architecture applied at Exponea.

λ architecture applied at Exponea

Martin Strýček, 10.3.2016

Page 3: λ architecture applied at Exponea.
Page 4: λ architecture applied at Exponea.

Choose your stack

Page 5: λ architecture applied at Exponea.

Intuitive approach

•  Collect data to (no)SQL DB

•  Running live queries against (no)SQL database
    •  UI works like an SQL generator
    •  But may/will eventually result in slow queries

•  Batch preprocessing of data
    •  Continuous changes to report definitions
    •  Delays / overnight results / no more night

Page 6: λ architecture applied at Exponea.

Conversion Funnel

Page 7: λ architecture applied at Exponea.

Valid solutions - SQL

FROM events e1
LEFT JOIN events e2
    ON e1.customer_id = e2.customer_id
    AND e2.type = 'view_item'
    AND e1.timestamp < e2.timestamp
LEFT JOIN events e3
    ON e2.customer_id = e3.customer_id
    AND e3.type = 'add_to_cart'
    AND e2.timestamp < e3.timestamp

Page 8: λ architecture applied at Exponea.

Valid solutions - NoSQL

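var map = function () {
    // Funnel steps, in the order a customer must perform them.
    var steps = ['view_item', 'add_to_cart', 'buy'];
    var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    var i = 0;
    for (var j in this.value.events) {
        var event = this.value.events[j];
        if (event['type'] == steps[i]) {
            counts[i]++;
            i++;
            if (i === steps.length) break;
        }
    }
    // Emit only for customers who reached at least the first step.
    if (i > 0) emit('funnel', {'counts': counts});
};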

Page 9: λ architecture applied at Exponea.

Valid solutions - NoSQL

var reduce = function (key, values) {
    var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    // Sum the per-customer step counts element by element.
    for (var i in values) {
        for (var j in values[i].counts) {
            counts[j] += values[i].counts[j];
        }
    }
    return {'counts': counts};
};

db.embeded_customers.mapReduce(map, reduce, {out: 'customers_matched_funnel_1'}).find();
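To see what the map and reduce steps produce, they can be emulated outside MongoDB. The following is a minimal sketch in plain JavaScript, assuming each customer document has the shape `{ value: { events: [...] } }` read by the map function and that events are already ordered by time. The helpers `mapCustomer`, `reduceCounts`, `funnelCounts` and the sample customers are hypothetical and not part of the original slides; the counts array is sized to the number of steps rather than fixed at ten.

```javascript
// Hypothetical local emulation of the funnel map/reduce (no MongoDB involved).
const STEPS = ['view_item', 'add_to_cart', 'buy'];

// Map step: record how far one customer got through the funnel, in order.
function mapCustomer(customer) {
  const counts = STEPS.map(() => 0);
  let i = 0;
  for (const event of customer.value.events) {
    if (event.type === STEPS[i]) {
      counts[i]++;
      i++;
      if (i === STEPS.length) break;
    }
  }
  // Customers who never reached the first step emit nothing.
  return i > 0 ? { counts } : null;
}

// Reduce step: element-wise sum of the per-customer counts.
function reduceCounts(values) {
  const counts = STEPS.map(() => 0);
  for (const v of values) {
    v.counts.forEach((c, j) => { counts[j] += c; });
  }
  return { counts };
}

function funnelCounts(customers) {
  const emitted = customers.map(mapCustomer).filter((v) => v !== null);
  return reduceCounts(emitted);
}

// Assumed sample data: one full conversion, one drop-off, one non-entrant.
const sampleCustomers = [
  { value: { events: [{ type: 'view_item' }, { type: 'add_to_cart' }, { type: 'buy' }] } },
  { value: { events: [{ type: 'view_item' }, { type: 'buy' }] } },
  { value: { events: [{ type: 'add_to_cart' }] } },
];

console.log(funnelCounts(sampleCustomers).counts);
```

A `buy` that is not preceded by `add_to_cart` does not advance the funnel, which is why the second sample customer counts only toward the first step.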

Page 10: λ architecture applied at Exponea.

Alternative solutions - custom in-memory database

Page 11: λ architecture applied at Exponea.

IMF – Customer data structure

Page 12: λ architecture applied at Exponea.

IMF – Basic structure

project 1
    customer 1
        event 1
            timestamp
            properties
                property1, value1
                ...
        event 2
            ...
            properties
                property1, value1
                ...
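The nesting above (project, then customer, then events with a timestamp and properties) can be pictured as maps of maps. This is a minimal sketch, assuming plain JavaScript `Map` objects and numeric timestamps; the class name `InMemoryStore` and its methods are hypothetical and do not describe the actual IMF implementation.

```javascript
// Hypothetical sketch of the project -> customer -> events nesting.
class InMemoryStore {
  constructor() {
    // projectId -> Map(customerId -> array of events)
    this.projects = new Map();
  }

  addEvent(projectId, customerId, event) {
    if (!this.projects.has(projectId)) this.projects.set(projectId, new Map());
    const customers = this.projects.get(projectId);
    if (!customers.has(customerId)) customers.set(customerId, []);
    const events = customers.get(customerId);
    events.push(event);
    // Keep events ordered by timestamp so sequential scans (e.g. funnels) work.
    events.sort((a, b) => a.timestamp - b.timestamp);
  }

  eventsOf(projectId, customerId) {
    const customers = this.projects.get(projectId);
    return (customers && customers.get(customerId)) || [];
  }
}

// Assumed sample data, inserted out of order on purpose.
const store = new InMemoryStore();
store.addEvent('project1', 'customer1', { type: 'add_to_cart', timestamp: 20, properties: { item: 'A' } });
store.addEvent('project1', 'customer1', { type: 'view_item', timestamp: 10, properties: { item: 'A' } });

console.log(store.eventsOf('project1', 'customer1').map((e) => e.type));
```

Keeping all events of one customer together is what lets a funnel be evaluated with a single pass over that customer's event list.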

Page 13: λ architecture applied at Exponea.

IMF

•  Sharding
    •  Customer ID as sharding key

•  Replication
    •  IMF master knows how many shards and replicas are connected

•  Loading
    •  From a stream of data
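Using the customer ID as the sharding key means every event of a given customer lands on the same shard. A minimal sketch of such routing follows; the slides do not say how the key is mapped to a shard, so the hash-and-modulo scheme and the names `hashString`, `shardFor` and `routeEvents` are assumptions for illustration only.

```javascript
// Hypothetical shard routing; the hashing scheme is an assumption.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Map a customer ID to a shard index in [0, shardCount).
function shardFor(customerId, shardCount) {
  return hashString(String(customerId)) % shardCount;
}

// Distribute a stream of events across shards by customer ID.
function routeEvents(events, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const e of events) {
    shards[shardFor(e.customerId, shardCount)].push(e);
  }
  return shards;
}

// Assumed sample stream.
const stream = [
  { customerId: 'c1', type: 'view_item' },
  { customerId: 'c2', type: 'view_item' },
  { customerId: 'c1', type: 'buy' },
];
const shards = routeEvents(stream, 4);
console.log(shards.map((s) => s.length));
```

Because a customer never spans shards, per-customer queries such as funnels can run on each shard independently and only the aggregated counts need to be combined.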

Page 14: λ architecture applied at Exponea.

App architecture

Page 15: λ architecture applied at Exponea.

λ architecture

Page 16: λ architecture applied at Exponea.

λ architecture

Page 17: λ architecture applied at Exponea.

We have speed, we need volume

•  Fast layer is solved
•  Big data requirements
    •  Loading old data into fast layer
    •  0 data expiration
    •  Access to data from BI tools
    •  Custom queries

Page 18: λ architecture applied at Exponea.

λ architecture

Page 19: λ architecture applied at Exponea.

MapR

•  MapR filesystem
    •  Direct access to files that are stored within the cluster
    •  Faster than HDFS

•  MapR distribution
    •  No dependency hell

Page 20: λ architecture applied at Exponea.

λ architecture

Page 21: λ architecture applied at Exponea.

Data collection API: Realtime vs Async

•  Realtime
    •  Customer segments
    •  Website customization
    •  Recommendations
    •  Personalization

•  Async
    •  Do not lose data
    •  Event driven campaigns

Page 22: λ architecture applied at Exponea.

Data Collection

Page 23: λ architecture applied at Exponea.

Realtime – web customization

Page 24: λ architecture applied at Exponea.

Event trigger campaign

Page 25: λ architecture applied at Exponea.

λ architecture at Exponea

Page 26: λ architecture applied at Exponea.

Takeaways

•  Lambda solves two contradictory challenges
    •  Process data fast
    •  Process very big data

•  Apache Spark is a good choice for both speed & batch layer, anyway our IMF is way faster :-)
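The first takeaway can be illustrated by how a query is typically answered in a λ architecture: a precomputed batch view covers older data, and the speed layer covers events that arrived after the last batch run. This is a minimal sketch, assuming simple per-type event counts and numeric timestamps; the names `buildBatchView` and `queryCount` and the sample data are hypothetical and do not describe Exponea's implementation.

```javascript
// Hypothetical batch layer: precompute counts per event type over historical data.
function buildBatchView(historicalEvents) {
  const view = new Map();
  for (const e of historicalEvents) {
    view.set(e.type, (view.get(e.type) || 0) + 1);
  }
  return view;
}

// Hypothetical serving step: batch result plus events newer than the batch run.
function queryCount(type, batchView, batchTimestamp, recentEvents) {
  const fromBatch = batchView.get(type) || 0;
  const fromSpeed = recentEvents.filter(
    (e) => e.type === type && e.timestamp > batchTimestamp
  ).length;
  return fromBatch + fromSpeed;
}

// Assumed sample data: the batch run covered everything up to timestamp 3.
const historical = [
  { type: 'buy', timestamp: 1 },
  { type: 'view_item', timestamp: 2 },
  { type: 'buy', timestamp: 3 },
];
const batchView = buildBatchView(historical);
const recent = [
  { type: 'buy', timestamp: 3 }, // already in the batch view, must not be double counted
  { type: 'buy', timestamp: 4 },
];

console.log(queryCount('buy', batchView, 3, recent));
```

The timestamp cut-off is what keeps the two layers from counting the same event twice when their data overlaps.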

Page 27: λ architecture applied at Exponea.

Thank you.