λ architecture applied at Exponea.

Posted on 07-Jan-2017



Analytics for the fastest growing companies

λ architecture applied at Exponea

Martin Strýček, 10.3.2016

Choose your stack

Intuitive approach

•  Collect data into a (no)SQL DB

•  Run live queries against the (no)SQL database
   •  The UI works like an SQL generator
   •  But this may (and eventually will) result in slow queries

•  Batch preprocessing of data
   •  Report definitions change continuously
   •  Delays / overnight results / no night anymore

Conversion Funnel

Valid solutions - SQL

FROM events e1
LEFT JOIN events e2
  ON e1.customer_id = e2.customer_id
  AND e2.type = 'view_item'
  AND e1.timestamp < e2.timestamp
LEFT JOIN events e3
  ON e2.customer_id = e3.customer_id
  AND e3.type = 'add_to_cart'
  AND e2.timestamp < e3.timestamp

Valid solutions - NoSQL

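var map = function () {
  var steps = ['view_item', 'add_to_cart', 'buy'];
  var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]; // one slot per step (up to 10)
  var events = this.value.events;               // assumed sorted by timestamp
  var i = 0;                                    // next step to match
  for (var j = 0; j < events.length; j++) {
    if (events[j]['type'] == steps[i]) {
      counts[i]++;
      i++;
      if (i === steps.length) break;
    }
  }
  if (i > 0) emit('funnel', {'counts': counts});
};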


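var reduce = function (key, values) {
  var counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
  // Element-wise sum of the emitted step counts
  for (var i = 0; i < values.length; i++) {
    for (var j = 0; j < values[i].counts.length; j++) {
      counts[j] += values[i].counts[j];
    }
  }
  return {'counts': counts};
};

db.embeded_customers.mapReduce(map, reduce, {out: 'customers_matched_funnel_1'}).find();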
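The map and reduce functions above depend on MongoDB to call `emit` and group the emitted values by key. The sketch below is a hypothetical local harness, not part of the deck. It runs the same greedy in-order funnel matching in plain JavaScript on sample data, which makes the resulting counts easy to inspect. The names `mapCustomer`, `reduceCounts` and `runFunnel` are made up for illustration. It also assumes, as the map function does, that each customer's events are already sorted by timestamp.

```javascript
// Hypothetical local harness for the funnel map/reduce (no MongoDB needed).
const STEPS = ['view_item', 'add_to_cart', 'buy'];

// Map: greedy, in-order matching of the funnel steps for one customer.
// Assumes customer.value.events is already sorted by timestamp.
function mapCustomer(customer, emit) {
  const counts = new Array(10).fill(0);
  let i = 0; // index of the next step to match
  for (const event of customer.value.events) {
    if (event.type === STEPS[i]) {
      counts[i]++;
      i++;
      if (i === STEPS.length) break;
    }
  }
  if (i > 0) emit('funnel', { counts: counts });
}

// Reduce: element-wise sum of count vectors. The output has the same shape
// as the input values, so it can be applied again to partial results.
function reduceCounts(key, values) {
  const counts = new Array(10).fill(0);
  for (const v of values) {
    v.counts.forEach((c, j) => { counts[j] += c; });
  }
  return { counts: counts };
}

// Driver: group emitted values by key, then reduce each group.
function runFunnel(customers) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  customers.forEach((c) => mapCustomer(c, emit));
  const result = {};
  for (const [key, values] of groups) result[key] = reduceCounts(key, values);
  return result;
}

const sample = [
  { value: { events: [{ type: 'view_item' }, { type: 'add_to_cart' }, { type: 'buy' }] } },
  { value: { events: [{ type: 'view_item' }, { type: 'view_item' }] } },
  { value: { events: [{ type: 'add_to_cart' }] } }, // never enters the funnel
];
console.log(runFunnel(sample).funnel.counts.slice(0, 3)); // [ 2, 1, 1 ]
```

The reduce output deliberately has the same shape as the emitted values. MongoDB may call reduce more than once on partial results, and that only works when input and output share a shape.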

Alternative solutions - custom in-memory database

IMF – Customer data structure

IMF – Basic structure

project1
  customer1
    event1
      timestamp
      properties
        property1, value1
        ...
    event2
      ...
      properties
        property1, value1
        ...
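The nesting on the slide (project, then customer, then events with a timestamp and properties) can be sketched as an in-memory store. Everything below is hypothetical and not the IMF's actual code: the class `ProjectStore`, its methods, and the `type` field on each event, which is assumed to hold the event name (the slide's "event1", "event2"). Keeping each customer's events together and in time order is what allows a funnel to be evaluated in a single pass per customer.

```javascript
// Hypothetical in-memory layout following the slide:
// project -> customer -> events (each with a timestamp and properties).
class ProjectStore {
  constructor(projectId) {
    this.projectId = projectId;
    this.customers = new Map(); // customerId -> events sorted by timestamp
  }

  addEvent(customerId, type, timestamp, properties) {
    if (!this.customers.has(customerId)) this.customers.set(customerId, []);
    const events = this.customers.get(customerId);
    // Events usually arrive roughly in time order, so scan back from the
    // end to find where this one belongs.
    let i = events.length;
    while (i > 0 && events[i - 1].timestamp > timestamp) i--;
    events.splice(i, 0, { type, timestamp, properties });
  }

  // All events of one customer, oldest first (empty if the customer is unknown).
  eventsOf(customerId) {
    return this.customers.get(customerId) || [];
  }
}

const store = new ProjectStore('project1');
store.addEvent('customer1', 'add_to_cart', 20, { item: 'shoes' });
store.addEvent('customer1', 'view_item', 10, { item: 'shoes' });
console.log(store.eventsOf('customer1').map((e) => e.type)); // [ 'view_item', 'add_to_cart' ]
```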

IMF

•  Sharding
   •  Customer ID as the sharding key
•  Replication
   •  The IMF master knows how many shards and replicas are connected
•  Loading
   •  From a stream of data
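A minimal sketch of the sharding and replication bullets follows, assuming hash-based routing. The deck says only that the customer ID is the sharding key and that the master knows which shards and replicas are connected. The hash function (32-bit FNV-1a), the `Master` class and its methods are assumptions chosen for illustration. Routing by customer ID keeps all of a customer's events on one shard.

```javascript
// Hypothetical sketch: route by hashing the customer ID, and let a master
// track which replicas of each shard are currently connected.

// 32-bit FNV-1a over the string's UTF-16 code units (any stable hash works).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // multiply modulo 2^32
  }
  return h >>> 0; // as an unsigned 32-bit integer
}

class Master {
  constructor(shardCount) {
    this.shardCount = shardCount;
    this.replicas = new Map(); // shard index -> set of replica addresses
    for (let s = 0; s < shardCount; s++) this.replicas.set(s, new Set());
  }

  shardFor(customerId) {
    return fnv1a(String(customerId)) % this.shardCount;
  }

  connect(shard, address) { this.replicas.get(shard).add(address); }
  disconnect(shard, address) { this.replicas.get(shard).delete(address); }

  // Replicas that can serve the given customer's data.
  replicasFor(customerId) {
    return [...this.replicas.get(this.shardFor(customerId))];
  }

  // A query over all customers needs at least one live replica per shard.
  allShardsAvailable() {
    return [...this.replicas.values()].every((set) => set.size > 0);
  }
}
```

A fixed modulo scheme reshuffles customers whenever the shard count changes. Growing the cluster would therefore require reloading data, for example from the event stream mentioned under "Loading".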

App architecture

λ architecture


We have speed, we need volume

•  The fast layer is solved
•  Big data requirements
   •  Loading old data into the fast layer
   •  Zero data expiration
   •  Access to data from BI tools
   •  Custom queries
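In the λ architecture, a batch layer is recomputed from the full, never-expiring history. A speed layer covers only the events that arrived since the last batch run. A query merges the two. The sketch below shows that split for a toy query, events per customer. The query and all names (`buildBatchView`, `SpeedView`, `eventCount`) are hypothetical and are not Exponea's implementation.

```javascript
// Hypothetical λ-architecture sketch with "events per customer" as the query.

// Batch layer: recomputed from the full event history, up to a cutoff time.
function buildBatchView(allEvents, cutoff) {
  const counts = new Map();
  for (const e of allEvents) {
    if (e.timestamp < cutoff) {
      counts.set(e.customerId, (counts.get(e.customerId) || 0) + 1);
    }
  }
  return counts;
}

// Speed layer: updated incrementally, covering only events since the cutoff.
class SpeedView {
  constructor(cutoff) {
    this.cutoff = cutoff;
    this.counts = new Map();
  }
  ingest(e) {
    if (e.timestamp >= this.cutoff) {
      this.counts.set(e.customerId, (this.counts.get(e.customerId) || 0) + 1);
    }
  }
}

// Serving layer: merge both views at query time.
function eventCount(customerId, batchView, speedView) {
  return (batchView.get(customerId) || 0) + (speedView.counts.get(customerId) || 0);
}

const history = [
  { customerId: 'c1', timestamp: 1 },
  { customerId: 'c1', timestamp: 2 },
  { customerId: 'c2', timestamp: 3 },
];
const batch = buildBatchView(history, 4);
const speed = new SpeedView(4);
speed.ingest({ customerId: 'c1', timestamp: 5 });
console.log(eventCount('c1', batch, speed)); // 3
```

When the next batch run finishes with a later cutoff, the speed view for the older interval can be discarded. This is how the architecture serves both fast and very large data.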

λ architecture

MapR

•  MapR filesystem
   •  Direct access to files stored within the cluster
   •  Faster than HDFS

•  MapR distribution
   •  No dependency hell

λ architecture

Data collection API: Realtime vs. Async

•  Realtime
   •  Customer segments
   •  Website customization
   •  Recommendations
   •  Personalization

•  Async
   •  Do not lose data
   •  Event-driven campaigns
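The two paths differ in what the caller waits for. A realtime call must return an answer, such as a segment or recommendation, while the page is rendering. An async call only has to accept the event and guarantee that it is eventually stored. The following is a hypothetical sketch of the async path only: the `AsyncCollector` class is made up, and `send` is a stand-in for the real transport that returns `true` on success. An event leaves the buffer only after delivery succeeds, so a failed send is retried on the next flush rather than dropped.

```javascript
// Hypothetical async collection path: buffer events, remove them only after
// `send` (a stand-in for the real transport) reports success.
class AsyncCollector {
  constructor(send) {
    this.send = send;
    this.buffer = [];
  }

  track(event) {
    this.buffer.push(event); // returns immediately; the caller never waits
  }

  // Deliver buffered events in order; stop at the first failure and keep the rest.
  flush() {
    while (this.buffer.length > 0) {
      if (!this.send(this.buffer[0])) return false;
      this.buffer.shift();
    }
    return true;
  }
}

let attempts = 0;
const collector = new AsyncCollector(() => ++attempts !== 2); // 2nd attempt fails
collector.track({ type: 'view_item' });
collector.track({ type: 'buy' });
console.log(collector.flush(), collector.buffer.length); // false 1
console.log(collector.flush(), collector.buffer.length); // true 0
```

This buffer lives in process memory, so a crash of the collector itself would still lose events. A durable queue would be needed for a real "do not lose data" guarantee.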

Data Collection

Realtime – web customization

Event-triggered campaign
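An event-triggered campaign starts when a matching event arrives, as opposed to running on a schedule. A hypothetical sketch of the matching step follows. The campaign shape (`name`, `triggerType`, optional `where` property filter) and the function `campaignsToTrigger` are assumptions made for illustration. The deck does not show how Exponea defines its campaigns.

```javascript
// Hypothetical trigger matching: each campaign names the event type that
// starts it and, optionally, property values the event must have.
function campaignsToTrigger(event, campaigns) {
  const props = event.properties || {};
  return campaigns
    .filter((c) => c.triggerType === event.type)
    .filter((c) => Object.entries(c.where || {}).every(([k, v]) => props[k] === v))
    .map((c) => c.name);
}

const campaigns = [
  { name: 'cart_reminder', triggerType: 'add_to_cart' },
  { name: 'thank_you_vip', triggerType: 'buy', where: { tier: 'vip' } },
];
console.log(campaignsToTrigger({ type: 'buy', properties: { tier: 'vip' } }, campaigns)); // [ 'thank_you_vip' ]
```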

λ architecture at Exponea

Takeaways

•  Lambda solves two contradictory challenges
   •  Process data fast
   •  Process very big data

•  Apache Spark is a good choice for both the speed and the batch layer; still, our IMF is way faster :-)

Thank you.