Deconstructing Lambda Architectures

Felix Crisan, Co-Founder & CTO, NETOPIA



Then...

...and now

Meaning of NoSQL

1970 = We have no SQL

1980 = Know SQL

2000 = No SQL!

2005 = Not only SQL

2015 = No, SQL (slide adapted from @markmadsen)

λ - Architecture - Why?

● Robust & fault-tolerant system
● Multiple workloads
● Ad-hoc queries
● Scalable (out rather than up)
● Extensible

λ - Architecture - So?

● In a nutshell: storage is cheap
● Data is generated (retrieved, stored) in time increments
● Data tends to be immutable (especially events)
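A minimal Python sketch of the last two points, assuming a hypothetical `PageView` event type (the field names and the hourly granularity are illustrative, not from the slides): each event is a frozen, timestamped record, and counting events per hour treats the data as a sequence of time increments.

```python
from collections import Counter
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PageView:
    """Hypothetical immutable event: once recorded, it is never updated."""
    url: str
    user: str
    ts: datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (one time increment)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def views_per_hour(events):
    """Count the events falling into each hourly increment."""
    return Counter(hour_bucket(e.ts) for e in events)


events = [
    PageView("/home", "alice", datetime(2015, 5, 1, 10, 5, tzinfo=timezone.utc)),
    PageView("/home", "bob", datetime(2015, 5, 1, 10, 47, tzinfo=timezone.utc)),
    PageView("/about", "alice", datetime(2015, 5, 1, 11, 2, tzinfo=timezone.utc)),
]

for hour, n in sorted(views_per_hour(events).items()):
    print(hour.isoformat(), n)

try:
    events[0].url = "/changed"  # an attempted in-place update...
except FrozenInstanceError:
    print("events are immutable")  # ...is rejected by the frozen dataclass
```

New facts are recorded as new events rather than by editing old ones.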

Relational Algebra

query result = f(all data)

f is a composite of one or more of σ (selection), Π (projection), ∪ (union), × (Cartesian product) and − (set difference)
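One way to read "query result = f(all data)" is as plain set operations over the complete dataset. The Python sketch below models relations as sets of tuples. The `(user, city)` schema, the sample data and the query itself are hypothetical illustrations.

```python
from itertools import product as cartesian


def select(rel, pred):
    """σ: keep the tuples that satisfy the predicate."""
    return {t for t in rel if pred(t)}


def project(rel, *cols):
    """Π: keep only the given column positions (duplicates collapse, as in sets)."""
    return {tuple(t[i] for i in cols) for t in rel}


def union(r, s):
    """∪: set union (the two relations are assumed to share a schema)."""
    return r | s


def difference(r, s):
    """−: set difference."""
    return r - s


def product(r, s):
    """×: Cartesian product, concatenating each pair of tuples."""
    return {a + b for a, b in cartesian(r, s)}


# Hypothetical relations with schema (user, city), arriving in two batches
day1 = {("alice", "Bucharest"), ("bob", "Cluj")}
day2 = {("carol", "Bucharest")}
banned = {("bob", "Cluj")}


def f(*batches, excluded):
    """query result = f(all data): names of non-excluded users in Bucharest."""
    all_data = set()
    for batch in batches:
        all_data = union(all_data, batch)
    active = difference(all_data, excluded)
    return project(select(active, lambda t: t[1] == "Bucharest"), 0)


print(sorted(f(day1, day2, excluded=banned)))  # [('alice',), ('carol',)]
```

Because f only reads its inputs, it can be rerun over all the data at any time and will give the same answer for the same data.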

CAP - Pick Any Two

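Diagram: Venn diagram of Consistency, Availability and Partition Tolerance. "You can find systems here" marks the areas where two of the three properties overlap; "No system can be in this region" marks the center, where all three overlap.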

λ-Arch - Who/When?

● Nathan Marz
● Oct 2011
● "How To Beat the CAP theorem": http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

λ-Architecture - What?

Diagram: (Big)Data Source, Batch Layer, Speed Layer, Serving Layer

λ-Architecture

Layer   | Holds                    | Characteristics
Batch   | Deep global truth        | High throughput / high latency
Speed   | Relevant local truth     | Medium throughput / low latency
Serving | Data for rapid retrieval | Low throughput / low latency

Batch Layer

● De-normalized data inputs / master dataset
● Append-only
● Scalable
● Idempotent calculations
● AP (in CAP terms: availability and partition tolerance)
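A minimal Python sketch of "append-only" plus "idempotent calculations". The `MasterDataset` class and the `(url, user)` record shape are hypothetical stand-ins for a real store such as HDFS. The batch view is recomputed from scratch over everything in the master dataset, so running it again gives the same result.

```python
from collections import Counter


class MasterDataset:
    """Hypothetical append-only master dataset: records are added, never changed or removed."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def __iter__(self):
        # Iterate over a snapshot; the internal list itself is never exposed
        return iter(tuple(self._records))


def batch_view(master):
    """Recompute the view from scratch over all of the data.

    It is a pure function of the master dataset, so running it twice yields
    the same view (idempotent). An incremental `view[url] += 1` would instead
    double-count if it were replayed.
    """
    return Counter(url for url, _user in master)


master = MasterDataset()
for rec in [("/home", "alice"), ("/home", "bob"), ("/about", "alice")]:
    master.append(rec)

v1 = batch_view(master)
v2 = batch_view(master)
print(v1 == v2, v1["/home"])  # True 2
```

The trade-off is latency. A full recomputation is expensive, which is why the batch layer is high-throughput but high-latency.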

Usual suspects

● HDFS/Hadoop
● MapReduce (MRv1), YARN (MRv2)
● Spark/Spark SQL
● Hive
● Pig
● ...and others

Speed/Realtime Layer

● "Realtime" is actually near real time
● Compensates for the Batch Layer's latency
● Continuous computation over a limited window
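A minimal Python sketch of a speed layer. It assumes the "limited window" is the span of events newer than the last batch run (the `cutoff`), that timestamps are integers, and that events arrive in timestamp order. `SpeedLayer`, `on_event` and `batch_caught_up` are hypothetical names, not the API of Storm, Spark Streaming or Samza.

```python
from collections import Counter, deque


class SpeedLayer:
    """Hypothetical realtime view over the events the batch layer has not seen yet.

    Assumes events arrive in timestamp order and that `cutoff` is the latest
    timestamp already covered by the most recent batch run.
    """

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.window = deque()    # (ts, url) pairs newer than the cutoff
        self.counts = Counter()  # realtime view, updated incrementally

    def on_event(self, ts, url):
        """Continuous computation: update the view as each event arrives."""
        if ts <= self.cutoff:
            return  # already accounted for by the batch layer
        self.window.append((ts, url))
        self.counts[url] += 1

    def batch_caught_up(self, new_cutoff):
        """Shrink the window once a batch run covers data up to new_cutoff."""
        self.cutoff = new_cutoff
        while self.window and self.window[0][0] <= new_cutoff:
            _, url = self.window.popleft()
            self.counts[url] -= 1
            if self.counts[url] == 0:
                del self.counts[url]


speed = SpeedLayer(cutoff=100)
for ts, url in [(90, "/home"), (101, "/home"), (105, "/about"), (110, "/home")]:
    speed.on_event(ts, url)
print(dict(speed.counts))  # {'/home': 2, '/about': 1}

speed.batch_caught_up(new_cutoff=105)
print(dict(speed.counts))  # {'/home': 1}
```

The speed layer only has to be correct until the next batch run. After that, its state for the covered period is discarded, so any approximation it made is replaced by the batch result.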

Usual suspects

● Storm (http://storm.apache.org/)

● Spark Streaming (http://spark.apache.org/)

● Samza (http://samza.apache.org/)

● S4 (http://incubator.apache.org/s4/)

● MQ (message queues: 0MQ, RabbitMQ/AMQP, etc.)

Serving Layer

● Indexes and exposes views into the data
● Sometimes split into Speed Serving and Batch Serving
● Supports ad-hoc queries
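A minimal Python sketch of the serving side. A query result combines the precomputed batch view with the speed layer's realtime view. This assumes additive views (plain counts), where summing the two is valid. Other aggregates need their own merge rule. The function names and sample views are hypothetical.

```python
from collections import Counter


def query(batch_view, realtime_view, key):
    """Point query: batch view (data up to the last batch run) plus realtime view (data since)."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


def merged_view(batch_view, realtime_view):
    """Merge both views for ad-hoc queries over every key (assumes additive counts)."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return merged


batch = {"/home": 120, "/about": 30}    # hypothetical precomputed batch view
realtime = {"/home": 4, "/contact": 1}  # hypothetical speed-layer view

print(query(batch, realtime, "/home"))              # 124
print(merged_view(batch, realtime).most_common(2))  # [('/home', 124), ('/about', 30)]
```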

(Non)Usual suspects

● Drill (http://drill.apache.org/)

● ElephantDB (https://github.com/nathanmarz/elephantdb)

● Voldemort (http://www.project-voldemort.com/voldemort/)

● Elasticsearch/Solr/Lucene (https://www.elastic.co/)

● Cassandra SSTables (http://cassandra.apache.org/)

● Cloudera Impala (http://www.cloudera.com/)

Takeaways

● Immutability
● (P)Re-computation
● Human fault-tolerance
● No one-size-fits-all
● YMMV
● ψ-Architectures?
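The first three takeaways reinforce each other. If a bug ships in the view logic, the raw data is still intact, so fixing the code and recomputing repairs the view. Below is a minimal Python sketch with hypothetical payment events (amounts in cents, negative values meaning refunds) and a deliberately buggy first version of the view function.

```python
# Hypothetical raw events (user, amount_in_cents), stored as an immutable tuple;
# a negative amount is a refund.
RAW_EVENTS = (("alice", 1250), ("bob", 399), ("alice", -250))


def totals_v1(events):
    """Buggy view logic: silently drops refunds (negative amounts)."""
    out = {}
    for user, cents in events:
        if cents > 0:
            out[user] = out.get(user, 0) + cents
    return out


def totals_v2(events):
    """Fixed view logic: includes refunds."""
    out = {}
    for user, cents in events:
        out[user] = out.get(user, 0) + cents
    return out


view = totals_v1(RAW_EVENTS)  # {'alice': 1250, 'bob': 399}: wrong
view = totals_v2(RAW_EVENTS)  # recompute from the untouched raw events
print(view)  # {'alice': 1000, 'bob': 399}
```

Had v1 written its running totals back over the raw data, the refund would have been lost. Keeping the events immutable is what makes the fix a simple recomputation.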