Deconstructing Lambda Architectures
Uploaded by: fixone
Category: Technology
Meaning of NoSQL
1970 = We have no SQL
1980 = Know SQL
2000 = No SQL!
2005 = Not only SQL
2015 = No, SQL
(slide adapted from @markmadsen)
λ - Architecture - Why?
● Robust & fault-tolerant system
● Multiple workloads
● Ad-hoc queries
● Scalable (out rather than up)
● Extensible
λ - Architecture - So?
● In a nutshell - storage is cheap
● Data is generated (retrieved, stored) in time increments
● Data tends to be immutable (especially events)
Relational Algebra
query result = f(all data)
f is a composite of one or more of: σ (selection), Π (projection), ∪ (union), × (Cartesian product), − (set difference)
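To make "query result = f(all data)" concrete, here is a minimal sketch of the five operators over in-memory sets. The row encoding, the helper names (`row`, `select`, `project`, `union`, `difference`, `times`) and the sample events are assumptions made for illustration; they are not from the slides.

```python
from itertools import product as cartesian

# A relation is a set of rows. Each row is a tuple of (column, value) pairs,
# sorted by column name, so rows are hashable and duplicates collapse.
def row(**cols):
    return tuple(sorted(cols.items()))

def select(rel, pred):           # σ: keep rows matching a predicate
    return {r for r in rel if pred(dict(r))}

def project(rel, *cols):         # Π: keep only the named columns
    return {tuple((c, v) for c, v in r if c in cols) for r in rel}

def union(a, b):                 # ∪
    return a | b

def difference(a, b):            # − (set difference)
    return a - b

def times(a, b):                 # ×: assumes the two relations share no column names
    return {tuple(sorted(x + y)) for x, y in cartesian(a, b)}

# Hypothetical sample data: two time increments of immutable events.
events_2014 = {row(user="ann", page="home"), row(user="bob", page="cart")}
events_2015 = {row(user="ann", page="cart"), row(user="cy", page="home")}
all_data = union(events_2014, events_2015)

def f(data):
    """Which users ever visited the cart page?"""
    return project(select(data, lambda r: r["page"] == "cart"), "user")

query_result = f(all_data)
```

Because `f` only reads its input, it can be re-run over the full dataset at any time, which is the property the batch layer relies on.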
CAP - Pick Any Two
[Figure: diagram of Consistency, Availability and Partition Tolerance, annotated "No system can be in this region" and "You can find systems here"]
λ-Arch - Who/When?
● Nathan Marz
● Oct 2011
● “How To Beat the CAP theorem” http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
λ-Architecture
Layer   | Holds                    | Characteristics
Batch   | Deep Global Truth        | High Throughput/High Latency
Speed   | Relevant Local Truth     | Medium Throughput/Low Latency
Serving | Data for Rapid Retrieval | Low Throughput/Low Latency
Batch Layer
● De-normalized data inputs/master dataset
● Append-only
● Scalable
● Idempotent calculations
● AP (available and partition tolerant, in CAP terms)
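The append-only and idempotent properties above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `MasterDataset`, `batch_view` and the `(timestamp, user, page)` event shape are hypothetical names, not part of Hadoop or any other tool listed in this deck.

```python
from collections import Counter

class MasterDataset:
    """Append-only store of immutable events (hypothetical, in-memory)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # Events are stored as tuples and never updated or deleted.
        self._events.append(tuple(event))

    def scan(self):
        return iter(self._events)

def batch_view(master):
    """Recompute page-view counts from scratch: a pure function of all data."""
    return Counter(page for _ts, _user, page in master.scan())

master = MasterDataset()
for ev in [(1, "ann", "home"), (2, "bob", "cart"), (3, "ann", "cart")]:
    master.append(ev)

view_a = batch_view(master)
view_b = batch_view(master)  # re-running over the same data gives the same view
```

Since the view is derived only from the master dataset, a failed or buggy batch run can simply be repeated without corrupting anything.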
Usual suspects
● HDFS/Hadoop
● MapReduce (MRv1), YARN (MRv2)
● Spark/SparkSQL
● Hive
● Pig
● ...and others
Speed/Realtime Layer
● Realtime is actually Near Real Time
● Compensates for the latency of the Batch Layer
● Continuous computation over a limited window
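As an illustration of continuous computation over a limited window, the sketch below keeps per-key counts for only the most recent events. `WindowedCounter` is a hypothetical helper, and it assumes events arrive in timestamp order; it does not show how Storm, Spark Streaming or Samza implement windows.

```python
from collections import Counter, deque

class WindowedCounter:
    """Incremental counts over the last `window_seconds` of events."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, key) in arrival order
        self.counts = Counter()

    def add(self, ts, key):
        # Update the view incrementally, then drop what fell out of the window.
        self.events.append((ts, key))
        self.counts[key] += 1
        self._expire(ts)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _old_ts, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def view(self):
        return dict(self.counts)

speed = WindowedCounter(window_seconds=10)
speed.add(0, "home")
speed.add(5, "cart")
speed.add(12, "cart")    # the event at t=0 is now outside the 10 s window
realtime_view = speed.view()
```

The state held here is bounded by the window size, which is what lets this layer trade completeness for low latency.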
Usual Suspects
● Storm (http://storm.apache.org/)
● Spark Streaming (http://spark.apache.org/)
● Samza (http://samza.apache.org/)
● S4 (http://incubator.apache.org/s4/)
● MQ (0MQ, RabbitMQ/AMQP, etc.)
Serving Layer
● Indexes and exposes views into data
● Sometimes split into Speed Serving and Batch Serving
● Supports ad-hoc queries
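One way to picture the serving layer is as a keyed lookup that combines a precomputed batch view with the latest speed view at query time. The sketch below is an assumption-laden illustration: `ServingLayer` and its methods are hypothetical, the counts are made up, and it assumes the speed view covers only events the batch view has not absorbed yet.

```python
class ServingLayer:
    """Hypothetical in-memory serving layer for per-key counts."""

    def __init__(self):
        self.batch_view = {}
        self.speed_view = {}

    def load_batch_view(self, view):
        # Swapped in wholesale after each batch run.
        self.batch_view = dict(view)

    def update_speed_view(self, view):
        # Refreshed continuously by the speed layer.
        self.speed_view = dict(view)

    def query(self, key):
        # Assumes the two views cover disjoint sets of events,
        # so their counts can simply be added.
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)

serving = ServingLayer()
serving.load_batch_view({"home": 100, "cart": 40})
serving.update_speed_view({"cart": 3})
cart_total = serving.query("cart")
home_total = serving.query("home")
```

Splitting this class in two, one store per view, would correspond to the "Speed Serving and Batch Serving" variant mentioned above.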
(Non)Usual suspects
● Drill (http://drill.apache.org/)
● ElephantDB (https://github.com/nathanmarz/elephantdb)
● Voldemort (http://www.project-voldemort.com/voldemort/)
● ElasticSearch/Solr/Lucene (https://www.elastic.co/)
● Cassandra sstables (http://cassandra.apache.org/)
● Cloudera Impala (http://www.cloudera.com/)