Lessons from Giant-Scale Services. IEEE Internet Computing, Vol. 5, No. 4, July/August 2001. Eric A. Brewer, University of California, Berkeley, and Inktomi Corporation. Presented by: Ηλίας Τσιγαρίδας (Μ484)

Lessons from Giant-Scale Services

IEEE Internet Computing, Vol. 5, No. 4, July/August 2001

Eric A. Brewer
University of California, Berkeley, and Inktomi Corporation

Presented by: Ηλίας Τσιγαρίδας (Μ484)


Examples of Giant-scale services

• AOL
• Microsoft Network
• Yahoo
• eBay
• CNN
• Instant messaging
• Napster
• Many more…

The demand

They must be always available, despite their scale, growth rate, and rapid evolution of content and features.


Article Characteristics

Characteristics
• “Experience” article
• No literature references
• Principles and approaches
• No quantitative evaluation

The reasons
• Focus on high-level design
• New area
• Proprietary nature of the information


Article scope

• Look at the basic model of giant-scale services
• Focus on the challenges of
  • High availability
  • Evolution
  • Growth
• Principles for the above
• Simplify the design of large systems


Basic Model (general)

The “infrastructure services”: Internet-based systems that provide instant messaging, wireless services, and so on


Basic Model (general)

We discuss
• Single-site
• Single-owner
• Well-connected cluster
• Perhaps a part of a larger service

We do not discuss
• Wide-area issues
  • Network partitioning
  • Low or discontinuous bandwidth
  • Multiple administrative domains
• Service monitoring
• Network QoS
• Security
• Logging and log analysis
• DBMS


Basic Model (general)

We focus on
• High availability
• Replication
• Degradation
• Disaster tolerance
• Online evolution

The scope is bridging the gap between the basic building blocks of giant-scale services and the real-world scalability and availability they require


Basic Model (Advantages)

• Access anywhere, anytime
• Availability via multiple devices
• Groupware support
• Lower overall cost
• Simplified service updates


Basic Model (Advantages)

Access anywhere, anytime
• The infrastructure is ubiquitous
• You can access the service from home, work, the airport, and so on


Basic Model (Advantages)

Availability via multiple devices
• The infrastructure handles the processing (most of it, at least)
• Users access the services via set-top boxes, network computers, smart phones, and so on
• In this way we can offer more functionality for a given cost and battery life


Basic Model (Advantages)

Groupware support
• Centralizing data from many users allows groupware applications such as
  • Calendars
  • Teleconferencing systems, and so on


Basic Model (Advantages)

Lower overall cost
• Overall cost is hard to measure, but infrastructure services have an advantage over designs based on stand-alone devices
• High utilization
• Centralized administration reduces the cost, but this is harder to quantify


Basic Model (Advantages)

Simplified service updates
• Updates without physical distribution
• The most powerful long-term advantage


Basic Model (Components)


Basic Model (Assumptions)

• The service provider has limited control over the clients and the IP network
• Queries drive the service
• Read-only queries greatly outnumber update queries
• Giant-scale services use CLUSTERS


Basic Model (Components)

• Clients, such as Web browsers: initiate the queries to the service.
• IP network, either the public Internet or a private network: provides access to the service.
• Load manager: provides indirection between the service’s external name and the servers’ physical names (IP addresses), and balances load. Proxies or firewalls may sit before the load manager.
• Servers: combine CPU, memory, and disks into an easy-to-replicate unit.
• Persistent data store: a replicated or partitioned database spread across the servers, optionally with external DBMSs or RAID storage.
• Backplane (optional): handles inter-server traffic.


Basic Model (Load Management)

• Round-robin DNS
• “Layer-4” switches: understand TCP and port numbers
• “Layer-7” switches: parse URLs
• Custom “front-end” nodes: act like service-specific “layer-7” routers
• Include the clients in the load balancing, e.g. alternative DNS names or name servers
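As a rough illustration of the indirection a load manager provides, the sketch below rotates requests over a list of server addresses and skips servers that have been marked down. The class name, method names, and addresses are hypothetical and do not come from the article. Note that plain round-robin DNS does not hide failed nodes in this way; skipping dead servers is closer to what a layer-4 switch does.

```python
class RoundRobinManager:
    """Toy load manager: maps one service name onto many server addresses."""

    def __init__(self, servers):
        self.servers = list(servers)  # physical names (hypothetical addresses)
        self.down = set()             # servers currently believed to be down
        self._next = 0                # index of the next server to try

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self):
        """Return the next live server in rotation, skipping failed ones."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no live servers")
```

A caller would create one manager per service, call `pick()` for each incoming query, and call `mark_down()` when a health check fails.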


Basic Model (Load Management)

Two opposite approaches
• Simple Web farm
• Search engine cluster


Basic Model (Load Management): Simple Web Farm


Basic Model (Load Management): Search Engine Cluster


High Availability (general)

Like telephone, rail, or water systems

Features
• Extreme symmetry
• No people
• Few cables
• No external disks
• No monitors

Inktomi, in addition
• Manages the cluster offline
• Limits temperature and power variations


High Availability (metrics)

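$$\text{uptime} = \frac{\text{MTBF} - \text{MTTR}}{\text{MTBF}}$$

$$\text{yield} = \frac{\text{queries completed}}{\text{queries offered}}$$

$$\text{harvest} = \frac{\text{data available}}{\text{complete data}}$$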
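The three metrics on this slide are simple ratios, so a small sketch may make them concrete. The function names and the sample numbers are illustrative assumptions, not values from the article. The last comment shows why halving MTTR improves uptime as much as doubling MTBF.

```python
def uptime(mtbf, mttr):
    """Fraction of time the service is up: (MTBF - MTTR) / MTBF."""
    return (mtbf - mttr) / mtbf


def query_yield(completed, offered):
    """Fraction of offered queries that were completed."""
    return completed / offered


def harvest(available, complete):
    """Fraction of the complete data that was reflected in the answer."""
    return available / complete


# Illustrative numbers (assumed): a failure every 1000 hours, 1 hour to repair.
baseline = uptime(mtbf=1000, mttr=1)
# Halving MTTR and doubling MTBF give the same uptime.
faster_repair = uptime(mtbf=1000, mttr=0.5)
fewer_faults = uptime(mtbf=2000, mttr=1)
```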


High Availability (DQ principle)

• The system’s overall capacity has a particular physical bottleneck, e.g. total I/O bandwidth or total seeks per second
• DQ is the total amount of data to be moved per second
• It is measurable and tunable, e.g. it goes up by adding nodes or optimizing software, and goes down with faults

$$\text{data per query} \times \text{queries per second} \rightarrow \text{constant}$$


High Availability (DQ principle)

• Focus on the relative DQ value, not on the absolute
• Define the DQ value of your system
• Normally the DQ value scales linearly with the number of nodes
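A minimal sketch of the DQ principle, assuming perfectly linear scaling with node count as the slide says is normal. The function names and the per-node figure are made up for illustration; only the relationships (DQ is roughly constant, and proportional to the number of nodes) come from the slides.

```python
def cluster_dq(nodes, dq_per_node):
    """Total DQ capacity, assuming it scales linearly with the node count."""
    return nodes * dq_per_node


def max_queries_per_second(dq, data_per_query):
    """With DQ fixed, the sustainable query rate falls as data per query rises."""
    return dq / data_per_query


def relative_dq_loss(nodes, failed):
    """Relative DQ reduction when some nodes fail (the value to focus on)."""
    return failed / nodes


# Assumed numbers: 10 nodes, 500 DQ units per node, 25 units of data per query.
total = cluster_dq(nodes=10, dq_per_node=500)
rate = max_queries_per_second(total, data_per_query=25)
loss = relative_dq_loss(nodes=10, failed=1)
```

The absolute value of 500 units per node is arbitrary; what carries over to a real system is that losing one node out of ten costs about 10% of DQ.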


High Availability (DQ principle)

Analyzing the impact of faults
• Focus on how a DQ reduction influences the three metrics
• Applies only to data-intensive sites


High Availability (Replication vs. Partitioning)

Example: 2-node cluster, one node down

              Replication            Partitioning
Harvest       100%                   50%
Yield         50%                    100%
DQ            -50%                   -50%
D and Q       Maintain D, reduce Q   Reduce D, maintain Q
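The two-node example generalizes to a cluster of any size. The sketch below is a simplification under stated assumptions: the cluster runs at full load with no spare capacity, data is spread evenly, and in the replicated case every node holds a full copy. The function name and the returned dictionary keys are hypothetical.

```python
def after_faults(nodes, failed, replicated):
    """Harvest, yield, and remaining DQ (as fractions) after `failed` nodes go down.

    Assumes full load, evenly spread data, and at least one surviving node.
    """
    surviving = (nodes - failed) / nodes
    if replicated:
        # Every node has all the data: D is maintained, Q drops.
        return {"harvest": 1.0, "yield": surviving, "dq": surviving}
    # Each node has a distinct slice: Q is maintained, D drops.
    return {"harvest": surviving, "yield": 1.0, "dq": surviving}


replication = after_faults(nodes=2, failed=1, replicated=True)
partitioning = after_faults(nodes=2, failed=1, replicated=False)
```

In both cases the DQ loss is the same; the choice only decides whether the loss shows up as dropped queries or as missing data.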


High Availability (Replication)


High Availability (Replication vs. Partitioning)

• Replication wins if the bandwidth is the same
• The extra cost is in the bandwidth, not in the disks
• Easier recovery
• We might also use partial replication and randomization


High Availability (Graceful degradation)

We cannot avoid saturation, because
• The peak-to-average ratio is 1.6:1 to 6:1, so it is expensive to build capacity above the (normal) peak
• Single-event bursts occur (e.g. online ticket sales for special events)
• Faults such as power failures or natural disasters substantially reduce the overall DQ, and the remaining nodes become saturated

So, we MUST have mechanisms for degradation


High Availability (Graceful degradation)

The DQ principle gives us the options to
• Limit Q (capacity) to maintain D
• Reduce D and increase Q
• Focus on harvest through admission control (AC), which reduces Q
• Reduce D through dynamic database reduction
• Do both
• Cut the effective database in half (a new approach)
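To make the trade-off concrete, here is a small sketch of the two basic options when offered load exceeds what the DQ capacity can serve. It assumes a fixed DQ budget and uniform queries; the function name and the strategy labels are hypothetical.

```python
def degrade(dq_capacity, data_per_query, offered_qps, strategy):
    """Return (harvest, yield) under saturation for a given strategy.

    strategy is "admission_control" (keep D, turn queries away) or
    "database_reduction" (answer everything from less data).
    """
    max_qps = dq_capacity / data_per_query
    if offered_qps <= max_qps:
        return (1.0, 1.0)  # not saturated, nothing to degrade
    served_fraction = max_qps / offered_qps
    if strategy == "admission_control":
        return (1.0, served_fraction)   # full harvest, reduced yield
    if strategy == "database_reduction":
        return (served_fraction, 1.0)   # reduced harvest, full yield
    raise ValueError("unknown strategy: " + strategy)
```

With a capacity of 1000 DQ units, 10 units per query, and 200 queries per second offered, either strategy gives up half of one metric: admission control halves yield, and database reduction halves harvest.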


High Availability (Graceful degradation)

More sophisticated techniques
• Cost-based AC
  • Estimate the query cost
  • Reduce the data per query
  • Increase Q
• Priority-based (or value-based) AC
  • Drop low-valued queries
  • E.g. execute a stock trade within 60 s or the user pays no commission
• Reduced data freshness
  • Reducing freshness reduces the work per query
  • Increases yield at the expense of harvest
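The first two techniques can be sketched as simple admission policies over a DQ budget for one time interval. Everything here is an assumption made for illustration: the `Query` fields, the greedy policies, and the sample costs and values. A real service would estimate costs per query type and apply the policy continuously.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    cost: float         # estimated DQ cost of answering the query
    value: float = 1.0  # business value or priority of the query


def admit_cost_based(queries, dq_budget):
    """Admit the cheapest queries first, so more queries fit in the budget."""
    admitted, remaining = [], dq_budget
    for q in sorted(queries, key=lambda q: q.cost):
        if q.cost <= remaining:
            admitted.append(q)
            remaining -= q.cost
    return admitted


def admit_priority_based(queries, dq_budget):
    """Admit the most valuable queries first and drop low-valued ones."""
    admitted, remaining = [], dq_budget
    for q in sorted(queries, key=lambda q: -q.value):
        if q.cost <= remaining:
            admitted.append(q)
            remaining -= q.cost
    return admitted
```

Cost-based admission maximizes the number of queries served, while priority-based admission may serve fewer queries but keeps the ones that matter most, such as the stock trade in the slide's example.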


High Availability (Disaster Tolerance)

• A combination of managing replicas and graceful degradation
• How many locations?
• How many replicas at each location?
• Load management
  • “Layer-4” switches do not help with the loss of a whole cluster
  • Smart clients are the solution
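One way to picture a smart client is as a client that knows several sites and fails over on its own when a whole cluster is unreachable. The sketch below is only an assumption about what that could look like; the function name, the `send` callback, and the site names are hypothetical and not from the article.

```python
def smart_client_request(sites, send):
    """Try each site in order and return the first successful response.

    `sites` is an ordered list of site names, and `send` is a callable that
    takes a site and either returns a response or raises ConnectionError.
    """
    last_error = None
    for site in sites:
        try:
            return send(site)
        except ConnectionError as err:
            last_error = err  # this site is down, try the next one
    raise ConnectionError("all sites failed") from last_error
```

Because the failover decision is made at the client, it keeps working even when the load manager in front of the failed cluster is lost along with it.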


Online Evolution & Growth

We must plan for continuous growth and frequent functionality updates

Maintenance and upgrades are controlled failures

Total loss of DQ value is

$$\Delta DQ = n \cdot u \cdot \text{average DQ/node} = DQ \cdot u$$

where n is the number of nodes and u is the total amount of time a node requires for online evolution


Online Evolution & Growth (Three approaches)

An example for a 4-node cluster
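The figure for this slide did not survive in the transcript. As a rough sketch, the three approaches described in Brewer's article are fast reboot, rolling upgrade, and big flip. The code below models each one as a list of (nodes down, duration) steps, which is a simplifying assumption, and checks that all three lose the same total DQ, n · u · DQ per node. The function names and the figure of 100 DQ units per node are made up for illustration.

```python
def fast_reboot(n, u):
    """All n nodes are upgraded at once and are down for time u."""
    return [(n, u)]


def rolling_upgrade(n, u):
    """Nodes are upgraded one at a time, each down for time u."""
    return [(1, u)] * n


def big_flip(n, u):
    """Half of the cluster is upgraded at a time, each half down for time u."""
    return [(n // 2, u), (n - n // 2, u)]


def dq_loss(schedule, dq_per_node):
    """Total DQ lost: sum over steps of nodes down x duration x DQ per node."""
    return sum(down * duration * dq_per_node for down, duration in schedule)


def peak_nodes_down(schedule):
    """Largest number of nodes that are down at the same time."""
    return max(down for down, _ in schedule)


# 4-node cluster, u = 1 time unit, 100 DQ units per node (assumed values).
losses = [dq_loss(plan(4, 1), 100) for plan in (fast_reboot, rolling_upgrade, big_flip)]
```

The total loss is identical for the three plans; they differ in how the loss is spread over time, from all nodes down briefly to one node down for a long stretch.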


Conclusions (The basic lessons learned)

• Get the basics right
  • Professional data center, layer-7 switch, symmetry
• Decide on your availability metrics
  • Everyone must agree on the goals
  • Harvest and yield > uptime
• Focus on MTTR at least as much as MTBF
  • Improving MTTR is easier and has the same impact
• Understand load redirection during faults
  • Data replication is insufficient; you need excess DQ


Conclusions (The basic lessons learned)

• Graceful degradation is a critical part
  • Intelligent admission control and dynamic database reduction
• Use DQ analysis on all upgrades
  • Capacity planning
• Automate upgrades as much as possible
  • Have a fast, simple way to return to the older version


Final Statement

Smart clients could simplify all of the above