JavaOne 2015: Scaling micro services at Gilt

Transcript
Page 1: JavaOne 2015: Scaling micro services at Gilt

scaling μ-services at Gilt

San Francisco, 26th October 2015

Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman

@gilttech

Page 2: JavaOne 2015: Scaling micro services at Gilt

gilt: luxury designer brands at discounted prices

Page 3: JavaOne 2015: Scaling micro services at Gilt

we shoot the product in our studios

Page 4: JavaOne 2015: Scaling micro services at Gilt

we receive, store, pick, pack and ship...

Page 5: JavaOne 2015: Scaling micro services at Gilt

we sell every day at noon...

Page 6: JavaOne 2015: Scaling micro services at Gilt

stampede...

Page 7: JavaOne 2015: Scaling micro services at Gilt

this is what the stampede really looks like...

Page 8: JavaOne 2015: Scaling micro services at Gilt

rails to riches: 2007 - ruby-on-rails monolith

Page 9: JavaOne 2015: Scaling micro services at Gilt

2011: java, loosely-typed, monolithic services

(1) Large loosely-typed JSON/HTTP services

(2) Lots of duplicated code :(

(3) Teams focused on business lines

(4) Monolithic Java App; huge bottleneck for innovation.

(5) Hidden linkages; buried business logic

Page 10: JavaOne 2015: Scaling micro services at Gilt

enter: µ-services

“How can we arrange our teams around strategic initiatives? How can we make it fast and easy to get change to production?”

Page 11: JavaOne 2015: Scaling micro services at Gilt

2015: micro-services

Page 12: JavaOne 2015: Scaling micro services at Gilt

driving forces behind gilt’s emergent architecture

● team autonomy

● voluntary adoption (tools, techniques, processes)

● kpi or goal-driven initiatives

● failing fast and openly

● open and honest, even when it’s difficult

Page 13: JavaOne 2015: Scaling micro services at Gilt

service growth over time: point of inflexion === scala.

Page 14: JavaOne 2015: Scaling micro services at Gilt

anatomy of a gilt service

Page 15: JavaOne 2015: Scaling micro services at Gilt

anatomy of a gilt service - typical choices

gilt-service-framework, log4j, cloudwatch, Cave, javascript [other choices were shown as logos on the slide]

Page 16: JavaOne 2015: Scaling micro services at Gilt

service discovery: straightforward

zookeeper

Brocade Traffic Manager (aka Zeus, Stingray, SteelApp,...)

Page 17: JavaOne 2015: Scaling micro services at Gilt

what are all these services doing?

Page 18: JavaOne 2015: Scaling micro services at Gilt

… we used a “spreadsheet”.

‘The Gilt Genome Project’

Page 19: JavaOne 2015: Scaling micro services at Gilt

It’s hard to think of architecture in one dimension.

We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; this provides a stronger (although subjective) taxonomy than the previous ‘tags’.

It turns out we have an elegant, emergent architecture.

Some services / components are deceptively simple.

Others are simply deceptive, and require knowledge of their surrounding ‘constellation’.

n = 265, where n is the number of services.

Page 20: JavaOne 2015: Scaling micro services at Gilt

Deceptively Simple - many services are small; < 2048 loc

Page 21: JavaOne 2015: Scaling micro services at Gilt

Deceptively Simple - many services are small, < 32 files.

Page 22: JavaOne 2015: Scaling micro services at Gilt

Gilt Logical Architecture - Back Office Systems

Gilt Admin (Legacy Ruby on Rails Application): City, Discounts, Financial Reporting, Fraud Mgmt, Gift Cards, Inventory Mgmt, Order Mgmt, Sales Mgmt, Product Catalog, Purchase Orders, Targeting, Billing

Other Admin Applications (Scala + Play Framework)*: City, Creative (2), CS, Discounts, Distribution, i18n, Inventory (2), Order Processing (2), Util

Service Constellations (Scala, Java)*: Auth (1), Billing (1), City (6), Creative (4), CS (2), Discounts (1), Distribution (9), i18n (3), Inventory (6), Order Processing (8), Payments (3), Product Catalog (5), Referrals (1), Util (2)

Core Database - ‘db3’

Job System (Java, Ruby)

* counts denote number of service / app components.

Simply deceptive: service context only makes sense in constellation.

Page 23: JavaOne 2015: Scaling micro services at Gilt

from bare-metal...

PHX, IAD

Page 24: JavaOne 2015: Scaling micro services at Gilt

… to vapour.

Page 25: JavaOne 2015: Scaling micro services at Gilt

Lift-and-shift + elastic teams

Existing Data Centre

Dual 10Gb direct connect line, 2ms latency.

‘Legacy VPC’

Mobile, Common, Personalisation, Admin, Data

(1) Deploy to VPC

(2) ‘Department’ accounts for elasticity & devops

Page 26: JavaOne 2015: Scaling micro services at Gilt

single tenant: one EC2 instance per service instance

Page 27: JavaOne 2015: Scaling micro services at Gilt

reproducible, immutable deployments: docker

Page 28: JavaOne 2015: Scaling micro services at Gilt

service discovery: same pattern, different LB

zookeeper

Amazon ELB

Page 29: JavaOne 2015: Scaling micro services at Gilt

# running instances per service: ‘rule of three’

Page 30: JavaOne 2015: Scaling micro services at Gilt

AWS instance sizing

Page 31: JavaOne 2015: Scaling micro services at Gilt

evolution of architecture and tech organisation

Page 32: JavaOne 2015: Scaling micro services at Gilt

We (heart) μ-services

Lessen dependencies between teams: faster code-to-prod

Lots of initiatives in parallel

Your favourite <tech/language/framework> here

Graceful degradation of service

Disposable Code: easy to innovate, easy to fail and move on.

Page 33: JavaOne 2015: Scaling micro services at Gilt

We (heart) cloud

Do devops in a meaningful way.

Low barrier of entry for new tech (DynamoDB, Kinesis, ...)

Isolation

Cost visibility

Security tools (IAM)

Well documented

Resilience is easy

Hybrid is easy

Performance is great

Page 34: JavaOne 2015: Scaling micro services at Gilt

seven μ-service challenges (& some solutions)

no one ever said this was gonna be easy

Page 35: JavaOne 2015: Scaling micro services at Gilt

1. staging vs test-in-prod

We find it hard to maintain staging environments across multiple teams with lots of services.

● We think TiP is the way to go: invest in automation, use dark canaries in prod.

● However, some teams have found TiP counter-productive, and use minimal staging environments.
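A dark canary can be sketched roughly as follows. This is an illustration of the general idea only, not Gilt's tooling: the names `with_dark_canary`, `stable` and `canary` are hypothetical, and requests and responses are simplified to strings. Each production request is also replayed against the new version, differences are recorded, and the caller only ever sees the stable response.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


def with_dark_canary(stable: Handler, canary: Handler,
                     mismatches: List[Tuple[str, str, str]]) -> Handler:
    """Wrap `stable` so every request is also replayed against `canary`."""
    def handle(request: str) -> str:
        expected = stable(request)
        try:
            actual = canary(request)
            if actual != expected:
                # Record the difference for later inspection.
                mismatches.append((request, expected, actual))
        except Exception as exc:
            # A canary failure must never reach the user.
            mismatches.append((request, expected, f"error: {exc}"))
        return expected  # callers only ever see the stable response
    return handle
```

A real setup would usually call the canary asynchronously so that it cannot add latency to the user-facing path.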

Page 36: JavaOne 2015: Scaling micro services at Gilt

2. ownership

Who ‘owns’ that service? What happens if that person decides to work on something else?

We have chosen for teams and departments to own and maintain their services. No throwing this stuff over the fence.

Page 37: JavaOne 2015: Scaling micro services at Gilt

1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.

2. Teams are responsible for building & running their services; directors are accountable for their overall estate.

bottom-up ownership, RACI-style

Page 38: JavaOne 2015: Scaling micro services at Gilt

‘ownership donut’ informs tech strategy

3. Ownership is classified: active, passive, at-risk.

‘done’ === 0% ‘at risk’
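The donut can be thought of as a percentage breakdown of services by ownership status. The sketch below assumes each service carries exactly one of the three statuses named on the slide; the function names and the input shape are hypothetical, not taken from the genome project.

```python
from collections import Counter
from typing import Dict, Iterable

STATUSES = ("active", "passive", "at-risk")


def ownership_donut(statuses: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of services in each ownership status."""
    counts = Counter(statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown ownership status: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    return {status: 100.0 * counts[status] / total for status in STATUSES}


def is_done(donut: Dict[str, float]) -> bool:
    """'done' means no service is classified as at-risk."""
    return donut["at-risk"] == 0.0
```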

Page 39: JavaOne 2015: Scaling micro services at Gilt

3. deployment

Services need somewhere to live. We’ve open-sourced tooling over docker and AWS to give:

elasticity + fast provisioning + service isolation + fast rollback + repeatable, immutable deployment.

https://github.com/gilt/ionroller

Page 40: JavaOne 2015: Scaling micro services at Gilt

4. lightweight APIs

We’ve settled on REST-style APIs, using http://apidoc.me. Separate interface from implementation; ‘an AVRO for REST’ (Mike Bryzek, Gilt Founder)

We strongly recommend zero-dependency strongly-typed clients.
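As an illustration of what ‘zero-dependency, strongly-typed’ can mean in practice, here is a minimal hand-written sketch that uses only the Python standard library. It is not output from apidoc; the `Order` model, its fields and the `/orders/{id}` path are assumptions made up for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable
from urllib.request import urlopen


@dataclass(frozen=True)
class Order:
    """Typed model: callers never handle raw JSON dictionaries."""
    id: str
    ship_to: str


def http_get(url: str) -> str:
    """Default transport, standard library only."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class OrderClient:
    def __init__(self, base_url: str,
                 fetch: Callable[[str], str] = http_get) -> None:
        self._base_url = base_url.rstrip("/")
        self._fetch = fetch  # injectable, so tests need no network

    def get_order(self, order_id: str) -> Order:
        raw = json.loads(self._fetch(f"{self._base_url}/orders/{order_id}"))
        # Fail loudly (KeyError) if the service breaks the contract.
        return Order(id=str(raw["id"]), ship_to=str(raw["ship_to"]))
```

Because the client pulls in no third-party libraries, a consuming service can adopt it without inheriting transitive dependency conflicts.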

Page 41: JavaOne 2015: Scaling micro services at Gilt

5. audit + alerting

How do we stay compliant while giving engineers full autonomy in prod?

Really smart alerting: http://cavellc.github.io

orders[shipTo: US].count.5m == 0
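One plausible reading of the rule above is ‘alert when no US-bound orders were counted in the last five minutes’. The sketch below evaluates that reading over an in-memory list of (timestamp, destination) pairs. It does not use Cave or reproduce its rule language; `should_alert` and the data layout are hypothetical.

```python
from typing import Iterable, Tuple

WINDOW_SECONDS = 5 * 60  # the "5m" in the rule


def should_alert(orders: Iterable[Tuple[float, str]], now: float,
                 ship_to: str = "US") -> bool:
    """True when no order for `ship_to` falls inside the last five minutes.

    Each order is a (unix_timestamp, destination) pair.
    """
    recent = sum(
        1 for timestamp, destination in orders
        if destination == ship_to and now - WINDOW_SECONDS < timestamp <= now
    )
    return recent == 0
```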

Page 42: JavaOne 2015: Scaling micro services at Gilt

6. io explosion

Each service call begets more service calls, some of which are redundant... => unintended complexity and performance problems

Looking to lambda architecture for critical-path APIs: precompute, real-time updates, O(1) lookup
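The three ingredients named here (precompute, real-time updates, O(1) lookup) can be sketched with a single in-memory view. This is a toy illustration under assumed names (`PrecomputedView`, SKU quantities), not Gilt's design: a batch job rebuilds the view from the full record set, events adjust it between rebuilds, and the critical-path read is one dictionary lookup instead of a fan-out of service calls.

```python
from typing import Dict, Iterable, Tuple


class PrecomputedView:
    """Read-optimised view of available quantity per SKU."""

    def __init__(self) -> None:
        self._view: Dict[str, int] = {}

    def rebuild(self, records: Iterable[Tuple[str, int]]) -> None:
        """Batch step: recompute the whole view from source records."""
        fresh: Dict[str, int] = {}
        for sku, quantity in records:
            fresh[sku] = fresh.get(sku, 0) + quantity
        self._view = fresh  # swap in the new view in one assignment

    def apply_event(self, sku: str, delta: int) -> None:
        """Real-time step: adjust the view as events arrive."""
        self._view[sku] = self._view.get(sku, 0) + delta

    def lookup(self, sku: str) -> int:
        """Critical-path read: one dictionary lookup, O(1) on average."""
        return self._view.get(sku, 0)
```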

Page 43: JavaOne 2015: Scaling micro services at Gilt

7. reporting

Many services => many databases => data is decentralized.

Solution: real-time event queues to a data-lake.
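The shape of that solution can be sketched with an in-process queue standing in for the event stream and a list standing in for the data lake. Both stand-ins, and the names `publish` and `drain_to_lake`, are assumptions for illustration; the slides do not say which queue or storage technology is used.

```python
import json
import queue
from typing import List


def publish(events: queue.Queue, service: str, payload: dict) -> None:
    """Each service emits its own events instead of exposing its database."""
    record = {"service": service, "payload": payload}
    events.put(json.dumps(record, sort_keys=True))


def drain_to_lake(events: queue.Queue, lake: List[str]) -> int:
    """Move all pending events into the lake; return how many were moved."""
    moved = 0
    while True:
        try:
            lake.append(events.get_nowait())
        except queue.Empty:
            return moved
        moved += 1
```

Reporting then queries the lake, so no report needs to join across the private databases of individual services.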

Page 44: JavaOne 2015: Scaling micro services at Gilt

scaling μ-services at Gilt

San Francisco, 26th October 2015

Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman

@gilttech