GeeCON Microservices 2015 scaling micro services at gilt

scaling μ-services at Gilt. Sopot, Poland, 11th September 2015. Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman @gilttech
  • date post

    22-Jan-2017

Transcript of GeeCON Microservices 2015 scaling micro services at gilt

  • scaling μ-services at Gilt

    Sopot, Poland, 11th September 2015

    Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman

    @gilttech

  • why was I late today?

    and

    were micro-services to blame?

  • svc-localised-string

    mongodb

    login-reg, mosaic, product listing, product search

    A localisation file was loaded with a problematic character encoding.

    The driver spun on CPU, consuming CPU credits

    The service starved and fell over.

    Core parts of the site were broken

  • so

    how did I really feel about micro-services yesterday?

  • gilt: luxury designer brands at discounted prices

  • we shoot the product in our studios

  • we receive, store, pick, pack and ship...

  • we sell every day at noon

  • stampede...

  • this is what the stampede really looks like...

  • rails to riches: 2007 - ruby-on-rails monolith

  • 2011: java, loosely-typed, monolithic services

    Hidden linkages; buried business logic

    Monolithic Java App; huge bottleneck for innovation.

    lots of duplicated code :(

    teams focused on business lines

    Large loosely-typed JSON/HTTP services

  • enter: μ-services

    How can we arrange our teams around strategic initiatives?

    How can we make it fast and easy to get a change to production?

  • 2015: micro-services

  • driving forces behind gilt's emergent architecture

    team autonomy

    voluntary adoption (tools, techniques, processes)

    KPI or goal-driven initiatives

    failing fast and openly

    open and honest, even when it's difficult

  • service growth over time: point of inflexion === scala.

  • what are all these services doing?

  • anatomy of a gilt service

  • anatomy of a gilt service - typical choices

    gilt-service-framework

    log4j, cloudwatch, Cave

    java, javascript

  • lines of code per service

  • # source files per service

  • service discovery: straightforward

    zookeeper

    Brocade Traffic Manager (aka Zeus, Stingray, SteelApp, ...)

  • from bare-metal...

    PHX, IAD

  • to vapour.

  • single tenant deployment: one AMI per service instance

  • reproducible, immutable deployments: docker

  • service discovery: new services use ELB

    zookeeper

    Amazon ELB

  • # running AMIs per service

  • lift'n'shift + elastic teams

    Existing Data Centre

    dual 10Gb direct connect line, 2ms latency

  • AWS instance sizing

  • evolution of architecture and tech organisation

  • We (heart) μ-services

    Lessen dependencies between teams: faster code-to-prod

    Lots of initiatives in parallel

    Your favourite here

    Graceful degradation of service

    Disposable Code: easy to innovate, easy to fail and move on.

  • We (heart) cloud

    Do devops in a meaningful way.

    Low barrier of entry for new tech (DynamoDB, Kinesis, ...)

    Isolation

    Cost visibility

    Security tools (IAM)

    Well documented

    Resilience is easy

    Hybrid is easy

    Performance is great

  • seven μ-service challenges (& some solutions)

    no one ever said this was gonna be easy

  • 1. staging vs test-in-prod

    We find it hard to maintain staging environments across multiple teams with lots of services.

    We think TiP is the way to go: invest in automation, use dark canaries in prod.

    However, some teams have found TiP counter-productive, and use minimal staging environments.
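
One way to picture a dark canary: production requests are mirrored to a new build, whose responses are compared and recorded but never returned to users. The following is a minimal Python sketch under that assumption; the handler names and the mismatch log are hypothetical and are not Gilt's tooling.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


def serve_with_dark_canary(
    request: str,
    stable: Handler,
    canary: Handler,
    mismatches: List[Tuple[str, str, str]],
) -> str:
    """Answer from the stable build; exercise the canary in the dark."""
    response = stable(request)
    try:
        shadow = canary(request)
        if shadow != response:
            mismatches.append((request, response, shadow))
    except Exception as exc:
        # A canary failure is recorded but must never reach the user.
        mismatches.append((request, response, f"error: {exc}"))
    return response


def stable_v1(request: str) -> str:
    return request.upper()


def canary_v2(request: str) -> str:
    # Hypothetical regression: the new build mishandles empty input.
    if not request:
        raise ValueError("empty request")
    return request.upper()


mismatches: List[Tuple[str, str, str]] = []
results = [
    serve_with_dark_canary(r, stable_v1, canary_v2, mismatches)
    for r in ["sale", ""]
]
```

Users only ever see the stable responses in `results`; the regression shows up in `mismatches`, where automation can block the rollout.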

  • 2. ownership

    Who owns that service? What happens if that person decides to work on something else?

    We have chosen for teams and departments to own and maintain their services. No throwing this stuff over the fence.

  • bottom-up ownership, RACI-style

    1. Software is owned by departments, tracked in genome project. Directors assign services to teams.

    2. Teams are responsible for building & running their services; directors are accountable for their overall estate.

  • ownership donut informs tech strategy

    3. Ownership is classified: active, passive, at-risk.

    done === 0% at risk
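
The classification above lends itself to a simple report. This Python sketch computes the share of services in each ownership class and checks the "done === 0% at risk" condition; the service names and their statuses are invented for illustration, and how the genome project actually stores this data is not described in the talk.

```python
from collections import Counter
from typing import Dict

STATUSES = ("active", "passive", "at-risk")


def ownership_breakdown(services: Dict[str, str]) -> Dict[str, float]:
    """Return the percentage of services in each ownership class."""
    if not services:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(services.values())
    total = len(services)
    return {status: 100.0 * counts.get(status, 0) / total for status in STATUSES}


def is_done(services: Dict[str, str]) -> bool:
    """'done' means no service is classified as at-risk."""
    return ownership_breakdown(services)["at-risk"] == 0.0


# Hypothetical estate for one department.
estate = {
    "svc-a": "active",
    "svc-b": "passive",
    "svc-c": "at-risk",
    "svc-d": "active",
}
breakdown = ownership_breakdown(estate)
```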

  • 3. deployment

    Services need somewhere to live. We've open-sourced tooling over docker and AWS to give:

    elasticity + fast provisioning + service isolation + fast rollback + repeatable, immutable deployment.

    https://github.com/gilt/ionroller

  • 4. lightweight APIs

    We've settled on REST-style APIs, using http://apidoc.me. Separate interface from implementation; "an AVRO for REST" (Mike Bryzek, Gilt Founder)

    We strongly recommend zero-dependency strongly-typed clients.
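
To make "zero-dependency strongly-typed client" concrete, here is a hand-written Python sketch that uses only the standard library. It is not apidoc output: the `Product` fields, the `/products/{id}` path and the class names are all hypothetical.

```python
import json
from dataclasses import dataclass
from urllib.request import urlopen


@dataclass(frozen=True)
class Product:
    """Typed model for a hypothetical product resource."""
    id: str
    name: str
    price_cents: int


def parse_product(payload: str) -> Product:
    """Decode JSON into a typed model; a missing field raises KeyError."""
    data = json.loads(payload)
    return Product(
        id=str(data["id"]),
        name=str(data["name"]),
        price_cents=int(data["price_cents"]),
    )


class ProductClient:
    """Thin client: standard library only, no third-party dependencies."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def get_product(self, product_id: str) -> Product:
        # Hypothetical endpoint layout.
        with urlopen(f"{self.base_url}/products/{product_id}") as resp:
            return parse_product(resp.read().decode("utf-8"))


example = parse_product('{"id": 1, "name": "scarf", "price_cents": "4900"}')
```

Callers work with `Product` values rather than raw JSON, so a change in the interface surfaces as a type or decoding error at the client boundary instead of deep inside business logic.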

  • 5. audit + alerting

    How do we stay compliant while giving engineers full autonomy in prod?

    Really smart alerting: http://cavellc.github.io

    orders[shipTo: US].count.5m == 0

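
The alert expression above reads naturally as "raise an alert when no US-bound orders have been counted in the last five minutes". The Python sketch below implements that reading over an in-memory list of orders. It is an interpretation for illustration only, not Cave's query semantics or API; the tuple layout and function names are assumptions.

```python
from typing import Iterable, Tuple

Order = Tuple[float, str]  # (timestamp in seconds, ship-to country)


def count_in_window(
    orders: Iterable[Order], ship_to: str, now: float, window_s: float = 300.0
) -> int:
    """Count orders for one destination in the window (now - window_s, now]."""
    return sum(
        1
        for ts, country in orders
        if country == ship_to and now - window_s < ts <= now
    )


def should_alert(orders: Iterable[Order], ship_to: str, now: float) -> bool:
    """Alert when the five-minute order count for a destination is zero."""
    return count_in_window(orders, ship_to, now) == 0


orders = [(1000.0, "US"), (1200.0, "IE"), (1290.0, "US")]
quiet_now = should_alert(orders, "US", now=1300.0)
quiet_later = should_alert(orders, "US", now=1700.0)
```

A business-level signal like this catches outages that host metrics miss: the servers can look healthy while no orders are getting through.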

  • 6. io explosion

    Each service call begets more service calls, some of which are redundant... => unintended complexity and performance problems

    Looking to lambda architecture for critical-path APIs: precompute, real-time updates, O(1) lookup
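
The three ingredients named above (precompute, real-time updates, O(1) lookup) can be sketched in a few lines of Python. This is a deliberately simplified illustration that folds the batch and real-time paths into one in-memory view; the inventory example and all names are hypothetical, not a description of Gilt's implementation.

```python
from typing import Dict, Iterable, Tuple


class InventoryView:
    """Precomputed view served from a dict, so reads need no service calls."""

    def __init__(self) -> None:
        self._view: Dict[str, int] = {}

    def rebuild(self, batch: Iterable[Tuple[str, int]]) -> None:
        """Batch path: recompute the whole view from source records."""
        view: Dict[str, int] = {}
        for sku, qty in batch:
            view[sku] = view.get(sku, 0) + qty
        self._view = view

    def apply_event(self, sku: str, delta: int) -> None:
        """Real-time path: fold a single change into the view."""
        self._view[sku] = self._view.get(sku, 0) + delta

    def lookup(self, sku: str) -> int:
        """Average O(1) dictionary read on the critical path."""
        return self._view.get(sku, 0)


view = InventoryView()
view.rebuild([("sku-1", 5), ("sku-2", 3), ("sku-1", 2)])
view.apply_event("sku-1", -1)  # one unit sold since the last rebuild
```

The critical-path call becomes a single read against the view, instead of a fan-out of calls to the services that own the underlying data.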

  • 7. reporting

    Many services => many databases => data is not centralized.

    Solution: real-time event queues to a data-lake.
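
As a rough picture of that solution, each service publishes events to a queue and a single consumer appends them to partitioned storage that reporting can read. The Python sketch below uses an in-process queue and a dictionary as stand-ins for a real event stream and a real data lake; the event shape, partition scheme and service names are assumptions.

```python
import json
import queue
from collections import defaultdict
from typing import Dict, List

events: queue.Queue = queue.Queue()


def publish(service: str, kind: str, payload: dict, day: str) -> None:
    """Called by any service; it needs no knowledge of the reporting store."""
    events.put({"service": service, "kind": kind, "day": day, "payload": payload})


def drain_to_lake(lake: Dict[str, List[str]]) -> int:
    """Move all queued events into partitions keyed by event kind and day."""
    moved = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return moved
        partition = f"{event['kind']}/{event['day']}"
        lake[partition].append(json.dumps(event, sort_keys=True))
        moved += 1


lake: Dict[str, List[str]] = defaultdict(list)
publish("svc-orders", "order_placed", {"order_id": 1}, "2015-09-11")
publish("svc-cart", "cart_updated", {"items": 2}, "2015-09-11")
publish("svc-orders", "order_placed", {"order_id": 2}, "2015-09-12")
moved = drain_to_lake(lake)
```

Reports then query the lake rather than reaching into each service's private database.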

  • so

    how did I really feel about yesterday's outage?

    great.

  • svc-localised-string

    mongodb

    login-reg, mosaic, product listing, product search

    A localisation file was loaded with a problematic character encoding.

    The driver spun on CPU, consuming CPU credits

    The service was small: it was rewritten in about an hour and deployed, and that fixed the site.

    We knew exactly where the problem was.

    We focussed and rapidly deployed tentative incremental fixes.

    Once we fixed that problem, all of our problems were fixed.

    Try that in a monolith :)
