2016-06-28

ContainerDays

I skipped the workshop day and just joined the second day of ContainerDays 2016. My notes from the sessions I attended:
  • From Borg To Kubernetes: The History And Future Of Container Orchestration by Mandy Waite (Google): Mandy explained Google's need to schedule different workloads (monitoring, production tasks, batch runs) with resource isolation for CPU, memory and I/O. Many lessons and much experience from internal container management systems have been baked into Kubernetes (k8s). Some core features of k8s were presented (e.g. deployments update a service, which is made of pods, that in turn group closely related containers), concluding with an outlook on some upcoming features (e.g. improved monitoring).
  • Shaping Applications for Docker, CoreOS, Kubernetes and Co by Thomas Fricke (Endocode AG) (here's another version of the slides from a different conference and a video of his talk at that conference): Key takeaway: When designing your application, separate "cattle" (i.e. systems you don't explicitly care about and that can be scaled massively and replaced easily) and "pets" (i.e. systems you care about), separate stateless layers (this is your "cattle") and persistence layers (these are your "pets"). Thomas also introduced the key differences between virtualization and containers and showed the existing ecosystem in the container world plus a case study with the lessons learned.
  • Linuxkernel features building my $CONTAINER by Erkan Yanar: Erkan showed some details of the base technologies for containers that have been available for years (e.g. chroot, capabilities, cgroups, namespaces).
  • The NoOps Movement by Marco Hutzsch (Otto): To me, what Marco named NoOps is actually DevOps done right or pushed further. It's about automation, infrastructure as code, full-stack responsibility incl. developers being on call, serverless architecture etc. He argued for getting rid of dedicated operations teams and instead building delivery teams of engineers who can develop software and are interested in running what they've built. Reminds me of what we had at my former employer, and I can confirm that you can really gain speed with this approach.
  • Rancher Docker - From zero to hero by Michael Vogeler (Nexinto GmbH): Actually, I've recently played around a bit with Rancher in the AWS cloud. It offers nice features like resource management, cross-host networking, service discovery, monitoring and support for several scheduling engines like Kubernetes or Swarm. And it's set up quite easily - so, worth a second look ...
  • Plan B: Service to Service Authentication with Oauth by Henning Jacobs (Zalando): Each team has its own virtual data center (i.e. separate AWS cloud accounts) with its own domain. If one service wants to talk to another service, it has to contact the ELB bound to the domain of the other service. To authorize that communication they've chosen OAuth 2.0 with credentials stored on S3 (protected by IAM policies) to acquire auth tokens. The tokens are JSON Web Tokens, which are self-contained for validation. The public keys are stored in an OAuth provider service; token revocations are handled by another service. All components are open source, docs are here.
  • A lightning talk by Florian Leibert (Mesosphere) providing an overview of the history and the architecture of Mesos and DC/OS including a live demo.
  • Efficient monitoring in modern environments by Tobias Schmidt (Soundcloud): Soundcloud created Prometheus. You should collect metrics on each and every layer of your systems (hosts, containers, applications) - detailed per component and not just aggregated. Aggregations are handy for alerting and dashboards. Four golden signals to monitor: Latency, traffic (e.g. requests per second), errors and saturation/utilization. Don't rely on medians, use percentiles instead (90%, 95%, 99%, 99.9% etc.). At Soundcloud every developer is on call. Use symptom-based alerting instead of cause-based alerting, i.e. monitoring business processes instead of isolated technical components adds value - the four golden signals mentioned before can provide good guidance. Only alert when human intervention is required - this reduces noise and prevents alert fatigue. Use alert grouping and service silencing e.g. for maintenance, avoid static thresholds in favor of relative thresholds e.g. based on historic data, use a ticketing system and treat warnings like new features. Provide concise playbooks/runbooks to enable fast problem mitigation. Practice outages! Recommended read: My Philosophy on Alerting.
  • A web shop in containers - Building the microservice platform for otto.de by Florian Sellmayr (ThoughtWorks) & Felix Bechstein (Otto): Florian and Felix presented an overview of the architecture behind otto.de using microservices running on Docker containers controlled by Marathon. With this architecture they gained freedom and responsibility for the developers, the ability to deploy new services without interaction with operations, increased standardization, better scalability and faster delivery. Problems they faced: Poor adoption of Mesos in various frameworks, service discovery, poor support for multi-tenancy and authentication, resource and container isolation. To mitigate some of these problems they came up with separated clusters and refactored them a lot over time (automation helped here a lot). Secrets are managed by Vault.
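
To illustrate the "use percentiles, not medians" advice from the monitoring talk, here is a minimal sketch in Python. The latency numbers and the percentile helper are made up for illustration and are not from the talk; the helper uses the simple nearest-rank method, whereas a monitoring system such as Prometheus estimates quantiles differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds:
# 90 fast requests and 10 very slow ones.
latencies_ms = [20] * 90 + [2000] * 10

median = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"median={median} ms, p95={p95} ms, p99={p99} ms")
```

With this made-up data the median stays at the fast value although every tenth request takes two seconds; only the higher percentiles make the slow tail visible.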
