CloudNative London 2017
- What is This Cloud Native Thing Anyway?
- Eight Principles for Cloud Native Storage
- Lunar Way's journey towards Cloud Native Utopia
- Cloud Native in the US Federal Government
- Keynote by Adrian Cockcroft
- Five Reasons to use Kafka In the Cloud
- A Microservices Journey at JPMorgan Chase
- RPC
- Cloud Native Apps with GitOps
- Meet Ups
What is This Cloud Native Thing Anyway?
Presenter: Sam Newman
- The Twelve Factor App: Cloud Friendly rather than Cloud Native
- Open Data Centre Alliance Maturity Model: The issue is that a maturity model is linear, and implies that “more maturity is always better”.
- No celebrity Cloud-based company (e.g. Netflix) is IaaS vendor agnostic. They are committed to a single Cloud vendor.
- Lock-In vs Migration Cost: Lock-in by itself is not a problem. Migration cost is the metric to focus on.
- Cloud brokerage systems are a bad idea. Quantify the Migration cost instead
CNCF Cloud Native Definition
- Container Packaged
- Dynamically Managed
- Microservices orientated
Cloud Approaches
- Lift and Shift: drop legacy application on the cloud. E.g. Dropping VMware Image on Amazon
- Cloud Ready: 12 factor apps
- Cloud Native: Embracing cloud-only services
Cloud Native Definition
“An application built to take full advantage of an underlying cloud platform”
Cloud Native App Characteristics
- Built to scale
- Fault-tolerant
- Maybe: decomposed into services
- Pushes as much work to the platform as possible
- Automatable
Abstractions
- IaaS
- CaaS: Containers as a Service
- PaaS: Low maturity. Serverless best example
Eight Principles for Cloud Native Storage
Presenter: Cheryl Hung
Cases
- Binaries
- Data
- Config
- Backup
Eight Principles of Cloud Native Storage
- API Driven
- Declarative and Composable
- Application centric
- Agile (elastic capacity)
- Performant
- Natively Secure
- Consistently Available
- Platform Agnostic
Storage Types
- Block storage
- File storage
- Object storage
Storage landscape
- NFS
- Storage array: deterministic performance
- Ceph
- EBS: takes between 45 seconds and 1 hour to remount
- REX-Ray: Integrates w/ Swarm, Kubernetes
- StorageOS
Misc
- Katacoda - Interactive platform to learn about Kubernetes and CLI-based tools
Lunar Way's journey towards Cloud Native Utopia
Presenter: Kasper Nissen
Banking Application for saving towards goals integrated with partner banks
Key Points
- Minikube for local development
- KOPS for maintaining the cluster in AWS
- Realm for asynchronous synchronisation of data
- Istio: service mesh
- Helm: Kubernetes quick start
- Moving from RabbitMQ to Kafka
- A Prometheus ecosystem involves Pushgateway, Grafana and Alertmanager
Citations
“Containerization transforms the data center from being machine-oriented to being application-oriented” Burns et al., Borg, Omega, and Kubernetes, 2016
Cloud Native in the US Federal Government
Presenter: Jez Humble
Key points
- 4006 pages worth of regulation before pushing an application to production
- FedRAMP certification for Federal Government IT providers
- Specific region of AWS that is FedRAMP certified
- Cloud.gov - a FedRAMP authorised environment for Federal projects
- 269 of 325 security controls are handled by cloud.gov (a subset of those 269 is supported by AWS)
- “A government-compliant Heroku-like system” based on Cloud Foundry
Goals
- Teams can deploy into a production-like environment from day 1
- Architectural paradigm designed for distributed systems
- Push-button deployments
- Most of the controls taken care of at the platform level
- Templates for all your compliance documentation
Principles for Building a PaaS
- Everything must be self-service
- Design your platform for multi-tenancy: IaaS like EC2 is poor at this goal
- Use Native Cloud Primitives
- Everything must be reproducible from version control
- Take care of compliance at the platform layer
Quotes
“DevOps - All things I did not need to know about when I was using Heroku”
“Speed is the new security”
“You don’t want to build logging and monitoring for every single account”
“If you have to raise a ticket to get access to a capability you don’t have a cloud”
“Don’t install custom software like Cisco appliances in a Cloud environment - Always use the Cloud Native backing services and capabilities like RDS in AWS”
“The entire state of the platform must be able to be reconstructed from version control”
Citations
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.”
Keynote by Adrian Cockcroft
Presenter: Adrian Cockcroft
Key Points
- Autoscaling is ideal for predictable heavy workloads
- Serverless is ideal for spiky workloads with idle periods
- Today’s deployment pipeline: Developer -> Build System -> Canary Test -> Blue Version -> Green Version
Quotes
“You get capacity in a data centre by forklifting a rack into it”
“In the cloud you pay a month later for the seconds used rather than up-front for three years worth of depreciation”
“Pay for what you used last month, not what you guess you will need next year”
“In the data centre you file tickets and wait for every step. Self-service, instead, is on demand”
“A data architecture based around a primary and secondary instance is not cloud native”
“CNCF is curating all of the cloud relevant stuff in GitHub”
“Serverless: You can finish building and deploying an application in less time than you’d spend evaluating container runtimes.”
“We are working towards making containers a first class entity in AWS”
“It is not lock-in that people dislike (as in marriage), but unlocking (as in divorce)”
Cloud Native Principles
- Pay as you go, afterwards
- Self service - no waiting
- Globally distributed by default
- Cross-zone/region availability models
- High utilization - turn idle resources off
- Immutable code
Cloud Practice Evolution
2012 Cloud practice | 2014 Cloud Practice | 2017 Cloud Practice |
---|---|---|
Netflix OSS | Docker | AWS Lambda |
Instances | Containers | Functions and Events |
Java focus -> Spring | Golang -> Kubernetes | Node.js -> Serverless |
Adoption
Town Planners | Settlers | Pioneers |
---|---|---|
Instances | Containers | Serverless |
Risk averse | Efficient | Fastest Development |
Safe but slow | Faster | Low Cost |
Mature tooling | Evolving tooling | Tooling emerging |
Kubernetes vs AWS ECS
Kubernetes | AWS ECS |
---|---|
Managed by Customers | Managed by AWS |
Single Tenant | Multi-tenant |
Control Plane Overhead | Just EC2 instances by the second |
Networking: CNI | Moving to CNI |
IAM integration fixes needed | IAM Integrated |
Version upgrade management | Does not need version upgrade management |
Kubernetes
- Better developer features and APIs today
- Improving Operational Features
- Improving AWS Integration
ECS
- Better operational features today
- Improving developer APIs converging with CNCF components
- Improving portability
Lock-In vs Unlocking
The process involves choosing, using, and losing:
Choosing
- Investments: Negotiating, learning, experimenting.
- Making a commitment: whenever development is frozen and the operations team takes over, the key is turned in the lock
What changed?
Old World | New World |
---|---|
Monolith | Microservice |
PoC Install | Web service / OS |
Enterprise purchase cycle | Free tier /free trial |
Months | Minutes |
$100k-$Millions | $0-$1000s |
Using
- Cost of setup
- Cost of operation
- Capacity planning
- Scenario planning
- Incident management
- Tuning
Returns
- Service capabilities
- Availability, functional
- Scalability, agility
- Efficiency
Old World | New World |
---|---|
Frozen installations | Continuous Deployment |
Ops specialist silo | Dev automation |
Capacity upgrade costs | Elastic cloud resources |
Low utilization | High utilization |
High cost of change | Low cost of change |
Losing
Investments
- Negotiating time
- Contract penalties
- Replacement costs
- Decommissioning effort
- Archiving, sustaining legacy
Returns
- Reduced spending
- More advanced technology
- Better service, agility, scalability
- Choose again
Summary
Old World | New World |
---|---|
Monolithic | Microservices |
Frozen waterfall projects | Agile continuous delivery |
Long-term contracts | Pay as you go |
Local dependencies | Remote web services |
Bottom Line: ROI for choosing, using, losing has changed radically. Stop talking about lock-in, it’s just refactoring dependencies.
- The cost of each dependency is far lower
- Frequency of refactoring is far higher
- Investment and return is much more incremental
Cloud Native Availability Model
Four layers: People, Application, Switching, Infrastructure
- First layer (bottom): infrastructure and services. No single point of failure
- Second layer: Switching and interconnecting. Data replication, traffic routing, avoiding issues, anti-entropy recovery
- Third layer: Application Failures. Error returns. Slow response. Network partition
- Fourth Layer (top): People. Unexpected application behaviour often causes people to intervene and make the situation worse
Chaos Engineering Tools
- Game days
- Simian Army
- FIT (Failure Injection Testing)
- ChAP (Chaos Automation Platform)
- Gremlin (End-to-end Chaos Engineering automation)
Security Red Team
“You should have a security red team who tries to break into your site”
Tools
- Safestack AVA (Social engineering attacks)
- Metasploit
- Nmap
- AttackIQ
- SafeBreach
“Running a game day is more important than technology”
Five Reasons to use Kafka In the Cloud
Presenter: Ben Stopford
Quotes
“Kafka is a fully formed streaming platform”
“A distributed log is the formal definition for it”
“Big companies transition slowly to the Cloud. Netflix continued using their data centres for many years after moving to the Cloud”
“Kafka does not typically stretch across data centres. Typically you have two clusters and replicate between the two. Same concept applies in the cloud”
“The bottleneck in interaction with data stores is often updating an index somewhere”
Recommendations
- Consolidate organisational datasets and make them available in both (all) locations
- Do wrap data in schemas
- Don’t consolidate datamodels with legacy applications. Do it later
Practicalities
- Shared storage or instance storage: Shared storage has faster rebalancing times on failure. Instance storage is cheaper
- Confluent provides Docker images
- Be wary of instance variability
Other Points:
- Store shared datasets in the Brokers
- Use compacted topics for long-lived datasets: they delete superseded messages that share the same key
- Enable replication. Handle Machine Failure Automatically.
- Most people run Kafka with a replication factor of three
- There is a 6 timeout for an unavailable node to be noticed
- Always on - Rolling Releases.
- Use the rack awareness feature to ensure you can lose a single AZ
- EC2 has a “as a service” feature
- NY Post about history of articles
- Above 100TB, tuning may be required
- For 2-way replication, use topic prefixes to remove cyclic dependencies: For example cloud_orders and perm_orders.
- Use bandwidth control to manage SLAs
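The compacted-topic behaviour noted above (superseded messages with the same key are deleted) can be sketched as a small in-memory model. This is not Kafka's implementation, and it does not use any Kafka client API: the `compact` function and the `(key, value)` tuple layout are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical message shape for this sketch: (key, value).
Message = Tuple[str, str]


def compact(log: Iterable[Message]) -> List[Message]:
    """Keep only the most recent message for each key.

    Survivors stay in their original log order, which mirrors the idea
    that compaction removes superseded records without reordering.
    """
    messages = list(log)
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(messages):
        latest_offset[key] = offset  # a later offset supersedes an earlier one
    return [
        message
        for offset, message in enumerate(messages)
        if latest_offset[message[0]] == offset
    ]


if __name__ == "__main__":
    log = [("user-1", "london"), ("user-2", "paris"), ("user-1", "berlin")]
    print(compact(log))
```

A consumer that replays the compacted log from the start still ends up with the latest value for every key, which is why compaction suits long-lived datasets held in the brokers.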
A Microservices Journey at JPMorgan Chase
Presenters:
- Peter Maciver is Lead Design Authority of the “Manta” program (Asset and Wealth Management).
- Matthew Stine is Global CTO at Pivotal.
Quote from Peter Maciver
“A Cloud Native approach is not just for Netflix; it works for a highly regulated firm like JPMC”
MANTA
Market Data and Tradeable Assets (MANTA). Manta was developed to take advantage of the features of the internal cloud platform. It went into production in 2017 and has subsequently provided the architecture blueprint for microservice development across the entire firm.
- MuleSoft API in the beginning, then migrated to Pivotal tools
- Initial approach consisted of breaking a monolith based on EJB and stored procedures into microservices and orchestrating services
- Chaos Monkey in Use
- Enrichment Crash event causes messages that were being processed or will be processed to be retried until successfully processed.
- DDD
- Tech Primers and Reference Implementation
Provable Characteristics
- Autonomous Deployability
- Data Management
- Independent Evolution
- Autonomous Elasticity
- Resiliency and Fault Tolerance
Architecture Review Process
- Application and data decomposition strategies
- Microservices architecture principles
- Twelve Factor application principles
- Distributed computing principles and patterns
Choices
- Spring Cloud Stream / RabbitMQ - based architecture
- Event-driven micro-services pattern
- Pivotal wrote a whitepaper for Banking Reference Architecture: “Secure, Hybrid Banking Reference Architectures for Cloud-Native Applications”
- Refactoring of some of the resiliency patterns (Circuit Breakers -> Retry)
What works according to Matthew Stine
- Start with something real
- Start with a set of quality attributes (“provable characteristics”)
- Iterate toward the goal
- Production!
- Extrapolate patterns, validate with other use cases
Utility Services are Useful
- 4: Microservices
- 3.5: Utility Services
- 3: Application Platform
- 2: Communication
- 1: Hardware
Misc
- Production-Ready Microservices (recommended book)
- Event-Driven Architecture “as a Service”
RPC
- A local call paradigm (RPC) does not have awareness of Time Outs and Circuit Breakers.
- Temporal Coupling: The server must be available when you call
- Behavioural Coupling: The sender of the message determines what to do, not the Receiver
- Prefer Messaging!
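To make the first point concrete: a remote call dressed up as a local call needs something around it that knows about failure. Below is a minimal circuit breaker sketch. The class name, thresholds, and injectable `clock` are all assumptions for illustration, and the wrapped callable is assumed to enforce its own timeout (for example by raising `TimeoutError`).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call later."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests need not sleep
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, remote: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: permit one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = remote()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Even with a breaker in place the caller is still temporally coupled to the server, which is part of the argument for preferring messaging.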
Citations
“We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency.”
A Note on Distributed Computing by Jim Waldo, Geoff Wyant, Ann Wollrath and Sam Kendall
Temporal and behavioural coupling diagram by Ian Robinson
Cloud Native Apps with GitOps
Presenter: Alexis Richardson
ROODA
- Release
- Observe
- Orient
- Decide
- Act
- Release
Quotes
“CNCF: For the first time, we have the entire industry pointing in one direction when it comes to Cloud Native applications”
Fundamental Theorem of DevOps “What can be described, can be automated and accelerated”.
Meet Ups
2017-10-03 Cloud Native London
Linkerd Service Mesh
Service Mesh Properties (Linkerd):
- client-side load balancing
- circuit-breaking
- service discovery
- retries and deadlines
- TLS: security/encryption
- Connection pooling
- Distributed tracing
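As an illustration of the "retries and deadlines" property, here is a rough in-process sketch of the logic. A service mesh applies this in the proxy rather than in application code, and this is not Linkerd's implementation or configuration: `call_with_retries`, `GaveUp`, and every parameter are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GaveUp(RuntimeError):
    """Raised when attempts are exhausted or the overall deadline has passed."""


def call_with_retries(operation: Callable[[], T], deadline_s: float,
                      max_attempts: int = 3, backoff_s: float = 0.1,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a failing operation, but never past one overall deadline."""
    give_up_at = clock() + deadline_s
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        if clock() >= give_up_at:
            break  # the deadline bounds total time, not each attempt
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            remaining = give_up_at - clock()
            if attempt + 1 < max_attempts and remaining > 0:
                # Exponential backoff, capped by the time left.
                sleep(min(backoff_s * (2 ** attempt), remaining))
    raise GaveUp("retries exhausted or deadline exceeded") from last_error
```

The deadline applies to the whole call rather than to each attempt, so retries cannot multiply the latency seen by the caller.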
The Eight Fallacies of Distributed Computing
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
Linkerd:
- It’s a Data Plane but integrates with Control Planes including namerd and Istio
- JVM based, built on battle-tested Netty and Finagle
- Open Source + great community
- Does one thing well and it’s open to integrate with other tools that do their thing well
- Namers (service discovery): ZooKeeper, Consul, Kubernetes, Marathon, …
- Telemeters: Prometheus, StatsD, TraceLog, Zipkin, …
- Control Planes: namerd, Istio
Deployment models:
- Once per host
- Once per service (sidecar)
Different deployment configurations:
- service-to-linker
- linker-to-service
- linker-to-linker
Drawbacks:
- Big memory footprint (256MB)
- Opening a new connection takes a lot of time (>100ms)
- Some service mesh configuration should live in your application logic instead
“The way that microservices interact with each other at runtime needs to be monitored, managed, and controlled”