Cloud Native Patterns by Cornelia Davis

January 01, 2021

Cloud Native Patterns by Cornelia Davis defines the term cloud native as a set of application development priorities: fault tolerance, availability, and scalability. In it, the author explains several popular patterns for designing cloud native applications.


The author's purpose in writing the book is (1) to define cloud native in terms of the application development principles of fault tolerance, availability, and scalability, and (2) to highlight several key design patterns that are useful when building cloud native applications.

Defining Cloud Native

The author begins by provoking the reader with an (always appropriate) reference to The Princess Bride, applied to Cloud Native: "I don't think it means what you think it means." A central aim of the book is to correct the misconception that writing cloud native applications is about writing applications to be hosted in the cloud. This may seem like a sensible assumption, but Cloud Native is not really about where an application is hosted; it is all about how the application works.

Cloud-native software is highly distributed, must operate in a constantly changing environment, and is itself constantly changing.

For an application to be Cloud Native it should be:

Always Up: resilient to failure (fault-tolerant)
Agile Release Model: its components can evolve independently, be deployed independently, and be released frequently
Dynamic Demand Model: it adapts to different sources and volumes of demand, scaling dynamically and continuing to function

Cornelia Davis' model for the cloud-native application has three entities:

Applications: scalable, configurable, maintainable, monitorable code
Data: distributed, object-oriented, microservice-friendly data rather than so-called "data monoliths"
Interactions: reasonable, reliable, monitorable relationships between cloud native applications and data

Cloud native applications are concerned with:

Horizontal (out/in) Scalability
Statelessness
Configuration
Application lifecycle

Cloud native data are concerned with:

Breaking down data monoliths
Distributed data fabric and polyglot persistence
Distributed data synchronicity

Cloud native interactions are concerned with the relationships between cloud native applications and data, expressed through:

Async relationships
Pull-based relationships
Distributed infrastructure
Composite monitoring and tracing

The Problem of Scale

Central to Cloud Native Patterns is the problem of scale. Cloud Native grew out of the recognition that, as applications scale, funny things start to happen.

For example, different parts of the application often develop different levels of demand: a blog's read demand will greatly outstrip its write demand.

Organizational challenges also emerge. More engineers working on the same application means more coordination, and when the application spans multiple business domains, it becomes difficult to conceptualize and reason about. Domain-Driven Design (DDD), object-oriented design, and microservices emerged to support multiple teams. Cloud Native takes those concepts further, adding to the rich literature on how to deal with scale and complexity.

Unpredictable Evolution of the Digital Ecosystem

Digital applications have become critical systems. We rely on them increasingly for our finance, business, transportation, and health.

Applications that are not reliable are not useful, and they are potentially dangerous. Cloud Native Patterns argues that developers should treat managing risk as a key feature of their services.

Developers may not be able to predict ahead of time how their services will evolve within the broader ecosystem. A service therefore needs to be a polite and cautious citizen, caring about its upstream clients and downstream services. It must take steps to be trustworthy, secure, and reliable.

Cornelia Davis highlights some enablers:

Continuous Delivery
Repeatability
Safe Deployments
Change is the Rule

Cloud Platform

Although the book makes it clear that Cloud Native isn't about hosting, the author explains how the philosophical tenets of Cloud Platforms support and enable Cloud Native.

DEFINITION The cloud-native platform presents application dial tone: an interface that makes the application the first-class entity that the developer or operator interacts with.

Cloud Platforms follow a declarative model: developers specify "I want this thing" (usually in a version-controlled YAML or JSON file), and the Cloud Platform is responsible for building and deploying that thing. If the actual state diverges from the desired state, the platform takes corrective action (usually spinning up a new instance of the thing). This moves us away from a model in which sysadmins are responsible for debugging configurations and "fixing" individual instances ("cattle, not pets").
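To make the declarative model concrete, here is a minimal Python sketch. It assumes a hypothetical `AppSpec` record, which in practice would live in a version-controlled YAML or JSON file, and a hypothetical `plan` function that stands in for the platform. The developer states only the desired image and replica count. The platform compares that with what is actually running and derives the corrective actions itself, replacing mismatched instances rather than repairing them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Desired state: what the developer wants, not how to get it (hypothetical)."""
    name: str
    image: str
    replicas: int


def plan(desired: AppSpec, running: list[str]) -> list[str]:
    """Return the corrective actions needed to move `running` (the image of
    each live instance) toward the desired state."""
    actions: list[str] = []
    # Cattle, not pets: instances on the wrong image are replaced, not repaired.
    stale = [image for image in running if image != desired.image]
    actions += [f"destroy {desired.name} ({image})" for image in stale]
    current = len(running) - len(stale)
    if current < desired.replicas:
        actions += [f"start {desired.name} ({desired.image})"] * (desired.replicas - current)
    elif current > desired.replicas:
        actions += [f"stop {desired.name} ({desired.image})"] * (current - desired.replicas)
    return actions


spec = AppSpec(name="blog", image="blog:2", replicas=3)
print(plan(spec, ["blog:2", "blog:1"]))
# ['destroy blog (blog:1)', 'start blog (blog:2)', 'start blog (blog:2)']
```

The spec never says how to reach three replicas. That knowledge belongs to the platform, which is the point of the declarative model.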

Event Driven Architecture

First, the author establishes the basic software architecture "invocation style" to which all of the later cloud-native patterns are applied.

Our software as a whole is no longer executing in a single process [...] a requestor can no longer depend on an immediate response when a request is made [...] with the availability of React.js and other similar frameworks, reactive programming is becoming more commonplace for code running in the browser, but server-side programming remains heavily dominated by request/response.

The Cloud Native response is to rely more on event-driven architecture patterns, which decouple the client from the service.

The entity that triggers code execution in an event-driven system doesn’t expect any type of response—it’s fire and forget.
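The following sketch gives a rough illustration of fire and forget. It uses a hypothetical in-process `EventBus`; a real system would use a message broker, which is out of scope here. `publish` returns nothing and waits for no one. `drain` stands in for consumers that run independently at some later time.

```python
from __future__ import annotations

from collections import deque
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-process event bus (hypothetical). Publishers enqueue and move on."""

    def __init__(self) -> None:
        self._queue: deque[tuple[str, Event]] = deque()
        self._subscribers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Fire and forget: no response, no waiting on any consumer.
        self._queue.append((topic, event))

    def drain(self) -> int:
        """Deliver queued events to subscribers; returns the number of deliveries."""
        delivered = 0
        while self._queue:
            topic, event = self._queue.popleft()
            for handler in self._subscribers.get(topic, []):
                handler(event)
                delivered += 1
        return delivered


bus = EventBus()
bus.subscribe("post.published", lambda event: print("indexing post", event["id"]))
bus.publish("post.published", {"id": 7})  # returns None immediately
bus.drain()  # prints: indexing post 7
```

The publisher never learns which handlers, if any, consumed the event. That is the decoupling between client and service.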

Command Query Responsibility Segregation

The author dives into the popular Command Query Responsibility Segregation (CQRS) pattern. Commands are the state-changing operations of CRUD (create, update, and delete); queries are the reads. In a conventional design, a database stores the state for a service, and a RESTful service supports HTTP GET, PUT, POST, and DELETE operations to interact with that data store.

The book explains how the CQRS pattern allows Commands to be developed, monitored, maintained, and scaled separately from Queries.

When you have a single controller, the model for both read and write operations is the same, but when you separate the business logic into two separate controllers, each can have its own model. This can be powerful.
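Here is a minimal sketch of that separation. It assumes a blog domain and hypothetical `CommandController` and `QueryController` classes. The command side validates and stores normalized records. The query side reads from a separately shaped view: titles grouped by author. For brevity, the command side updates the read model directly; in many CQRS systems that update would instead travel as an event.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WriteModel:
    """Command side: normalized records, optimized for validation and updates."""
    posts: dict[int, dict] = field(default_factory=dict)


@dataclass
class ReadModel:
    """Query side: a denormalized view shaped for one query (titles per author)."""
    titles_by_author: dict[str, list[str]] = field(default_factory=dict)


class CommandController:
    def __init__(self, store: WriteModel, view: ReadModel) -> None:
        self.store = store
        self.view = view

    def create_post(self, post_id: int, author: str, title: str) -> None:
        if post_id in self.store.posts:
            raise ValueError(f"post {post_id} already exists")
        self.store.posts[post_id] = {"author": author, "title": title}
        # Propagate the change to the read side (often done via an event instead).
        self.view.titles_by_author.setdefault(author, []).append(title)


class QueryController:
    def __init__(self, view: ReadModel) -> None:
        self.view = view

    def titles_for(self, author: str) -> list[str]:
        return list(self.view.titles_by_author.get(author, []))


store, view = WriteModel(), ReadModel()
commands, queries = CommandController(store, view), QueryController(view)
commands.create_post(1, "ada", "Hello")
print(queries.titles_for("ada"))  # ['Hello']
```

Because the two controllers share nothing but the read model, each can be scaled separately. For example, many `QueryController` instances could run against replicated copies of the view.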

Event Sourcing

Later, the book dives deeper into event-focused architectures, outlining Event Sourcing. This pattern extends messaging patterns so that different applications can write to and read from the same event stream, and it further separates the read architecture so that reads can be optimized for each use case.
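This is a sketch of the core idea under simple assumptions: hypothetical account events and an in-memory `EventStore`. The append-only log is the source of truth. Each read model is a projection computed by replaying it. Two independent projections are built here from the same stream, each shaped for its own use case.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical event types: "deposited" or "withdrawn"
    account: str
    amount: int


class EventStore:
    """Append-only log: the source of truth. Events are never updated or deleted."""

    def __init__(self) -> None:
        self._log: list[Event] = []

    def append(self, event: Event) -> None:
        self._log.append(event)

    def replay(self) -> list[Event]:
        return list(self._log)


SIGN = {"deposited": 1, "withdrawn": -1}


def balances(events: list[Event]) -> dict[str, int]:
    """Projection 1: current balance per account, rebuilt from the full history."""
    result: dict[str, int] = {}
    for event in events:
        result[event.account] = result.get(event.account, 0) + SIGN[event.kind] * event.amount
    return result


def withdrawal_count(events: list[Event]) -> int:
    """Projection 2: a different read model derived from the same stream."""
    return sum(1 for event in events if event.kind == "withdrawn")


store = EventStore()
store.append(Event("deposited", "acct-1", 100))
store.append(Event("withdrawn", "acct-1", 30))
print(balances(store.replay()))          # {'acct-1': 70}
print(withdrawal_count(store.replay()))  # 1
```

Because the log keeps every event, a new projection can be added later and populated by replaying history, without changing the writers.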

Organizational Patterns

The book also touches briefly on how organizational patterns may relate to architecture, for example by mapping Platform Teams and Application Teams onto a layered view of the architecture.

This topic is covered in detail in Team Topologies: Organizing Business and Technology Teams for Fast Flow by Matthew Skelton and Manuel Pais.

Application Lifecycle

The author points out the importance, in a Cloud Native ecosystem, of the Application Lifecycle: the running states of an application. We should "pay careful attention to how app lifecycle events affect other apps that form the broader piece of software". Understanding these states of operation helps us model deployment, operation, and exception cases.

I sometimes like to think of these platforms as robots—robots that are handling a whole host of operational tasks that humans used to do. But here’s the thing: robots don’t read release notes.

We need to take the following operational concerns into account:

Manageability: similar to the idea of limiting toil, manageability is when an app can be supported with a high level of quality and the least amount of unnecessary human intervention
Resilience: we need to ensure that the platform has a fail-safe way of detecting when an app has failed
Responsiveness: we need to ensure users or clients receive outputs "in a timely manner, and what is considered timely depends on the use case"
Cost management: take advantage of the cost efficiency of the cloud by paying only for what you use
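On the resilience point, one common way for a platform to detect failure is to probe each instance repeatedly and restart it after several consecutive failures. The sketch below is hypothetical. Its `failure_threshold` mirrors a similar setting on Kubernetes probes, but the class is not any platform's API. It treats an exception in the health check as a failure, which is what makes it fail-safe.

```python
from __future__ import annotations

from typing import Callable


class LivenessProbe:
    """Hypothetical probe: decides when an app instance should be restarted."""

    def __init__(self, check: Callable[[], bool], failure_threshold: int = 3) -> None:
        self.check = check
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if the platform should restart the instance."""
        try:
            ok = self.check()
        except Exception:
            ok = False  # a crashing health check counts as a failure (fail-safe)
        # A single success resets the count, so brief blips do not trigger restarts.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1
        return self.consecutive_failures >= self.failure_threshold
```

Requiring several consecutive failures trades detection speed for fewer needless restarts. The threshold is a per-app judgment call.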

We also can design our deployment strategies to take advantage of the application lifecycle.

Canary Deployments
Blue / Green Deployments
Rolling Deployments
Batch Rolling Deployments
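The rolling variants can be sketched in a few lines of Python. The sketch assumes a hypothetical `upgrade(instance)` hook that replaces one instance with the new version and reports whether it passed its health check. With `batch_size=1` it behaves like a plain rolling deployment; larger batches give a batch rolling deployment. The rollout halts after the first batch that contains a failure.

```python
from __future__ import annotations

from typing import Callable


def batch_rolling_deploy(
    instances: list[str],
    batch_size: int,
    upgrade: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Upgrade `instances` batch by batch and stop after the first failing batch.

    `upgrade` is a hypothetical hook: it replaces one instance with the new
    version and returns True if the new instance passes its health check.
    Returns (upgraded, still_on_old_version).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    upgraded: list[str] = []
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        # Only this batch is out of rotation, so the rest keep serving traffic.
        results = [(instance, upgrade(instance)) for instance in batch]
        upgraded += [instance for instance, ok in results if ok]
        if not all(ok for _, ok in results):
            break  # halt the rollout to limit the blast radius of a bad release
    remaining = [instance for instance in instances if instance not in upgraded]
    return upgraded, remaining
```

The batch size sets the balance between rollout speed and how much capacity is out of service, and at risk, at any one time.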

Remember that apps don't all share the same application lifecycle. Cloud technology creates the opportunity for new application lifecycle patterns (e.g. Serverless).

Control Loops

The author talks a lot about control loops, which she explains are an important part of Cloud Native software. She uses the Kubernetes replication controller to explain the concept.

The Kubernetes replication controller implements a control loop that allows you to specify your app deployment declaratively, and Kubernetes will create and maintain that application topology. The control loop never expects to reach a done state. It’s designed to constantly be looking for the inevitable change and to respond appropriately.
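Here is a toy Python version of that loop. It uses in-memory bookkeeping and hypothetical names (`ReplicaController`, `reconcile`). Each pass observes the running instances, compares them with the declared replica count, and acts. `run` takes a fixed number of ticks only so the example terminates; a real controller keeps looping.

```python
from __future__ import annotations

import itertools


class ReplicaController:
    """Toy control loop in the spirit of a replication controller (hypothetical)."""

    def __init__(self, desired_replicas: int) -> None:
        self.desired = desired_replicas
        self.running: set[str] = set()
        self._ids = itertools.count(1)

    def reconcile(self) -> None:
        """One observe-compare-act pass; safe to call any number of times."""
        while len(self.running) < self.desired:
            self.running.add(f"pod-{next(self._ids)}")
        while len(self.running) > self.desired:
            self.running.pop()

    def run(self, ticks: int, crashes: dict[int, set[str]] | None = None) -> None:
        """Loop for `ticks` iterations; `crashes` simulates instances dying at given ticks."""
        for tick in range(ticks):
            for name in (crashes or {}).get(tick, set()):
                self.running.discard(name)  # the inevitable change
            self.reconcile()
```

Because `reconcile` is idempotent, it does not matter how often the loop runs or what caused the drift. Each pass simply moves the actual state back toward the declared state.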

Circuit Breaker Pattern

A service should plan for cases in which it is overloaded or the failure rate of a downstream service is high. While rate limiters can act as a buffer for your service, you also need to handle cases where your service cannot complete requests because a downstream service is slow or timing out. By tracking the failure rate of those downstream calls, we can trip the circuit breaker when it gets too high: rather than bombarding the downstream service with requests that will fail, we handle the situation more gracefully.

When the circuit is open, the service can notify upstream clients, potentially triggering back-off on their side. Alternatively, the service can handle these failures by serving cached data.
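Here is a minimal circuit breaker, sketched in Python with illustrative parameter names and defaults. For simplicity it counts consecutive failures as a stand-in for a true failure rate. Once the threshold is reached, calls skip the downstream service entirely and go to a fallback such as cached data. After `reset_timeout` seconds, one trial call is let through (the half-open state), and a success closes the circuit again.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Sketch of a closed -> open -> half-open circuit breaker (illustrative only)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # fail fast: spare the struggling downstream service
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Injecting `clock` keeps the breaker testable. A production breaker would typically also need thread safety and a sliding-window failure rate.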

API Gateway Pattern

The role of the API gateway is to sit in front of bits of implementation and provide a whole host of services. These services might include the following:

Authentication and authorization
In-Flight Data Encryption
Rate Limiting: Protecting the service from load spikes
Access logging: e.g. for auditing and operations observability

These are cross-cutting concerns that needn’t be implemented over and over again. Use of API gateways relieves the developer of functionality that could just be viewed as plumbing, allowing them to focus on business needs. But perhaps even more important, it provides a point where enterprise controls can be uniformly applied.
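Here is a Python sketch of moving that plumbing into the gateway. It wraps a backend handler with three of the concerns listed above: API-key authorization, a token-bucket rate limiter, and an access log. The API-key scheme, the token bucket, and all names are assumptions chosen for illustration. In-flight encryption (TLS) is left out. The backend function contains only business logic.

```python
from __future__ import annotations

import time
from typing import Callable, Dict, Tuple

Request = Dict[str, str]
Response = Tuple[int, str]
Handler = Callable[[Request], Response]


class TokenBucket:
    """Rate limiter: bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def gateway(backend: Handler, api_keys: set[str], limiter: TokenBucket,
            access_log: list[str]) -> Handler:
    """Wrap `backend` with cross-cutting concerns so it holds only business logic."""

    def handle(request: Request) -> Response:
        if request.get("api_key") not in api_keys:
            response = (401, "unauthorized")        # authentication/authorization
        elif not limiter.allow():
            response = (429, "too many requests")   # protect against load spikes
        else:
            response = backend(request)
        access_log.append(f"{request.get('path')} -> {response[0]}")  # audit trail
        return response

    return handle


log = []
list_posts = gateway(lambda request: (200, "post list"), {"key-1"}, TokenBucket(2, 1.0), log)
print(list_posts({"path": "/posts", "api_key": "key-1"}))  # (200, 'post list')
```

Because `gateway` wraps any handler, the same controls apply uniformly to every backend placed behind it. That is the enterprise-control benefit the book highlights.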

However, a centralized API Gateway can become a bottleneck and isn't especially Cloud Native in its approach. In cloud-native architectures, gateway functionality is therefore distributed and language agnostic: with sidecars and service meshes, we tack the gateway onto the side of each service. This eliminates the network hop through a central gateway and eliminates CORS concerns. The Kubernetes Pod, in which a sidecar container runs alongside the application container, is an example of this approach.

The Security and Compliance officer can define and embed policies in the Service Mesh network. These can ensure that traffic remains authenticated and authorized.
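The sketch below illustrates a sidecar enforcing a centrally defined policy. It models a hypothetical `Sidecar` that intercepts traffic before it reaches its co-located service. In a real mesh, the policy is mesh configuration rather than Python, and the mesh itself establishes caller identity (Istio, for example, uses mutual TLS). Here the identity is simply passed in as a string.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Tuple

Response = Tuple[int, str]


@dataclass(frozen=True)
class Policy:
    """Centrally defined rule (hypothetical): which callers may reach a service."""
    service: str
    allowed_callers: frozenset[str]


class Sidecar:
    """Runs next to one service instance and intercepts its inbound traffic."""

    def __init__(self, service: str, app: Callable[[str], Response],
                 policies: list[Policy]) -> None:
        self.service = service
        self.app = app  # the co-located application, reached over localhost in practice
        self.allowed: set[str] = set()
        for policy in policies:
            if policy.service == service:
                self.allowed |= policy.allowed_callers

    def handle(self, caller_identity: str | None, payload: str) -> Response:
        # Unauthenticated or unauthorized traffic never reaches the application.
        if caller_identity is None:
            return 401, "unauthenticated"
        if caller_identity not in self.allowed:
            return 403, "forbidden by mesh policy"
        return self.app(payload)
```

The application code contains no security logic at all. Changing who may call `orders` means changing the policy list, not redeploying the service.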

Istio is one of the most widely used service meshes, an open source project that was incubated by Google, IBM, and Lyft.

What does the book add to the discussion?

The author does a good job of introducing Cloud Native, makes a persuasive case, and hits a lot of high-value patterns. She wove in the role of Cloud Platforms and tools like Kubernetes that have developed a large footprint in the ecosystem and will likely help shape the future of Cloud Native.

Some of the examples, especially those that require following along with specific programming steps, interrupted the flow of the narrative and made the key points harder to follow, but the chapter summaries helped there.

Overall, I think the book is a good read, though the chapters that dive into the weeds will likely need to be updated in two or three years as these patterns evolve and new PaaS and SaaS abstractions in cloud architecture and Cloud Platforms gain traction.