The Three Ways (unfinished)

Aug 1, 2019

TRIGGER


$REDACTED asked Fodder to create a “roadmap to DevOps” plan for The Corporation. Fodder provided $REDACTED with the following rough draft, in an effort to gauge whether he was heading down the right path.

RESOURCES


ECO


An outline to DevOps velocity

By STOP DevOps

Overview

This document outlines several paths to DevOps adoption within $REDACTED Monitoring Solutions. It is intentionally concise; rather than providing exhaustive detail, it aims to bring to light key ideas, challenges, and processes.

We present three potential paths to DevOps adoption. The first is a hybrid approach, offering meaningful change while maintaining current architecture and a separation of duties. The second offers a more traditional approach, maintaining a strict separation between business units, and adopting current $REDACTED tooling. The third is a modern approach that requires massive organizational and architectural change, with the potential for massive gains.

Goals

Every project plan must have a defined set of shared goals, so that all parties are moving in the same direction. The goals below are representative, though not exhaustive:

  • Reduce waste within build and release pipelines
  • Improve stability, speed, and security of all architecture
  • Continuously roll forward, always failing fast
  • Appease all involved business units
  • Complete DevOps adoption within 90 days
  • Foster a culture that reduces workload, and replaces it with experimentation/learning

Metrics

DORA (DevOps Research and Assessment) classifies organizations, in part, by using the following criteria:

  • Deployment frequency
  • Lead time for changes
  • Time to restore service
  • Change failure rate

Terminology

Several terms are shared across most of the plans. The definitions below explain these concepts from the perspective of our organization.

Concepts

The Three Ways

Briefly, The Three Ways are:

  • The First Way: Work always flows in one direction – downstream
  • The Second Way: Create, shorten and amplify feedback loops
  • The Third Way: Continuous experimentation, to learn from mistakes and achieve mastery

The First Way

The First Way states the following about the flow of work:

  • Work should only flow in one direction
  • No known defect should be passed downstream
  • Always seek to increase the flow

The First Way helps us think of IT as a value stream. Think of a manufacturing line, where each work center adds a component – and thus, value – to the line. Since each work center adds value, it is preferred that each work center do its part correctly the first time around.

The Second Way

The Second Way describes the feedback process as the following:

  • Establish an upstream feedback loop
  • Shorten the feedback loop
  • Amplify the feedback loop

The Second Way teaches us to think of information as a value-addition. When used correctly, feedback can help to optimize the value stream.

For example:

  • Why was there so much wait time at this work center? Resource A was held up.
  • Why did this process have to be redone? Because it wasn’t done right the first time.
  • “Improving daily work is even more important than doing daily work.” – (Gene Kim, The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win)

The Third Way

The Third Way describes environment and culture through the following practices:

  • Promote experimentation
  • Learn from success and failure
  • Constant improvement
  • Seek to achieve mastery through practice

The Third Way teaches us that culture and environment are just as important as the work being done. It advocates a culture of experimentation and constant improvement, in which taking measured risks is encouraged and good results are rewarded.

Continuous Integration

In software engineering, continuous integration (CI) is the practice of merging all developers’ working code into a shared mainline several times per day. The main aim of CI is to prevent integration problems, referred to as “integration hell” in the early days of CI.

CI was originally intended to be used in combination with automated unit tests written through the practices of test-driven development. Initially this was conceived of as running and passing all unit tests in the developer’s local environment before committing to the mainline. This helps avoid one developer’s work-in-progress breaking another developer’s copy. Where necessary, partially complete features can be disabled before committing, using feature toggles for instance.

In addition to automated unit tests, organizations using CI typically use a build server to implement continuous processes of applying quality control in general — small pieces of effort, applied frequently.
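
For illustration only, a minimal pipeline definition might look like the following. Bitbucket Pipelines syntax is used purely as a sketch (the CI tool in use here is Bamboo), and the image, commands, and paths are hypothetical:

    # bitbucket-pipelines.yml – hypothetical CI pipeline
    image: python:3.7
    pipelines:
      default:
        - step:
            name: Build and unit test
            script:
              - pip install -r requirements.txt
              - pytest tests/   # a failing test stops the pipeline: fail fast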

Continuous Deployment

Continuous deployment is a strategy for software releases wherein any code commit that passes the testing phase is automatically released into the production environment, making changes that are visible to the software’s users.

Continuous deployment eliminates the human safeguards against unproven code in live software. It should only be implemented when the development and IT teams rigorously adhere to production-ready development practices and thorough testing, and when they apply sophisticated, real-time monitoring in production to discover any issues with new releases.

Continuous Delivery

Continuous delivery (CD or CDE) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time and, when releasing the software, doing so manually. It aims at building, testing, and releasing software with greater speed and frequency. The approach helps reduce the cost, time, and risk of delivering changes by allowing for more incremental updates to applications in production. A straightforward and repeatable deployment process is important for continuous delivery.

Configuration Management

Configuration management (CM) is a systems engineering process for establishing and maintaining consistency of a product’s performance, functional, and physical attributes with its requirements, design, and operational information throughout its life. The CM process is widely used by military engineering organizations to manage changes throughout the system lifecycle of complex systems, such as weapon systems, military vehicles, and information systems. Outside the military, the CM process is also used with IT service management as defined by ITIL, and with other domain models in the civil engineering and other industrial engineering segments such as roads, bridges, canals, dams, and buildings.

Containers

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all the parts it needs, such as libraries and other dependencies, and ship it all in one package.

Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating-system kernel and are thus more lightweight than virtual machines. Containers are created from images that specify their precise contents. Images are often created by combining and modifying standard images downloaded from public repositories.
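
As a minimal sketch, a container image is described in a Dockerfile; the service name, port, and paths below are hypothetical:

    # Dockerfile – hypothetical service image
    FROM centos:7
    # the packaged application, built elsewhere in the pipeline
    COPY gateway /usr/local/bin/gateway
    # configuration is supplied via environment variables, not baked in
    ENV GATEWAY_PORT=8443
    EXPOSE 8443
    CMD ["/usr/local/bin/gateway"]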

Declarative vs. Imperative languages

Declarative programming is a programming paradigm that expresses the logic of a computation without describing its control flow. Imperative programming is a programming paradigm that uses statements that change a program’s state.

While declarative tooling requires a high degree of understanding – and imperative programming skill – to create, it is much easier to understand and use. This facilitates better, faster knowledge transfer, and a lower barrier to entry.
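
To illustrate the difference (both snippets are hypothetical): the imperative form describes how to reach a state, step by step, while the declarative form describes the desired end state and lets the tooling work out the steps:

    # Imperative: the steps to take
    ssh web01 'yum install -y nginx && systemctl enable --now nginx'

    # Declarative (Terraform-style): the state to reach
    resource "aws_instance" "web" {
      ami           = "ami-0123456789abcdef0"   # hypothetical image ID
      instance_type = "t3.micro"
    }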

Infrastructure as Code

One of the key innovations of the DevOps movement is the idea of infrastructure as code. In this paradigm, we reproduce and change the state of our environments in an automated fashion from information in version control rather than configuring infrastructure manually.

This way of working is a natural fit for cloud infrastructure, where resources can be provisioned and configured through APIs. Tools such as Terraform make it simple to provision and evolve cloud infrastructure using declarative, version-controlled configuration. This, in turn, makes provisioning test and production environments fast and reliable, improving outcomes both for administrators and users of cloud infrastructure. Similar techniques can be used to deploy applications automatically, as sketched below.
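
A minimal sketch of the workflow, assuming Terraform and Git (the branch and commit message are hypothetical; the Terraform code itself is elided):

    git checkout -b add-gateway-instance
    # ...edit the *.tf files that describe the desired infrastructure...
    git commit -am "Add gateway instance"
    terraform plan    # preview the change against the live environment
    terraform apply   # converge the environment to the committed state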

In the 2018 “State of DevOps Report”, published by DORA (DevOps Research and Assessment), 44 percent of cloud adopters agreed or strongly agreed that environment configuration and deployments use only scripts and information stored in version control, with no manual steps required (other than approvals). Respondents using infrastructure as code are 1.8 times more likely to be in the elite performance group.

The Twelve-Factor App

In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:

  • Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
  • Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
  • Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
  • Minimize divergence between development and production, enabling continuous deployment for maximum agility;
  • And can scale up without significant changes to tooling, architecture, or development practices.

The twelve-factor methodology can be applied to apps written in any programming language, and which use any combination of backing services (database, queue, memory cache, etc).
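
As one concrete example, factor III (“Config”) stores configuration in the environment, so a single build artifact can run anywhere; the image name and variable below are hypothetical:

    # the same artifact, promoted between environments unchanged
    docker run -e DATABASE_URL="postgres://dev-db/app"  registry.example.com/app:1.4.2   # development
    docker run -e DATABASE_URL="postgres://prod-db/app" registry.example.com/app:1.4.2   # production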

Tools

  • Docker
  • Kubernetes
  • Bitbucket
  • Terraform
  • Packer
  • Vault
  • Helm
  • Helmfile
  • Vagrant
  • KitchenCI
  • Semantic-Release
  • Bamboo
  • Minikube

Where we stand

STOP has adopted few discernible DevOps practices today, and is lacking in several key areas:

  • Dev/Ops team cohesion
  • Development velocity
  • Automated testing
  • Infrastructure-as-code
  • No-fault design and processes
  • A culture of learning and experimentation

Successes

Change Management is an important process

The main goal of any change management process is to ensure that changes are managed in a cohesive way, with checks and balances that establish and support the enterprise operational model around user impact, corporate policy, regulatory compliance, and security. To achieve this goal, the change management process needs to determine how changes are to be managed, what techniques are to be applied, and what methodologies are to be used.

Monitoring processes are very good

STOP monitoring processes are robust and effective. Time-series monitoring performed by the PerlTools application, in particular, is in-line with modern best practices. Orion (poll-based) monitoring is similarly effective, though this type of monitoring is considered to be “old” by many DevOps organizations. This type of monitoring tends to be complex to manage, and it leads to notification overload. Conversely, time-series based monitoring leverages logic to minimize alerts and perform automated remediation.

Core applications are well-designed

The fundamental components of the $REDACTED application are generally well-designed. They are simple to install, spawn isolated services on a single port, are configured via environment variables, and will heal in the event of a failure.

Problems

Anti-Type B: DevOps Team Silo

The DevOps Team Silo (Anti-Type B) typically results from a manager or exec deciding that they “need a bit of this DevOps thing” and starting a ‘DevOps team’ (probably full of people known as ‘a DevOp‘). The members of the DevOps team quickly form another silo, keeping Dev and Ops further apart than ever as they defend their corner, skills, and toolset from the ‘clueless Devs’ and ‘dinosaur Ops’ people.

The only situation where a separate DevOps silo really makes sense is when the team is temporary, lasting less than (say) 12 or 18 months, with the express purpose of bringing Dev and Ops closer together, and with a clear mandate to make the DevOps team superfluous after that time; this becomes what I have called a Type 5 DevOps Topology. – (source: https://web.devopstopologies.com/)

RPMs were not designed to perform configuration management

STOP currently bundles required configurations for each environment into RPM packages, along with the app. This is a bad practice that has led to a need for different RPM packages (for the same app) in different environments.

A better practice is to build a single artifact, performing configuration outside of the package itself.
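
For example, the same RPM could be deployed to every environment, with per-environment settings delivered as a separate file read at start-up; the path and variables below are hypothetical:

    # /etc/sysconfig/gateway – managed outside the RPM, per environment
    GATEWAY_PORT=8443
    DB_HOST=db.dev.example.com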

Security as an afterthought (and somebody else’s problem)

Poor package management has led to a lack of security within the STOP architecture. Because security was not built into the build and release process, it is overlooked by both the developers and $REDACTED’ security teams.

A better practice is to build security alongside the app, in version control, where it may be reviewed by the proper teams. Secrets should be managed outside of RPM packages – and default credentials should NEVER be used in production environments.
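
Vault (already on the tools list) is one option for this; a sketch of its key/value workflow, with hypothetical paths and keys:

    # stored once, by the owning team – never committed to an RPM or repo
    vault kv put secret/gateway db_password="s3cr3t"

    # read at deploy or run time by the application or pipeline
    vault kv get -field=db_password secret/gateway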

Attachment to mutable infrastructure and processes

$REDACTED has built a platform around the concepts of configuration management, which is a type of tooling that many DevOps organizations are moving away from completely. Configuration management reinforces the practice of building mutable (can be changed) environments, whereas immutable (unchanging) architectures are better suited to DevOps.

A mutable infrastructure can be broken – leading to re-work, just to return to a “normal” state. Conversely, immutable infrastructures lend themselves to proper version control, ease of rolling forward/backward, and consistent environmental behaviors.

Retroactive development patterns are antithetical to DevOps

Similar to the above problem, standing environments – coupled with configuration management – lend themselves to broken applications. Rather than rebuilding and retrying, developers are left troubleshooting application problems instead of performing more important work.

Ephemeral development environments, coupled with strict dev/prod parity, lead to consistent, clean environments that rarely require troubleshooting. When they do, issues are easily replicated in lower environments.

Over-the-wall development leads to excessive rework

Knowledge silos and separations of duty within STOP lead to a culture of fear, mistrust, and the “guarding of one’s territory.” Developers and Operations personnel have little to no relationship. A lack of trust leads to extensive email chains, explanation and re-explanation to make anything happen, and flat-out (sometimes arbitrary) barriers placed into workflows.

A better practice is to bring disparate teams together, to foster a culture of trust and understanding. More than anything, freedom to experiment and collaborate can enable this type of organizational change.

A large percentage of code is not version-controlled at all

Some believe that scripting is a DevOps solution. It’s not (necessarily). Automation without proper version control, automated testing, standardized deployments and declarative formats can, in many instances, increase complexity – not simplify it.

Imperative solutions are being presented as DevOps successes

This is a mistake. Declarative programming leads to more immutable, easier-to-understand infrastructures. While imperative programming provides the underlying logic for any good declarative tool, it also leads to complexity when used for every problem. Imperative programming/scripting requires logic and specialized knowledge in order to deal with edge-cases and failure states. By contrast, declarative languages provide a higher-level understanding of the underlying logic, are easier to learn, and add an element of trust; users of declarative solutions can generally “trust” that their software will behave as expected.

Assumptions

All the below plans assume that the following is true:

  • The organization will systematically attack individual service areas over time, rather than attempting to fix every system at once.
  • This document will outline a 90-day plan to integrate a single service (i.e. “the gateways”). Once a single service has been migrated, future services will be much easier and faster to migrate.
  • The organization will be unable – and should not be expected to – plan for every possible change. In a DevOps organization, change happens organically, and plans are generated automatically.

Planning

Below are several plans that will attempt to outline a path to DevOps success. Not all plans are created equal.

Plan 1: The hybrid route

Complexity: Low

Chance of success: High

Max Twelve-Factor Targets: 12

Description

The optimal route would have the organization adopt modern technologies that enable organizational change, while maintaining current system architecture. The argument for this path is one of skill: we are not yet ready to implement a full container-based architecture, and the need for one has not yet been justified.

On a hybrid approach:

“Perfect is the enemy of good.” – Voltaire

Advantages
  • Ease of adoption
  • Does not usurp the power of Ops or Security
  • Fails fast
Disadvantages
  • Will require a closer relationship with Ops and Security
Requirements
  • Docker
  • Bitbucket
  • Semantic-Release
  • Packer
  • Terraform
  • Vagrant
  • KitchenCI
  • Bamboo
Responsibilities

Timeline
  • Phase 1 (90 days)
    1. Create a standard Docker image that can be used by developers and CI pipelines to build/test RPM packages.
    2. Work with Operations to develop a standard set of Packer images and Terraform modules to be used for infrastructure deployments.
    3. Work with Security to develop a standard set of Integration tests.
    4. Target a service to improve (i.e. “the gateways”). Identify any required dependencies, and work on those services first.
    5. Implement a local Docker and/or Vagrant environment to run the service (see the sketch following this timeline).
    6. Implement Terraform configuration to allow developers to deploy Packer images and application code into lower environments.
    7. Implement Terraform code in CI pipelines, automating deployments to all environments.
    8. Implement controls within the CI pipeline to allow for fast failure; unit testing for developers, integration testing for Operations and Security.
    9. Implement automated semantic versioning within the CI pipeline.
    10. Train employees in the new development practices.
  • Phase 2 (14-day cycles)
    1. Repeat steps 4 through 10 until all or most services have been migrated.
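
As a sketch of step 5, a Vagrantfile can give every developer an identical local environment; the box and image names are hypothetical:

    # Vagrantfile – hypothetical local environment for the gateway service
    Vagrant.configure("2") do |config|
      config.vm.box = "centos/7"
      config.vm.provision "docker" do |d|
        d.run "gateway", image: "registry.example.com/gateway:latest"
      end
    end
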
Topology
Type 1: Dev and Ops Collaboration

This is the ‘promised land’ of DevOps: smooth collaboration between Dev teams and Ops teams, each specialising where needed, but also sharing where needed. There are likely many separate Dev teams, each working on a separate or semi-separate product stack.

My sense is that this Type 1 model needs quite substantial organisational change to establish it, and a good degree of competence higher up in the technical management team. Dev and Ops must have a clearly expressed and demonstrably effective shared goal (‘Delivering Reliable, Frequent Changes’, or whatever). Ops folks must be comfortable pairing with Devs and get to grips with test-driven coding and Git, and Devs must take operational features seriously and seek out Ops people for input into logging implementations, and so on, all of which needs quite a culture change from the recent past. – (source: https://web.devopstopologies.com/)

Process

Expenses
  • Terraform Enterprise - $???
  • DORA Assessment - $40k

Plan 2: The traditional route

Complexity: Medium

Chance of success: Medium

Max Twelve-Factor Targets: 6

Description

This is the route that assumes we will be running “business as usual.” In effect, we will introduce little in the way of new tooling or processes, in an effort to respect corporate traditions.

In this model, DevOps would utilize Saltstack to automate away the most common tasks. Developers are required to build/test/release their own code before handing it off to DevOps for Change Management. Operations would be mostly uninvolved (except to modify settings that Dev/DevOps are not allowed to touch, and to provision new infrastructure).

On servers as pets:

“Servers or server pairs that are treated as indispensable or unique systems that can never be down. Typically, they are manually built, managed, and “hand fed”. Examples include mainframes, solitary servers, HA loadbalancers/firewalls (active/active or active/passive), database systems designed as master/slave (active/passive), and so on.” – (Randy Bias, “Chronicles of History”)

Advantages
  • Ease of adoption
  • Requires little from other teams
Disadvantages
  • This plan does not attempt to achieve most DevOps principles
  • Infrastructure managed in this way requires extensive development efforts
  • Infrastructure cannot be provisioned, destroyed, or rebuilt without involving Operations directly
  • Developers will find configuration management limiting
  • This design provides no failure states for Developers or DevOps. Rather than immediate feedback, problems are “thrown backwards” through the chain of departments involved.
Requirements
  • Bitbucket
  • Salt
  • Bamboo
Responsibilities
Timeline
  • Phase 1 (90 days)
    1. Target a service to improve (i.e. “the gateways”). Identify any required dependencies, and work on those services first.
    2. Implement standardized RPM builds within a CI pipeline.
    3. Develop Saltstack code that can be used to deploy packages and configuration into shared development environments (see the sketch following this timeline).
    4. Address problems as they arise during deployments. Create application or CM code to handle failure states automatically.
    5. Configure Salt Master to apply changes in higher environments.
    6. Train employees in the new development practices.
  • Phase 2 (14-day cycles)
    1. Repeat steps 1 through 6 until all or most services have been migrated.
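
As a sketch of step 3, a Salt state describes the package and service declaratively; the names below are hypothetical:

    # gateway.sls – hypothetical Salt state
    gateway-package:
      pkg.installed:
        - name: gateway

    gateway-service:
      service.running:
        - name: gateway
        - require:
          - pkg: gateway-package
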
Topology
Type 7: SRE Team (Google Model)

The Three Ways

NOTE: Replace “SRE” with “Change Management” and this example will make more sense.

DevOps often recommends that Dev teams join the on-call rotation, but it’s not essential. In fact, some organisations (including Google) run a different model, with an explicit ‘hand-off’ from Development to the team that runs the software, the Site Reliability Engineering (SRE) team. In this model, the Dev teams need to provide test evidence (logs, metrics, etc.) to the SRE team showing that their software is of a good enough standard to be supported by the SRE team.

Crucially, the SRE team can reject software that is operationally substandard, asking the Developers to improve the code before it is put into Production. Collaboration between Dev and SRE happens around operational criteria but once the SRE team is happy with the code, they (and not the Dev team) support it in Production. – (source: https://web.devopstopologies.com/)

Process

The Three Ways

Expenses
  • DORA Assessment - $40k

Plan 3: The container route

Complexity: High

Chance of success: Medium-High

Max Twelve-Factor Targets: 12

Description

This route assumes that STOP will go all-in with the latest and greatest technologies. We will spare no expense, hiring contractors to implement new software, design new processes, and train our employees to become Agile.

In this model, developers would utilize container technology to build-out consistent, reproducible application architectures. Ops would manage a Kubernetes cluster – complete with all the security, policy, and separation of duties that they need. DevOps would be responsible for the build/test/release/deploy of said applications into the cluster.

On servers as cattle:

“Arrays of more than two servers, that are built using automated tools, and are designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required as the array exhibits attributes of “routing around failures” by restarting failed servers or replicating data through strategies like triple replication or erasure coding. Examples include web server arrays, multi-master datastores such as Cassandra clusters, multiple racks of gear put together in clusters, and just about anything that is load-balanced and multi-master.” – (Randy Bias, “Chronicles of History”)

Advantages
  • Has the potential to truly revolutionize development, operations, and security
  • Grants Developers and DevOps much more freedom than they have today
  • Fails very fast
Disadvantages
  • Kubernetes is complex
  • Will be slow to adopt, and slow to master
  • Is likely to be expensive
Requirements
  • Bitbucket
  • Salt
  • Docker
  • Kubernetes
  • Minikube
  • Helm
  • Helmfile
  • Semantic-Release
  • Terraform
  • KitchenCI
  • Bamboo
Responsibilities

Timeline
  • Phase 1 (90 days)
    1. Create a standard Docker image that can be used by developers and CI pipelines to build/test RPM packages.
    2. Work with Operations to develop a standard set of Terraform modules to be used for infrastructure deployments (DNS record changes, Kubernetes service accounts, etc.)
    3. Work with Operations to deploy Kubernetes cluster(s) into the network.
    4. Work with Security to develop a standard set of Integration tests.
    5. Target a service to improve (i.e. “the gateways”). Identify any required dependencies, and work on those services first.
    6. Build a Docker container to run the service.
    7. Package the container into a Helm chart.
    8. Configure Helmfile to deploy the Helm chart (see the sketch following this timeline).
    9. Implement Terraform configuration to allow developers to deploy Helm charts into Minikube.
    10. Implement Terraform configuration to allow developers to deploy Helm charts into lower environments.
    11. Implement Terraform code and Helm charts in CI pipelines, automating deployments to all environments.
    12. Implement controls within the CI pipeline to allow for fast failure; unit testing for developers, integration testing for Operations and Security.
    13. Implement automated semantic versioning within the CI pipeline.
    14. Train employees in the new development practices.
  • Phase 2 (14-day cycles)
    1. Repeat steps 5 through 14 until all or most services have been migrated.
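
As a sketch of steps 7 and 8, Helmfile ties environment-specific values to a Helm chart; the chart path and values file are hypothetical:

    # helmfile.yaml – hypothetical release definition
    releases:
      - name: gateway
        namespace: gateways
        chart: ./charts/gateway
        values:
          - environments/dev.yaml
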
Topology
Type 8: Container-Driven Collaboration

Containers remove the need for some kinds of collaboration between Dev and Ops by encapsulating the deployment and runtime requirements of an app into a container. In this way, the container acts as a boundary on the responsibilities of both Dev and Ops. With a sound engineering culture, the Container-Driven Collaboration model works well, but if Dev starts to ignore operational considerations this model can revert towards an adversarial ‘us and them’. – (source: https://web.devopstopologies.com/)

Process

Expenses
  • Openshift Kubernetes - $???
  • DORA Assessment - $40k
  • Terraform Enterprise - $???

Shared Plans

Pipelines

RPM Docker Build

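A sketch of a build image that developers and CI pipelines could share; the packages and spec file are hypothetical:

    # Dockerfile – hypothetical RPM build image
    FROM centos:7
    RUN yum install -y rpm-build rpmdevtools make gcc
    WORKDIR /build
    # the project is mounted or copied into /build before running
    CMD ["rpmbuild", "--define", "_topdir /build", "-ba", "SPECS/gateway.spec"]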

Terraform Modules

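A sketch of how a shared, Operations-maintained module might be consumed; the module source and variables are hypothetical:

    # main.tf – hypothetical use of a shared module
    module "gateway" {
      source      = "git::https://bitbucket.example.com/infra/gateway-module.git"
      environment = "dev"
      instances   = 2
    }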

Terraform Deployments

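A sketch of the deployment stage a CI job might run; the workspace name is hypothetical:

    terraform init
    terraform workspace select dev   # one workspace per environment
    terraform plan -out=tfplan       # reviewable, fail-fast preview
    terraform apply tfplan           # apply exactly what was planned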

Packer Builds

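A sketch of a Packer template that produces a machine image; all builder settings below are hypothetical:

    {
      "builders": [{
        "type": "amazon-ebs",
        "region": "us-east-1",
        "instance_type": "t3.micro",
        "source_ami": "ami-0123456789abcdef0",
        "ssh_username": "ec2-user",
        "ami_name": "gateway-{{timestamp}}"
      }],
      "provisioners": [{
        "type": "shell",
        "inline": ["sudo yum install -y gateway"]
      }]
    }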

Conclusion

In conclusion, there are many paths to DevOps success – and some are better than others. There are many more not listed here (such as a cloud-first mentality, microservices architecture, and monolithic deployments). Succinctly, the three options presented here were meant to represent:

  • Plan 1: An option that is easy to adopt, and provides true organizational change
  • Plan 2: An option that maintains the status-quo, offering a longer-term path to adoption
  • Plan 3: An option that is one of the best – if not the best – available today

These plans are by no means comprehensive – and they never could be. DevOps is an evolving process, a mentality, and a shared vision. To truly succeed at DevOps, an organization must be comfortable with exploring the unknown.

CAT


data.stats.symptoms = [
    "agitation",
    "frustration"
]

ECHO


Upon presenting this to $REDACTED, Fodder was met with a swift, “What the hell is this?”