Speeding up with ephemeral and immutable infrastructure

In the cloud, we know exactly what we want a server to be, and if we want to change that we simply terminate it and launch a new server with a new AMI. This is enabled by a change in how you think about managing your resources in the cloud or a virtualised environment. Also it allows us to fail as early in the process as possible and by doing so mitigate the inherent risk in making changes.
Greg Orzell, in “Building with Legos”, a Netflix Tech Blog article

Introduction

For years, infrastructure management relied on processes and routines that required manual intervention by engineers or technicians. These practices worked, but the development landscape has changed significantly in recent years. Agile methodologies, shorter development cycles, a stronger focus on time to market, distributed systems, and scaled environments have made it hard for traditional infrastructure management to keep pace. Cloud transformation and the cloud-native trend were the final push that made the need for change evident.

Responding to these challenges required a new, more agile approach to infrastructure management. Instead of treating infrastructure as unique, valuable “pets” that take significant time, effort, and resources to maintain, organizations needed a more standardized, commoditized model. By viewing infrastructure as replaceable “cattle,” they can standardize their systems, reduce the risks associated with manual management, and ensure their infrastructure is equipped to meet the demands of modern development.

The pets vs cattle analogy was popularized by Randy Bias to explain the difference between traditional and new approaches to server management.

In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.

In this article, we delve into the challenges of utilizing mutable and long-lived infrastructure and its effect on cloud-native transformations. We also explore the benefits of adopting an immutable and ephemeral infrastructure approach.

To provide practical insights, we will illustrate each topic with a real-world scenario from our experience at xgeeks, showing how immutable and ephemeral infrastructure helped one of our customers achieve a cloud-native transformation and deliver software faster and more reliably.

Let’s dig into the constraints of long-lived and mutable infrastructure

To understand the real benefits of immutable and ephemeral infrastructure, we first need to look at the main challenges and constraints of long-lived, mutable infrastructure in an agile development world:

  • Increased operational complexity and, consequently, reduced reliability. The growth of distributed service architectures and dynamic scaling significantly increases maintenance and monitoring requirements, mainly because the runtime environment keeps changing. Maintenance and configuration processes that span multiple machines or servers are not compatible with flexible, continuously changing environments.
  • Slower deployments, a direct consequence of the previous point. As multiple configurations and processes make the infrastructure unpredictable, information about it becomes less accurate and less consistent. Time is then wasted fixing configuration issues and debugging runtime environments whose configuration may have drifted.
  • Painful monitoring. Imagine searching for errors on a system that has been running for a long time, with several processes running and several configuration changes applied over time.
  • Finally, fire drills, or out-of-control events: interventions, updates, or patches that you do not fully control, such as a cloud provider reboot or a zone outage. These increase the cost of on-call teams, who are paged to get your infrastructure up and running again.
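The configuration drift mentioned in the list above can be made concrete with a small sketch. It assumes each machine's configuration can be collected as a dictionary; the host names, keys, and values below are hypothetical and only illustrate the idea of comparing a fleet that is supposed to be identical.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(config: dict) -> str:
    """Return a stable hash of a configuration mapping."""
    # sort_keys makes the hash independent of key order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_config(fleet: dict[str, dict]) -> list[list[str]]:
    """Group host names by identical configuration, largest group first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host, config in fleet.items():
        groups[fingerprint(config)].append(host)
    return sorted((sorted(hosts) for hosts in groups.values()), key=len, reverse=True)


# Hypothetical fleet: three hosts that should be identical.
fleet = {
    "www001": {"log_rotation": "daily", "java_heap_mb": 2048},
    "www002": {"java_heap_mb": 2048, "log_rotation": "daily"},
    "www003": {"log_rotation": "off", "java_heap_mb": 4096},  # patched by hand
}

groups = group_by_config(fleet)
if len(groups) > 1:
    # Every group after the largest one has drifted from the majority.
    print("drift detected:", groups[1:])
```

With long-lived, hand-maintained machines, a check like this tends to find more than one group. With immutable images, every host is launched from the same artifact, so a single group is the expected result.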

At the outset, our customer’s scenario presented several challenges in implementing agile development processes. Despite initial efforts, the organization struggled to achieve the desired results because of infrastructure constraints.

Previously, the company delivered its product every 3 months, which left time to correct any configuration drift manually. With the push for more frequent delivery, however, the seemingly simple task of managing the 12 large virtual machines that hosted the backend and front end became a significant challenge. Configuration drift caused by independently configured instances, and later resource starvation caused by missing log rotation, which in turn led to database problems, were just a few of the difficulties faced.

So, what exactly is immutable and ephemeral infrastructure?

To understand immutable infrastructure, first, we need to understand what immutable means. “Immutable” refers to something that cannot be changed, altered, or modified.

In the context of software development and infrastructure, “immutable” is used to describe systems, components, or resources that remain unchanged during their entire lifecycle. This means that once they are deployed, they cannot be updated or modified in any way. Instead, a new version of the system, component, or resource must be created if changes are needed.
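As a rough illustration of this definition, the sketch below models a deployed server as a frozen Python dataclass. The `ServerSpec` class, its fields, and the image names are hypothetical; the point is only that an in-place change is refused and a change produces a new version.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ServerSpec:
    """Hypothetical description of a deployed server; fields cannot be reassigned."""
    name: str
    image: str
    app_version: str


def with_new_version(spec: ServerSpec, image: str, app_version: str) -> ServerSpec:
    """Build a new spec for the next release instead of editing the old one."""
    return replace(spec, image=image, app_version=app_version)


v1 = ServerSpec(name="www001", image="image-2023-01", app_version="1.4.0")

try:
    v1.app_version = "1.4.1"  # an in-place "hotfix" is rejected
except FrozenInstanceError:
    print("in-place change refused")

# The change is expressed as a new version; v1 is left untouched.
v2 = with_new_version(v1, image="image-2023-02", app_version="1.4.1")
print(v1.app_version, "->", v2.app_version)
```

Because `v1` still exists unchanged, rolling back is a matter of deploying the earlier version again rather than undoing edits.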

Now it is time to talk about ephemeral infrastructure, but first let’s look at what the term means. “Ephemeral” refers to something that is short-lived or temporary and does not persist for a long time.

In the context of infrastructure, the term “ephemeral infrastructure” refers to computing resources or components that are created dynamically and destroyed as needed, rather than being persistent and long-lived. This allows for greater flexibility, scalability, and ease of management in cloud-based or other dynamic computing environments.

As observed, both types of infrastructure differ in their design principles. While immutable infrastructure prioritizes stability through unchanging components, ephemeral infrastructure values flexibility through its ability to be easily replaced. By combining these two, an infrastructure is created that can quickly scale, deploy, and recover in response to changes in demand or conditions.

Coming back to our scenario, it became evident that those virtual machines needed to be transformed into immutable and ephemeral components. The persistence of these machines was hindering the client’s deployment process, so we needed to find a way to make these instances reproducible and externalize any non-reproducible elements.

What are the main advantages of using this type of infrastructure?

Now, let’s delve into the advantages of this method and why it helps organizations with their cloud-native transformation.

  • First, simpler operations. Automated deployment techniques replace outdated resources with updated versions, ensuring your systems remain in their original “known-good” state.
  • Second, continuous and faster deployment, with full awareness of what is running and how it behaves. Updating becomes a regular, ongoing process with fewer errors in production, and every update can be tracked through source control and CI/CD processes.
  • Next, fewer errors and greater reliability. New instances can be brought up almost instantly and their lifecycle is much shorter, which reduces the risk of data loss or corruption, the risk of configuration drift, the vulnerability surface, and the effort required to meet service level agreements. This helps organizations maintain a high level of reliability and stability, even as their workloads change and evolve over time.
  • Another advantage is readiness for fire drills, or cloud-ready components. Once you know the desired state of each machine, operations such as reboot and recovery become routine, and you can be much more confident that a cloud provider reboot of the underlying instances will be handled gracefully, with minimal, if any, application downtime.
  • Improved scalability follows from the previous advantage: it becomes easy to scale up or down as needed without worrying about the underlying hardware. This allows organizations to respond quickly to changing demands and to take advantage of new market opportunities.
  • Finally, a potential reduction in costs. Immutable infrastructure is ready to be dynamic, which is very important when provisioning infrastructure with a cloud provider. Costs also fall because less is spent on the upkeep and upgrading of conventional, persistent servers.
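The idea of knowing the desired state of each machine, and replacing instead of repairing, can be sketched as a small reconciliation function. Everything here is an assumption made for illustration: `Instance`, `launch`, and `reconcile` are hypothetical names, and `launch` only stands in for a real cloud API call.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image: str
    healthy: bool = True


_ids = count(1)


def launch(image: str) -> Instance:
    """Stand-in for a cloud API call that starts an instance from an image."""
    return Instance(instance_id=f"i-{next(_ids):04d}", image=image)


def reconcile(running: list[Instance], desired_image: str, desired_count: int) -> list[Instance]:
    """Return the fleet after replacing stale or unhealthy instances.

    Nothing is repaired in place: an instance either matches the desired
    state and is kept, or it is dropped and a fresh one is launched.
    """
    keep = [i for i in running if i.healthy and i.image == desired_image]
    keep = keep[:desired_count]  # scale down if there are too many
    missing = desired_count - len(keep)
    return keep + [launch(desired_image) for _ in range(missing)]


fleet = [
    Instance("i-old-1", "image-v1"),                 # stale image
    Instance("i-old-2", "image-v2"),                 # matches desired state
    Instance("i-old-3", "image-v2", healthy=False),  # e.g. hit by a provider reboot
]
fleet = reconcile(fleet, desired_image="image-v2", desired_count=3)
print([(i.instance_id, i.image) for i in fleet])
```

The same function covers an upgrade, a recovery after an outage, and scaling up or down, which is why several of the advantages above tend to arrive together.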

Seems good so far, right? Let’s get back to our scenario.

We began by externalizing the database instance to a Platform-as-a-Service (PaaS) solution to reduce the risk of downtime, which allowed us to simplify operations and increase reliability. We then followed three steps to make these machines immutable resources:

  • externalization of configurations
  • packaging
  • provisioning

We transferred all configuration management responsibilities to tools such as Consul and Vault, both from HashiCorp, to achieve service discovery, configuration management, health checks, and secure storage of sensitive data. We used Packer, also from HashiCorp, to create pre-configured virtual machine templates that can be deployed quickly, saving time and reducing manual configuration errors.
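To show what externalizing configuration means from the application's side, here is a minimal sketch that reads settings at startup instead of from files baked into the machine image. It does not use the Consul or Vault APIs and is not the customer's actual setup; environment variables stand in for whatever such a tool would inject, and `APP_DATABASE_URL` and `APP_LOG_LEVEL` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment rather than from files in the image.

    APP_DATABASE_URL and APP_LOG_LEVEL are hypothetical variable names.
    """
    env = os.environ if env is None else env
    try:
        database_url = env["APP_DATABASE_URL"]
    except KeyError:
        # Fail at startup, not later in production, when required config is absent.
        raise RuntimeError("APP_DATABASE_URL is not set") from None
    return Settings(database_url=database_url, log_level=env.get("APP_LOG_LEVEL", "INFO"))


# The same image can run anywhere; only the injected values differ.
staging = load_settings({"APP_DATABASE_URL": "postgres://staging.example/db"})
print(staging.log_level)
```

Keeping the image free of environment-specific values is what lets one packaged template be provisioned unchanged in every environment.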

Finally, we established a deployment process for these machines using Terraform from HashiCorp, a leading Infrastructure as Code tool for provisioning.

After these steps, the cloud was only one command away, because our infrastructure was now reproducible. The move to the cloud happened some months later, along with containerization and much more.

Immutable and ephemeral infrastructure can be found in all sizes and forms

So far we have repeatedly mentioned the terms infrastructure, machines, and servers, but what can be turned into immutable and ephemeral infrastructure? Nearly everything can be, but let’s delve deeper.

Virtualization was the catalyst for the growth of immutable and ephemeral infrastructure. It was easy to create new servers, firewalls, etc. on a hypervisor, and if something went wrong, a new machine could be brought online with just a few clicks.

However, virtual machines became cumbersome due to their heavy weight and numerous layers of management, including the kernel, operating system, packages and dependencies, applications, and more. To address these issues, newer concepts such as containerization emerged, resulting in smaller, lighter, and simpler components for our infrastructure.

With the advent of tools such as Kubernetes, Apache Mesos, Nomad, OpenShift, and others, the concept of immutable and ephemeral infrastructure gained a new perspective. Not only can our servers be transformed into immutable and ephemeral components, but our services and applications can also be made easily replaceable.

Finally, cloud providers delivered the finishing touch to the world of immutable infrastructure. With the ability to provision infrastructure through simple API requests, nearly everything can be made immutable. Resources such as servers, firewalls, load balancers, applications, functions, and more can now be set up quickly, efficiently, and, most importantly, automatically, allowing us to keep pace with our company’s evolving requirements and demands.

Final thoughts

To wrap up our scenario: our customer now runs immutable infrastructure resources of all sizes and forms across the company.

After the cloud transformation, and with containerization already in place, the engineering organization had a significant impact: it started delivering software weekly instead of quarterly.

A foundational block behind this improvement is the concept of immutable and ephemeral infrastructure, which gave our customer the opportunity to increase the pace of development with flexibility, speed, stability, and reduced costs.