Everybody is talking about cloud-native. Infrastructure is moving to the cloud. In the meantime, non-technical staff often fail to realize that the code that fulfills functional requirements is only a small fraction of the final delivery.
There is an impedance mismatch between business and information technology professionals about what software delivery is.
This impedance mismatch can only be reduced by tearing down the walls that separate business and development teams. This movement started with Agile and XP (eXtreme Programming), but it is a road that must be explored further. Communication must go beyond meetings: hybrid teams should have working sessions together, for instance, business people pair-programming with developers.
This article presents a bird's-eye view of what makes a cloud-native project, a description of its constituents, and why their presence is mandatory.
Fictitious software project
In this article I will use the following scenario: recently, a fictitious company decided to move to GCP (Google Cloud Platform) and adopted a cloud-native approach. The main development language is Python.
Cloud-native requires extensive use of the cloud provider's serverless offerings and managed services. Everything must be automated, i.e., there is no room for manual deployments, manual environment or infrastructure commands, or any other non-repeatable, unscripted actions. In fact, nothing should be done through graphical user interfaces.
A new project has been launched and a set of functional requirements was passed to the development team. Functional requirements are product features or functions that developers must implement to enable users to accomplish their tasks. Naturally, functional requirements will be fulfilled in Python code.
Multiple levels of code
Now I want to give you the shocking news: the Python code that directly responds to the project requirements is just 10 to 20% of the final codebase!
In this article, I will explain what else the development team will be doing and which software layers are present in a modern cloud-native project.
Application code
Application code is the Python and SQL code that responds to functional requirements. This is the code everybody is expecting from the beginning. No surprises here.
Unit tests
Unit tests are also developed in Python. There are usually more lines of unit test code than application code, sometimes 2 or 3 times more.
Test coverage is highly debated in the development community. On code coverage I will only say that 100% is not desirable; a very good and realistic target is 70%.
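To make the ratio between test code and application code concrete, here is a minimal sketch using Python's standard `unittest` module. The function `apply_discount` and its rules are hypothetical, invented only for illustration. Note that even for a four-line function the tests already take more lines than the code they exercise.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical application code: price after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 120)
```

Saved in a file, the tests can be run with `python -m unittest <file name>`.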
CI/CD
CI/CD are the most defining engineering practices in cloud-native development. The only realistic way to have consistent development and deployment practices is with full automation. CI/CD is the automation of every task that supports testing, integration, and deployment in every relevant environment.
In GCP we usually define CI/CD pipelines in Cloud Build. One of the most popular tools for CI/CD is Jenkins, where scripts are written in Groovy.
A Cloud Build pipeline is defined as a series of steps, where each step runs in a Docker container. This philosophy isolates steps and creates an extensible environment. It is possible to use pre-built containers with Git and other popular tools, and it is also possible to use customized containers.
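As a rough illustration of the step-per-container idea, the sketch below builds a Cloud Build style configuration as a Python dictionary and prints it as JSON. Build configurations are normally written by hand in YAML or JSON; generating one from Python is only a way to show the structure here. The image tags, the commands, and the application name `my-app` are assumptions, not taken from a real project.

```python
import json

# Each step names a container image ("name") and the arguments passed to it.
# Image tags, commands, and the "my-app" name are illustrative assumptions.
build_config = {
    "steps": [
        {
            "id": "install-and-test",
            "name": "python:3.11-slim",
            "entrypoint": "bash",
            "args": ["-c", "pip install -r requirements.txt && python -m pytest"],
        },
        {
            "id": "build-image",
            "name": "gcr.io/cloud-builders/docker",
            "args": ["build", "-t", "gcr.io/$PROJECT_ID/my-app:$SHORT_SHA", "."],
        },
    ],
    # Images listed here are pushed to the registry when the build succeeds.
    "images": ["gcr.io/$PROJECT_ID/my-app:$SHORT_SHA"],
}

print(json.dumps(build_config, indent=2))
```

Because every step is just an image plus arguments, swapping a pre-built container for a customized one only changes the `name` field.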
Continuous Integration (CI)
Development must be supported by a source control tool. The most widely used source control tool nowadays is Git, commonly hosted on GitHub. Every new development should be done in short-lived branches. A branch should live for about one day. When merging to the main branch, CI scripts ensure that the new code is successfully integrated with the codebase.
CI scripts should include steps for:
- unit testing;
- static code analysis for style, security, etc.;
- type checking;
- documentation generation; and
- artifact generation for testing purposes.
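The steps above can be sketched as a small fail-fast runner: each step is a command, and the pipeline stops at the first command that exits with a non-zero code. This is only a local approximation of what a CI service does. The tools named in `CI_STEPS` (pytest, ruff, mypy) are assumed choices, not something this article prescribes.

```python
import subprocess
import sys
from typing import Sequence


def run_steps(steps: Sequence[tuple[str, Sequence[str]]]) -> bool:
    """Run each step in order; stop at the first one that fails."""
    for name, command in steps:
        print(f"[ci] {name}")
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"[ci] step failed: {name}")
            return False
    return True


# Hypothetical tool choices; substitute whatever the project uses.
CI_STEPS = [
    ("unit tests", [sys.executable, "-m", "pytest"]),
    ("static analysis", [sys.executable, "-m", "ruff", "check", "."]),
    ("type checking", [sys.executable, "-m", "mypy", "."]),
]
```

Calling `run_steps(CI_STEPS)` from the repository root returns `True` only when every step succeeds, which is the condition a merge to the main branch should depend on.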
Continuous Delivery (CD)
Continuous Delivery generates versioned artifacts and publishes them to a repository. In GCP the repository is usually Google Container Registry (or its successor, Artifact Registry). Other popular artifact repositories are Nexus and JFrog Artifactory. These scripts should contain steps for:
- integration tests;
- contract tests;
- generation of versioned artifacts; and
- publication of the artifacts to a repository.
Continuous Deployment (CD)
Continuous Deployment consists of scripts that deploy artifacts stored in an artifact repository to a target environment. Continuous Deployment scripts also deploy other components that are not in an artifact repository, for instance, configuration files. These components must nevertheless be managed by a source control tool. Configuration files and scripts that set environment variables are often stored in dedicated source control repositories.
Continuous Deployment scripts are the hardest to write, and they are also the ones that organizations have the least experience with. The main difficulty is that automatic deployment depends on a sound production environment.
The success of automatic deployments to production environments requires:
- monitoring of the production environment;
- instrumentation of artifacts to provide metrics for monitoring;
- capture of environment signals, for instance memory consumption, latency, and connectivity;
- ability to inject control data to assess system health;
- automatic reaction to a changing environment; and
- ability to roll back deployments.
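How health signals and rollback fit together can be shown with a minimal sketch. The `deploy` and `is_healthy` callables are hypothetical placeholders for whatever the platform provides (a deployment command, a metrics query, a probe request). A real pipeline would also wait between checks and look at several signals rather than a single boolean.

```python
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
    new_version: str,
    previous_version: str,
    checks: int = 3,
) -> str:
    """Deploy new_version; redeploy previous_version if any health check fails.

    Returns the version that is live when the function finishes.
    """
    deploy(new_version)
    for _ in range(checks):
        if not is_healthy():
            deploy(previous_version)  # automatic rollback
            return previous_version
    return new_version
```

The point of the design is that rollback is an ordinary deployment of a known-good artifact, which is only possible when artifacts are versioned and kept in a repository.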
Chaos Engineering is a set of techniques to experiment with code in production. Failures are injected into a live environment to verify that the system is able to react to them.
A very important technique to control deployments is to separate deployment from release using feature flags. Feature flags make it possible to deploy artifacts with hidden features that are activated and deactivated at run time. With this technique, certain features can be made available only to certain clients or only from a chosen release date. It also allows capabilities to be degraded gracefully in case of system overload.
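A feature flag check can be sketched in a few lines. The `FeatureFlag` record and the `is_active` function below are hypothetical; real projects usually load flags from a configuration service so that they can change without a redeployment. The sketch covers the three uses mentioned above: a kill switch for overload, a client allow-list, and a release date.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class FeatureFlag:
    """Hypothetical flag record for a single feature."""
    enabled: bool = True  # kill switch, e.g. turned off under overload
    allowed_clients: set[str] = field(default_factory=set)  # empty = everyone
    release_date: Optional[date] = None  # None = no date restriction


def is_active(flag: FeatureFlag, client_id: str, today: date) -> bool:
    """Decide at run time whether the feature is visible to this client."""
    if not flag.enabled:
        return False
    if flag.release_date is not None and today < flag.release_date:
        return False
    if flag.allowed_clients and client_id not in flag.allowed_clients:
        return False
    return True


flag = FeatureFlag(allowed_clients={"acme"}, release_date=date(2024, 1, 15))
print(is_active(flag, "acme", date(2024, 2, 1)))   # True
print(is_active(flag, "other", date(2024, 2, 1)))  # False
print(is_active(flag, "acme", date(2024, 1, 1)))   # False, before release date
```

The code for the feature is already deployed in all three cases; only the flag decides whether it is released.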
Infrastructure as Code (IaC)
IaC is a mandatory automation layer in modern applications. Tools like Terraform and Pulumi describe the desired infrastructure declaratively. These tools compare the desired state with the actual state and automatically modify the target environment if necessary.
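The comparison between desired and actual state can be illustrated with a toy planner. This is not how Terraform or Pulumi are implemented; it is a simplified sketch in which each resource is a name mapped to a dictionary of settings, and the resource names are invented for the example.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (action, resource_name) pairs needed to reach the desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources, for illustration only.
desired = {"bucket-logs": {"location": "EU"}, "topic-orders": {}}
actual = {"bucket-logs": {"location": "US"}, "vm-legacy": {}}

print(plan(desired, actual))
# [('update', 'bucket-logs'), ('create', 'topic-orders'), ('delete', 'vm-legacy')]
```

When desired and actual state already match, the plan is empty and nothing is changed, which is what makes running the same script repeatedly safe.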
Conclusions
This article showed that modern cloud-native applications are very complex beasts, where non-functional requirements far exceed functional requirements.
It is also important to realize that all the automation infrastructure must start at the project’s inception. It is impossible to catch up on automation after the development of the application code. It would be like starting the construction of a house from the roof.
It is possible to observe that nowadays the biggest chunk of effort during the lifetime of a successful project goes into the automation of IaC and CI/CD.