Keeping C.A.L.M.S. at HolidayCheck

In the last article, “Adapting DevOps culture with C.A.L.M.S.”, I described the C.A.L.M.S. model and showed its importance and usefulness for properly adopting DevOps culture.

At HolidayCheck we believe in DevOps culture and try to follow it on a daily basis. As a DevOps Engineer, I’m a member of developer teams and provide them with support regarding infrastructure and system engineering. Having worked with several different scrum teams in this company, I have noticed that some adopt DevOps culture better than others. The most recent team I had a chance to work with did it surprisingly well, and I would like to share my thoughts about the last six months of working with them.

Six months ago I was assigned to a newly formed squad with one focus: to integrate an external user-handling solution with our platform. At first, it sounds simple. However, if you have three different platforms written in different tech stacks, some of them in the middle of a migration, some considered legacy and lacking documentation or people who know how they work, it can be more difficult than one would think. Apart from writing our own services, we were also meant to work with code owned by other teams. That meant sending pull requests and asking people to review and accept them. It’s also worth mentioning that there had already been one attempt to rewrite the user-handling modules. It was very painful and was never finished, which made user handling the least pleasant area to work on in the IT department.
On top of that, the new team was assembled from people who hadn’t worked with each other before. They were taken from other teams and an external outsourcing company. It all made me a bit skeptical about this project.

Surprisingly, I noticed that every team member independently brought elements of DevOps culture to the team. People had a strong sense of ownership and a willingness to make a change despite the known reputation of user handling. Every sprint they focused on minimizing work-in-progress to deliver as much as possible, even if that caused their own tasks not to be delivered. But what I liked most was that nobody was afraid of stepping out of their comfort zone and doing things they were not specialized in.

The code we started to develop was kept as simple as possible, allowing people to take over development in case of someone else’s unexpected absence. Automation was kept lean as well. We chose Jenkins 2.X as our build/deployment server, set up a hook on the whole GitHub repository, and agreed that every repo and every branch would have a Jenkinsfile describing a pipeline. Although I was the one to set up the initial tool, the whole process of building and improving the pipeline was quickly taken over by all team members, who adjusted it to their needs while I provided support when needed. Demands, expectations, and being pushy were replaced here by pairing, contributing, and supporting each other to have it ready sooner.

We remembered to stay lean. With a continuous delivery pipeline in place, every change merged to the release branch was immediately deployed to production. We also focused on the absolute MVP to be able to go live and handle at least some test traffic. This gave us very important feedback regarding possible improvements. Pull requests to other repositories were prepared and posted in advance. As a result, our changes in other teams’ code were already deployed to production, not interfering with current functionality and waiting for the moment to take over the user-handling flow.
As we were a pretty small team with three developers, a product owner, and a DevOps engineer, we tried to keep meetings brief. People didn’t need reminding to prepare for refinement or planning sessions, so the time needed to reach an agreement was also short.

Before going live we started defining metrics and aggregating logs in one place. We had to put in some extra effort to automatically pull logs from the provider, but that paid off with detailed user monitoring and the ability to cross-check error logs against events delivered by a third-party site. As a result, we got multiple dashboards and log filters analyzing almost every aspect of the running application: from pure system metrics like resource consumption, latency, and uptime, to detailed information about user behavior, with the ability to trace an error back a few requests to better understand its context. After we exposed the new login to live traffic, every 5xx error was immediately reported in the team’s Slack channel and, thanks to the gathered links and dashboards, we could identify a root cause within a few minutes.
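To make the alerting idea a bit more concrete, here is a minimal Python sketch of how a 5xx error could be turned into a Slack message that carries the user’s few preceding requests and a dashboard link. It is not the code we ran: the `LogRecord` fields, the function names, and the dashboard URL are all assumptions made up for illustration, and the sketch only builds the message payload (a JSON object with a `text` field, as accepted by Slack incoming webhooks) without sending anything.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # Hypothetical fields; the real log schema is not described in this article.
    timestamp: str
    user_id: str
    path: str
    status: int


def is_server_error(record):
    """True for any HTTP 5xx status code."""
    return 500 <= record.status <= 599


def context_before(records, index, window=3):
    """Return up to `window` earlier requests made by the same user."""
    user = records[index].user_id
    earlier = [r for r in records[:index] if r.user_id == user]
    return earlier[-window:]


def build_alert(records, index, dashboard_url):
    """Build a Slack-style payload for the failing request at `index`."""
    record = records[index]
    lines = [
        f"{record.status} on {record.path} at {record.timestamp} "
        f"(user {record.user_id})"
    ]
    for prev in context_before(records, index):
        lines.append(f"  earlier: {prev.status} {prev.path} at {prev.timestamp}")
    lines.append(f"Dashboard: {dashboard_url}")
    return {"text": "\n".join(lines)}


def alerts_for(records, dashboard_url):
    """One alert payload per 5xx record, in log order."""
    return [
        build_alert(records, i, dashboard_url)
        for i, record in enumerate(records)
        if is_server_error(record)
    ]


if __name__ == "__main__":
    sample = [
        LogRecord("10:00:01", "u1", "/login", 200),
        LogRecord("10:00:02", "u2", "/login", 200),
        LogRecord("10:00:03", "u1", "/profile", 503),
    ]
    # Placeholder URL, not a real dashboard.
    for alert in alerts_for(sample, "https://dashboards.example.invalid/login"):
        print(alert["text"])
```

Keeping the payload building separate from the actual delivery means the filtering and context logic can be checked without touching Slack at all.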

I also saved one more surprise for the end. Although the company policy was to have co-located teams, due to a shortage of personnel our team was partly distributed. Apart from me, all team members were sitting in the same room in Munich, DE, while I was working from the office in Poznań, PL. Also, due to other responsibilities, I could not allocate more than 60% of my time to this team.
Our internal communication, the sharing of opinions and ideas, was so good that most of the time I didn’t feel excluded at all. To be honest, working with them, even as a remote DevOps engineer, was more enjoyable to me than working with some other teams co-located in one room.

Now that our goal is achieved and I am switching teams once again, I have decided to take a look back at the last 12 sprints and try to learn from them. And, of course, to share that with you.

Was it really that candy-sweet all the time? Of course not, we had our problems, starting with me not being 100% in the team. I regret that I could not get more involved in coding. Sometimes, especially in the very middle of the project, our meetings were too long and seemed pointless to me. Our CI pipeline crashed several times, blocking the whole development process and causing a lot of tension. We were dependent on other teams, which sometimes weren’t willing to help us because it was not compliant with their OKRs. It all happened more often than I’d like, but I’m happy that we worked it out together and fixed it instead of pointing fingers.

What I personally learned from it was that:

  • the proper mindset is an absolute foundation for good DevOps culture
  • having a smaller team of engineers inclined to be full-stack means it’s better at self-managing and does not suffer in case of someone being suddenly absent
  • automation should be lean and constantly improved. Don’t put too much overhead on it at the beginning.
  • we should treat our applications as our own piece of production cake, equip it with a number of useful metrics and get knowledge out of them
  • ideas for improving the technological process should not be turned down by product people, as they lead to greater delivery speed at the end of the day

I hope that I can take this knowledge and use it in the new project I’m about to join.

Adapting DevOps culture with C.A.L.M.S.

DevOps is still quite a buzzword. There are already plenty of articles describing what it is and what it isn’t. I think we can agree that it’s a culture, a way of working. I’m also sure that most of us have a general impression of what it should look like: development and operations working together, breaking down silos, delivering faster, automating, etc. All of these are important and true, but they still seem to be only a partial description. I started looking for a more complete one, and I found a very interesting model describing the culture. It’s C.A.L.M.S.

C.A.L.M.S. is an acronym for five major points describing a DevOps culture. Let’s have a quick look at them:

C – Culture
This is something you cannot implement. First, you should start with people having a proper mindset and it should concern ALL team members. Everyone should be focused on a common goal and help others achieve it whether it’s within your specialization area or not. Stepping out of your comfort zone and leaning towards becoming a full-stack engineer is encouraged.

A – Automation
We want to do as little boring stuff as possible. Therefore everything that can be automated should be automated. And that’s not only writing scripts for testing and deployment but also adopting the idea of programmable infrastructure and having everything written down, versioned, and automatically managed.

L – Lean
Automating everything can be a pitfall that overcomplicates the infrastructure. Therefore engineers should focus on keeping everything minimal, yet useful. That doesn’t concern only automation: code deployments to the production environment should be small and frequent, and the applications being developed should be simple and easy to understand. It also applies to team size: larger teams find it more difficult to agree on something.

M – Measurement
Frequent releases give great flexibility but also can put the production environment in danger. That’s why a developed application should be equipped with useful metrics and monitored in real time. In case of problems the team can be notified quickly and is able to develop a fix. Teams can also monitor how new features influence user behavior.

S – Sharing
Sharing is essential for improving the communication flow and making people work together. Therefore it’s important to share ideas, experiences, thoughts: inside the team, among teams, and even outside the company.

What I like most about this model is how these points interact with each other. Automation should always be lean and robust. Providing an automated CI/CD pipeline helps teams to stay lean. When setting up monitoring, it’s better to choose only valuable metrics and set up handy dashboards and alerts. The metrics can be shared among teams to build a more comprehensive application analysis tool that automatically provides wider context for the data we collect, which in turn can be automatically analyzed and trigger lean changes in features …
The foundation for all these things is Culture. In my opinion that’s the most difficult point of all five. Without it, the other four points are just minor improvements to everyday work.

If you liked this article and would like to read about how this model applies to the team I used to work with, please let me know by leaving a comment.