DevOps – The Outbrain Way

Like many other fast-moving companies, at Outbrain we have tried several iterations in the attempt to find the most effective “DevOps” model for us. As expected with any such effort, the road has been bumpy and there have been many “lessons learned” along the way. As of today, we feel that we have had some major successes in refining this model, and would like to share some of our insights from our journey.

Why to get Dev and Ops together in the first place?

A lot has been written on this topic, and the motivations and benefits of adding the operational perspective into the development cycles has been thoroughly discussed in the industry – so we will not repeat those.

I would just say that we look at these efforts as preventive medicine, like eating well and exercise – life is better when you stay healthy. It’s not as good when you get sick and seek medical treatment to get health again.

What’s in a name?

We do not believe in the term “DevOps”, and what it represents. We try hard to avoid it – why is that?

Because we expect every Operations engineer to have development understanding and skills, and every Developer to have operational understanding of how the service he / she developes works, and we help them achieve and improve those skills – so everyone is DevOps.

We do believe there is a need to get more system and production skills and expertise closer to the development cycles – so we call it Production Engineers.

First try – Failed!

We started by assigning Operations Engineers to work with dedicated development groups – the big problem was that it was done on top of their previous responsibility in building the overall infrastructure (config management, monitoring infrastructure, network architecture etc.), which was already a full time job as it was.

This mainly led to frustration on both sides – the operations eng. who felt they have no time to do anything properly, just touching the surface all the time and spread too thin, and the developers who felt they are not getting enough bandwidth from operations and they are held back.

Conclusion – in order to succeed we need to go all in – have dedicated resources!

Round 2 – Dedicated Production Eng.

Not giving up on the concept and learning from round 1 – we decided to create a new role – “Production Engineers” (or PE for short), whom are dedicated to specific development groups.

This dedication manifest in different levels. Some of them are semi trivial aspects, like seating arrangements – having the PE sit with the development team, and sharing with them the day to day experience; And some of them are focus oriented, like joining the development team goals and actually becoming an integral part of the development team.

On the other hand, the PE needs to keep very close relationship with the Infrastructure Operational team, who continues to develop the infrastructure and tools to be used by the PEs and support the PEs with technical expertise on more complex issues require subject matter experts.

What & How model:

So how do we prevent the brain split situation of the PE? Is the PE part of the development team or the Operations team? When you have several PEs supporting different development groups – are they all independent or can we gain from knowledge transfer between them?

In order for us to have a lighthouse to help us answer all those questions and more that would evident come up, we came up with the “What & How” model:

“What” – stands for the goals, priorities and what needs to be achieved. “The what” is set by the development team management (as they know best what they need to deliver).

“How” – stands for which methods, technologies and processes should be used to achieve those goals most efficiently from operational perspective. This technical, subject matter guidance is provided by the operations side of the house.

So what is a PE @ Outbrain?

At first stage, Operations Engineer is going through an on-boarding period, during which the Eng. gains the understanding of Outbrain operational infrastructure. Once this Eng. gained enough millage he /she can become a PE, joining a development group and working with them to achieve the development goals, set the “right” way from operational perspective, properly leveraging the Outbrain infrastructure and tools.

The PE enjoys both worlds – keeping presence in the Operations group and keeping his/hers technical expertise on one hand, and on the other hand be an integral part of the development team.

From a higher level perspective – we have eliminated the frustrations points, experienced in our first round of “DevOps” implementation, and are gaining the benefit of close relationship, and better understanding of needs and tools between the different development groups and the general Operations group. By the way, we have also gained a new carrier development path for our Operations Eng. and Production Eng. that can move between those roles and enjoy different types of challenges and life styles.

2 Comments

Lior

January 14, 2016 at 5:11 pm

Nice Article! – thanks for sharing.

It is written “carrier development path” but I guess you meant “career development path”

Gili Nachum

January 17, 2016 at 7:17 am

Exactly the same process we went through in IBM.
Our team needed to add Solr to the cloud infrastructure, we tried to get an engineer from the central infrastructure team to take care of this for us. It never worked! the guy was overloaded even before he started to work for us, also, there was always the tension of who owns what? app team or central infra team. Infrastructure work became a bottleneck.
A couple of months ago we switched to have a dedicated team member to take of Solr deployment in production. Now there’s only one owner, communication turned much easier, we switched from Bash to Python, and we have proper requirements/design/coding cycles.
Downside is that we “lost” an app engineer, but we have gained so much more.

In Spotify’s terms, while our PE is part of the app Squad, we encourage him to learn and sync up with the rest of the deployment automation Guild.
http://blog.crisp.se/2012/11/14/henrikkniberg/scaling-agile-at-spotify