When large numbers start piling up, in order to make sense of them, they need to be visualized.
I still work as a consultant at Outbrain about one day a week, and most of that time I’m in charge of the deployment system last described here. Developing the system brings good challenges: every day we have more deployments than anyone can easily follow, so I decided to visualize them.
On an average day we have a dozen or two deployments (to production, not including test clusters), so I figured, why don’t I use my google-visualization-foo and draw some nice graphs? The results follow, with explanations.
Before I begin, just to put things in context: Outbrain has been practicing Continuous Deployment for a while (6 months or so), and although a few systems helped us get there, one of the main pillars was a relatively new tool written by the fine folks at LinkedIn (and in particular Yan; thanks, Yan!). I just wanted to give them a fair shout-out and thank Yan for the nice tool, its API and the ongoing awesome support. If you’re looking for a deployment tool, do give glu a try, it’s pretty awesome! Without glu and its API, all the nice graphs and the rest of the system would not have seen the light of day.
The Annotated Timeline
This graph may seem intimidating at first, so don’t be scared and let’s dive right into it… BTW, you may click on the image to enlarge it.
First, let’s zoom into the right-hand side of the graph. This graph uses Google’s annotated timeline, which is really cool for showing how things change over time and correlating them to events. That is what I do here: the events are the deployments, the x-axis is the time and the y-axis is the version of the deployed module.
On the right-hand side you see a list of deployment events: for example, the one at the top has “ERROR www @tom…” and the next one is “BehavioralEngine @yatirb…”, etc. This list can be filtered, so if you type the name of one of the developers, such as @tom or @yatirb, you see only the deployments made by that developer (of course all deployments are made by devs, not by ops; hey, we’re devopsy, remember?).
If you type only www into the filter box, you see all the deployments for the www component, which, unsurprisingly, is just our website.
If you type ERROR you see all deployments that had errors (and yes, this happens too, not a big deal).
The nice thing about this graph is that as you filter, the elements that are filtered out also disappear from the graph. For example, let’s see only deployments to www (click on the image to enlarge):
You’ll notice that not only has the right-hand list shrunk to contain only deployments to www, but the left-hand graph now only has the matching markers. The rest of the lines are still there, but only the markers for the www line are on the graph right now.
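The filtering behavior described above can be sketched roughly as follows. This is a minimal Python illustration and not the code behind the actual page: the `Deployment` record, its field names and the label format are assumptions modeled on the list entries shown (such as “ERROR www @tom”), and the sample data is made up apart from the two www revisions mentioned later in this post.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical record; field names are assumptions for illustration.
    module: str           # e.g. "www"
    user: str             # e.g. "tom"
    revision: int         # svn revision that was deployed
    failed: bool = False  # True when the deployment ended with an error

    def label(self) -> str:
        # Mimics the list entries, e.g. "ERROR www @tom"
        prefix = "ERROR " if self.failed else ""
        return f"{prefix}{self.module} @{self.user}"


def filter_deployments(deployments, query):
    """Keep only the deployments whose label contains the query text."""
    query = query.strip()
    if not query:
        return list(deployments)
    return [d for d in deployments if query in d.label()]


# Sample data (invented for the example)
deployments = [
    Deployment("www", "tom", 41463, failed=True),
    Deployment("BehavioralEngine", "yatirb", 41450),
    Deployment("www", "yatirb", 41268),
]
```

Calling `filter_deployments(deployments, "www")` keeps the two www entries, `"@yatirb"` keeps the two deployments by that developer, and `"ERROR"` keeps only the failed one. The same filtered list can then drive both the event list and the markers on the graph.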
Now let’s have a look at the graph. One of the coolest things is that you can zoom into a specific timespan using the controls at the lower part of the graph. (click to enlarge)
In this graph, the x-axis shows the time (date and time of day) and the y-axis shows the svn revision number. Each colored line represents a single module (so we have one line for www and one line for the BehavioralEngine etc).
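To make the data shape concrete, here is a hedged sketch of how deployment events could be turned into rows for such a timeline. It assumes the layout Google’s annotated timeline documents for its data table: a date column, then for each series a numeric column optionally followed by an annotation title and an annotation text. The event tuples, the `timeline_rows` helper and the annotation wording are hypothetical and not taken from the real system.

```python
from datetime import datetime


def timeline_rows(events, modules):
    """Build one row per deployment event.

    events:  iterable of (when, module, user, revision) tuples
    modules: ordered list of module names, one graph line per module
    Each row is [when, rev, title, text, rev, title, text, ...].
    """
    last_rev = {m: None for m in modules}  # last known revision per module
    rows = []
    for when, module, user, revision in sorted(events):
        last_rev[module] = revision
        row = [when]
        for m in modules:
            if m == module:
                # The deployed module gets the annotation marker
                row += [last_rev[m], module, f"@{user} r{revision}"]
            else:
                # Other lines keep their last known value, no marker
                row += [last_rev[m], None, None]
        rows.append(row)
    return rows
```

Carrying the last known revision forward for the other modules is a design choice of this sketch: it keeps every line continuous between its own deployments.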
What you would usually see for each line (representing a module) is a monotonically increasing value over time: a line climbing from the bottom-left corner towards the top-right corner. However, in the relatively rare cases where a developer wants to deploy an older version of a module, you clearly see the line suddenly dropping a bit instead of climbing. This is really nice and helps find unusual events.
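The same “dropping line” signal can also be detected in code rather than by eye. A minimal sketch, assuming a module’s deployment history is available as a list of svn revisions in chronological order (the function name and input shape are illustrative only):

```python
def find_rollbacks(history):
    """Return (previous, current) revision pairs where the revision went down."""
    return [(prev, cur) for prev, cur in zip(history, history[1:]) if cur < prev]
```

For a history such as `[41268, 41300, 41290, 41463]` this reports the single step from 41300 down to 41290, which is exactly the dip you would spot on the graph.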
In the next graph, you see an overview of deployments per day.
This is more of a holistic view of how things went over the last couple of days. It just shows how many deployments took place each day (counting production clusters only) and colors the successful ones in green and the failed ones in red.
This graph is an executive summary that tells one of two stories. If there are too many reds (or any reds at all), then someone needs to take that seriously and figure out what needs to be fixed (usually that someone is me…). If the bars aren’t high enough, then someone needs to kick the developers’ butts and get them deploying something already…
Like many other graphs from Google’s library (this one’s a Stacked Column Chart, BTW), it shows nice tooltips when you hover over any of the columns, with the x value (the date) and the y value (the number of successful or failed deployments).
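The aggregation behind such a chart is simple. Here is a rough sketch, assuming each production deployment is available as a `(day, succeeded)` pair; that input shape and the function name are assumptions for the example, not the real system’s data model.

```python
from collections import defaultdict


def deployments_per_day(events):
    """Aggregate (day, succeeded) pairs into rows of [day, successes, failures]."""
    counts = defaultdict(lambda: [0, 0])  # day -> [successes, failures]
    for day, succeeded in events:
        counts[day][0 if succeeded else 1] += 1
    # One row per day, in chronological order, ready for a stacked column chart
    return [[day, ok, bad] for day, (ok, bad) in sorted(counts.items())]
```

Each row maps to one column on the chart, with the success count drawn as the green segment and the failure count as the red one.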
Versions DNA Mapping
The following graph shows the current variety of versions that we have in our production systems for each and every module. It was dubbed a DNA mapping by one of our developers because of how similar the two look, but that’s as far as the similarity goes…
The x-axis lists the different modules that we have (names were intentionally left out, but you can imagine having www and other folks there). The y-axis shows their svn versions in production. It uses glu’s live model as reported by glu’s agents to ZooKeeper.
Let’s zoom in a bit:
What this diagram tells us is that the module www has versions ranging from 41268 up to 41463 in production. This is normal, as we don’t necessarily deploy everything to all servers at once, but this graph helps us easily find hosts that are left behind for too long. For example, if one of the modules has not been deployed in a while, you’d see it falling behind, low on the graph. Similarly, if a module has a large variability in versions in production, chances are that you want to close that gap pretty soon. The following graph illustrates both cases:
To implement this graph I used a crippled version of the Candlestick Chart, which is normally used for showing stock values; it’s not ideal for this use case but it’s the closest I could find.
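As an illustration of what “crippled” could mean here, the sketch below turns the per-module versions into candlestick rows. Google’s Candlestick Chart takes a label followed by low, opening, closing and high values; setting opening to the low and closing to the high, so that the candle body spans the whole version range, is my assumption for this example and not necessarily how the real graph does it. The `live_versions` mapping is a hypothetical stand-in for what would be read from glu’s live model.

```python
def candlestick_rows(live_versions):
    """Turn {module: [revisions currently deployed]} into candlestick rows.

    Each row is [module, low, open, close, high]. Opening is set to the
    lowest revision and closing to the highest, so the candle body covers
    the full range of versions in production (an assumption of this sketch).
    """
    rows = []
    for module, revisions in sorted(live_versions.items()):
        low, high = min(revisions), max(revisions)
        rows.append([module, low, low, high, high])
    return rows


def version_spread(live_versions):
    """Return {module: highest revision minus lowest revision in production}."""
    return {m: max(revs) - min(revs) for m, revs in live_versions.items()}
```

With www running revisions 41268 through 41463, `candlestick_rows` yields `["www", 41268, 41268, 41463, 41463]`, and `version_spread` reports a gap of 195 revisions, the kind of number you would want to shrink.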
That’s all; three charts are enough for now. There is other news regarding our evolving deployment system, but it is not as visual. If you have any questions or suggestions for other types of graphs that could be useful, don’t be shy to comment or tweet (@rantav).