Avi Youkhananov

December 14, 2020

Avi Youkhananov

Faster release with Maven CI Friendly Versions and a customised flatten plugin

Fed up with waiting for the Maven release? We’ve found a way to cut the release time in half. Each of our teams at Outbrain is responsible for its own service code in its own repository. However, our teams also share a large Maven-based repository that contains modules (libraries) that get released as Maven artifacts. After a module is released, it can be used by the teams. Thus, unlike service code, which is managed within individual team repositories, the shared repository serves as a centralised place to manage team libraries.

Since our shared repository has hundreds of modules managed by multiple teams, we started to face failures during release, and release time increased dramatically. We decided to tackle this issue to boost efficiency.

The solution we came up with was to move to Maven CI Friendly Versions, which eliminate the race conditions. (The Maven release plugin involves a git commit phase that changes the pom.xml versions, but the commit cannot be pushed if the local history is out of date with the remote, causing the build to fail.)

Moreover, we stopped using the Maven Release Plugin, accelerating our release process.

Bye Bye, Release Plugin.

Until recently, we had been using the Maven Release Plugin in order to release our libraries. The plugin has a two-step process, with different commands involved (release:prepare, release:perform).

The “prepare” and “perform” goals involve building the project multiple times. Moreover, only one release can be triggered at a time: because of the race conditions described earlier, multiple releases cannot run simultaneously, and you must wait for the current release to complete before starting the next. For projects that, like ours, have long build times, this is a deal-breaker. The Maven release plugin took far too long to run.

Welcome, Maven CI-Friendly Versions

The approach we took is lightweight compared to the Maven release plugin approach and allows multiple releases to be triggered and run simultaneously.

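The advantages of this approach over the release plugin are that it makes no commits to pom.xml files, runs the build goals only once, and lets several releases run at the same time.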

The Maven CI-Friendly Setup

The structure of our monorepo (which follows a parent-child hierarchy) allowed us to easily convert all our pom.xml files from hard-coded versions to the ${revision} property, whose value can be overridden at build time.

In order to avoid redefining the revision property for each module, we defined the revision property in the parent pom.xml.

Here is a child pom.xml:

<project>
 <parent>
  <artifactId>ci-friendly-parent</artifactId>
  <groupId>com.outbrain.example</groupId>
  <version>${revision}</version>
 </parent>
 <artifactId>ci-friendly-child</artifactId>
 <name>CI Friendly Child</name>
</project>

And this is the parent pom.xml:

<project>
 <groupId>com.outbrain.example</groupId>
 <artifactId>ci-friendly-parent</artifactId>
 <name>CI Friendly Parent</name>
 <version>${revision}</version>
 <properties>
   <revision>1.0.0-SNAPSHOT</revision>
 </properties>
</project>

As you can see, we moved to CI Friendly Versions using the revision property, and we are now set up to run a local build to verify that the definition is correct.

To issue a local build, which will not be published, we invoked
mvn clean package as usual. This resulted in the artifact version 1.0.0-SNAPSHOT.

Want to change the artifact version? Easy.
Use the following command:

mvn clean package -Drevision=<REPLACE_ME>

The Maven release plugin used the version placed in the pom.xml to derive the release version: if a development pom.xml holds the version 1.0-SNAPSHOT, the release version is 1.0.
This value is then committed to the pom.xml file.
With CI Friendly Versions, we can finally avoid those commits and the hard-coded versions in pom.xml files.

Install/Deploy

The Maven CI Friendly Versions guidelines mention that the flatten Maven plugin is necessary if you want to deploy or install your artifacts. Without this plugin, the installed or deployed pom.xml still contains the unresolved ${revision} placeholder, so the artifacts generated by the project cannot be used by other Maven projects.

This is true. The problem is that the flatten Maven plugin, coupled with the “resolveCiFriendliesOnly” option, does not work as expected due to bugs. Maven’s flatten plugin is a somewhat over-engineered, overly complex plugin that did not fit our needs. As adherents of the Unix philosophy, we decided to create our own lightweight plugin, the ci-friendly-flatten-maven-plugin, which replaces only the ${revision}, ${sha1}, and ${changelist} properties.

The final pom.xml

<project>
  <groupId>com.outbrain.example</groupId>
  <artifactId>ci-friendly-parent</artifactId>
  <name>CI Friendly Parent</name>
  <version>${revision}</version>

  <properties>
    <revision>1.0.0-SNAPSHOT</revision>
  </properties>

  <modules>
    ...
  </modules>

  <build>
    <plugins>
      <plugin>
        <groupId>com.outbrain.swinfra</groupId>
        <artifactId>ci-friendly-flatten-maven-plugin</artifactId>
        <version>FIND_HERE</version>
        <executions>
          <execution>
            <goals>
              <goal>clean</goal>
              <goal>flatten</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Building the bridge to Maven CI Friendly Versions

We decided to transition to Maven CI Friendly Versions. As mentioned before, due to the bugs in the flatten plugin, we first needed to develop our own flatten plugin. In our case, we use TeamCity to release our libraries, fetching the release version from git and tagging it (though the process suits other build systems as well).

So, what do the release steps look like?

1. Fetch the latest git tag, increment it, and write the result to the revision.txt file. This is the version we are going to release.

mvn ci-friendly-flatten:version

2. Set a system.version TeamCity parameter for our soon-to-be-released version, needed in order to use this version in the steps that follow.

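#!/bin/bash -x
# Read the version that the previous step wrote to revision.txt.
VER_PATH="%teamcity.build.checkoutDir%/revision.txt"
REV=$(cat "$VER_PATH")
# Turn off command tracing before emitting the service message.
set +x
# TeamCity service message: expose the version as the system.version parameter.
echo "##teamcity[setParameter name='system.version' value='$REV']"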

3. Deploy the jars with the new version.

mvn clean deploy -Drevision=%system.version%

4. Tag the current commit with the updated version and push the tag.

mvn ci-friendly-flatten:scmTag -Drevision=%system.version%
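
The first release step above (take the latest tag and increment it) can be sketched in plain Java. This is an illustration only, under stated assumptions: tags are assumed to look like MAJOR.MINOR.PATCH and the patch component is the one that gets incremented. The class and method names are hypothetical, and the rule the plugin actually applies may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin: derives the next release
// version from the latest git tag, assuming plain MAJOR.MINOR.PATCH tags.
final class NextRevision {
    private static final Pattern SEMVER = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    // Returns the tag with its patch component incremented by one.
    static String bumpPatch(String latestTag) {
        Matcher m = SEMVER.matcher(latestTag.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected tag format: " + latestTag);
        }
        long patch = Long.parseLong(m.group(3)) + 1;
        return m.group(1) + "." + m.group(2) + "." + patch;
    }
}
```

The returned string plays the role of the value written to revision.txt and later passed to Maven as -Drevision.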

The Maven release plugin makes two commits and triggers Clean/Compile/Test multiple times. In contrast, the new release process, which relies on our custom flatten plugin, makes zero commits, eliminating the build failures caused by race conditions between commits pushed during the build.

Moreover, in this release process, the goals Clean/Compile/Test are executed only once.

Overall, our new approach slashed release time by 50%, from 6+ min to 3+ min.

As a last note, we’ve employed this approach in other projects as well. While the time saved on this project was 3 minutes, in another project whose release originally took 22 minutes, the custom flatten plugin cut 11 minutes off the build.

So, take it from us. There is no need to get bogged down by the Maven flatten plugin. Save yourself the headache. All you need is to switch to Maven CI Friendly Versions with our custom CI Friendly Flatten Maven Plugin.

November 11, 2019

Avi Youkhananov

Oh my Guava! We are moving to Caffeine.

Caching is extremely important! It provides fast response time, enabling effortless performance improvements in certain use cases.
At Outbrain, we have recently moved to Caffeine caching, after having used Guava in-memory caching for many years.

Background

The Caffeine library is a rewrite of Guava’s cache with a Guava-inspired API that returns CompletableFutures, allowing asynchronous automatic loading of entries into a cache. The library was written by Ben Manes, the author of ConcurrentLinkedHashMap, on which Guava’s cache is based.

Guava OUT

Guava blocks during loading when a key is not present in the cache.
We wanted to change the API to work asynchronously, but the added complexity made the code difficult to understand and troubleshoot.
To make Guava non-blocking, we overrode the load and loadAll methods of CacheLoader to make the API return a future.
Since we changed the API to return a future, we encountered a problem. Guava is unaware that we use it to store futures, so it stores futures completed with an exception as well. To prevent the exceptions from being stored in the cache, we needed to add more complexity to our code.
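
The extra complexity described above can be sketched without Guava. In the sketch below, a plain ConcurrentHashMap stands in for the cache (an assumption made to keep the example dependency-free; it is not Guava's API), and the class and method names are hypothetical. The point is the manual step: a future that completes with an exception has to be evicted by our own code, otherwise every later caller receives the same failure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical stand-in for a cache that stores futures; not Guava's API.
final class FutureCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader;

    FutureCache(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    // Returns the cached future for the key, starting a load if none exists.
    CompletableFuture<V> get(K key) {
        CompletableFuture<V> future = entries.computeIfAbsent(key, loader);
        // The map only sees "a value", so failed futures must be removed by hand.
        // The callback is registered on every call, which keeps the sketch short.
        future.whenComplete((value, error) -> {
            if (error != null) {
                entries.remove(key, future);
            }
        });
        return future;
    }

    boolean contains(K key) {
        return entries.containsKey(key);
    }
}
```

The eviction callback is attached after computeIfAbsent returns rather than inside the loader, because modifying a ConcurrentHashMap from within its own mapping function is not allowed.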

Caffeine IN

Caffeine is a rewrite of Guava’s cache that uses an API that returns CompletableFutures out of the box, allowing asynchronous automatic loading of entries into a cache.
Caffeine removes futures that complete with an exception from the cache.
Caffeine combines a Least Recently Used (LRU) eviction policy with a frequency-based admission policy that relies on a Count-Min sketch. It has a better hit rate than plain LRU for many workloads.
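
To give a feel for frequency-based admission, here is a toy Count-Min sketch in plain Java. It illustrates the general idea only and is not Caffeine's implementation: the class name, the number of rows, the multipliers, and the simple "admit only if more frequent than the victim" rule are all assumptions made for this example.

```java
// Toy Count-Min sketch used for frequency-based admission.
// An illustration of the idea only; Caffeine's real implementation differs.
final class FrequencySketch {
    // Odd multipliers, one per row; chosen arbitrarily for this sketch.
    private static final int[] SEEDS = {1, 0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35};
    private final int[][] counts;
    private final int mask;

    // width must be a power of two so that masking selects a column.
    FrequencySketch(int width) {
        if (Integer.bitCount(width) != 1) {
            throw new IllegalArgumentException("width must be a power of two");
        }
        this.counts = new int[SEEDS.length][width];
        this.mask = width - 1;
    }

    private int column(Object key, int row) {
        int h = key.hashCode() * SEEDS[row];
        return (h ^ (h >>> 16)) & mask;
    }

    // Records one access to the key in every row.
    void recordAccess(Object key) {
        for (int row = 0; row < SEEDS.length; row++) {
            counts[row][column(key, row)]++;
        }
    }

    // Estimated access count: the minimum over all rows, never an underestimate.
    int estimate(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < SEEDS.length; row++) {
            min = Math.min(min, counts[row][column(key, row)]);
        }
        return min;
    }

    // Admission rule: the candidate replaces the eviction victim only if it
    // has been accessed more often.
    boolean admit(Object candidate, Object victim) {
        return estimate(candidate) > estimate(victim);
    }
}
```

The sketch uses a fixed amount of memory regardless of how many keys it has seen, which is why this kind of structure suits a cache that must track the popularity of far more keys than it can hold.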

Guava’s hit-rate benchmark vs Caffeine’s

We analyzed service behavior with real traffic under different cache configurations to get an idea of how production services will behave.

The benchmarked cache contains image URLs keyed by UUID and follows an access pattern of most-frequent / least-frequent data, which is our most common use case at Outbrain.

In the benchmarks below, we did not measure memory usage and therefore draw no conclusions about it. But analyzing hit rates leads to some interesting insights.

1. Cache size: 10k items
Expiration after write: 5 min

Hit rate with 10k items:

Caffeine 28.33%
Guava 20.95%

2. Cache size: 50k items
Expiration after write: 20 min

Hit rate with 50k items:

Caffeine 56.04%
Guava 50.01%

3. Cache size: 100k items
Expiration after write: 20 min

Hit rate with 100k items:

Caffeine 70.10%
Guava 66.77%

4. Cache size: 300k items
Expiration after write: 20 min

Hit rate with 300k items:

Caffeine 87.19%
Guava 84.85%

Conclusion

We did it! With minor changes to infrastructure code (the Caffeine and Guava APIs are almost identical), we improved our hit rate and reduced code complexity for critical services in our system.

Honestly, Caffeine smells better than Guava.

November 25, 2018

Avi Youkhananov

CodinGame Story One – The key to creativity and happiness in a developer’s life

Photo by Juan Gomez on Unsplash

“Keep a developer learning and they’ll be happy working in a windowless basement eating stale food pushed through a slot in the door. And they’ll never ask for a raise.” — Rob Walling (https://robwalling.com/2006/10/31/nine-things-developers-want-more-than-money/)

The past decade has produced substantial research verifying what may come as no surprise: developers want to have fun. While we also need our salaries, salaries alone will not incentivize us developers who, in most cases, entered a field to do what we love: engage in problem-solving. We like competition. We like winning. We like getting prizes for winning. To be productive, we need job satisfaction. And job satisfaction can be achieved only if we get to have fun using the skills we were hired to use.

We wanted to keep the backend developers challenged and entertained.
That’s why Guy Kobrinsky and I created our own version of Haggling, whose basic idea we adapted from Hola, a negotiation game.

The Negotiation Game:

Haggling consists of rounds of negotiations between pairs of players. Each pair’s goal is to maximize score in the following manner:

Let’s say there are a pair of sunglasses, two tickets, and three cups on the table. Both players have to agree on how to split these objects between them. To one player, the sunglasses may be worth $4, each cup $2, and the tickets nothing. The opponent might value the same objects differently; while the total worth of all the objects is the same for both players, each player’s valuation is kept secret.

Both players take turns making offers to each other about how to split the goods. A proposed split must distribute all objects between the players such that no items are left on the table. On each turn, a player can either accept the offer or make a counter-offer. If an agreement is reached within 9 offers, each player receives the amount that their portion of the goods is worth, according to their assigned values. If there is still no agreement after the last turn, both players receive no points.
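
The scoring rule can be made concrete with the example above. The sketch below is hypothetical (the class name, method names, and array-based representation are assumptions, not the game's actual code): items are indexed as sunglasses, tickets, cups; a split lists how many of each item one player receives, and the other player receives the remainder.

```java
// Hypothetical sketch of the scoring rule; not the game's actual code.
final class SplitScore {
    // Value of the items a player receives under that player's private valuation.
    static int score(int[] received, int[] values) {
        int total = 0;
        for (int i = 0; i < received.length; i++) {
            total += received[i] * values[i];
        }
        return total;
    }

    // What is left for the other player: every item on the table must be handed out.
    static int[] remainder(int[] counts, int[] received) {
        int[] rest = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            if (received[i] < 0 || received[i] > counts[i]) {
                throw new IllegalArgumentException("Invalid amount for item " + i);
            }
            rest[i] = counts[i] - received[i];
        }
        return rest;
    }
}
```

With counts {1, 2, 3} and the valuation {4, 0, 2} from the example, a player who receives the sunglasses and one cup scores 4 + 2 = 6 out of a possible 10, while the opponent's score for the two tickets and two cups depends on the opponent's own secret valuation.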

The Object of the Game:

Write code to obtain a collection of items with the highest value by negotiating items with an opponent player.

User Experience:

We wanted it to be as easy as possible for players to submit, play and test their code.
Therefore, we decided to keep player code simple – not relying on any third-party libraries.
To do this, we built a simple web application for testing and submitting code, supplying a placeholder with the method “accept” – the code that needs to be implemented by the different participants. The “accept” method describes a single iteration within the negotiation, in which each player must decide if they will accept the offer given to them (by returning null or the received offer) – or return a counter offer.

To assist in verifying the players’ strategy, we added a testing feature allowing players to run their code vs some random player. Developers were able to play around with it, re-implementing the code before actual submission.
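
To illustrate what a single “accept” decision might look like, here is a deliberately simple player. Everything in it is an assumption made for illustration: the signature (arrays of item counts, null meaning "accept"), the class name, and the strategy of accepting any offer worth at least half of the total value. The real placeholder supplied to participants may look different.

```java
// Hypothetical player for illustration; the real placeholder's signature may differ.
final class HalfValuePlayer {
    private final int[] counts; // how many of each item are on the table
    private final int[] values; // this player's private value per item
    private final int totalValue;

    HalfValuePlayer(int[] counts, int[] values) {
        this.counts = counts.clone();
        this.values = values.clone();
        this.totalValue = worth(this.counts);
    }

    // Worth of a set of items under this player's valuation.
    private int worth(int[] items) {
        int sum = 0;
        for (int i = 0; i < items.length; i++) {
            sum += items[i] * values[i];
        }
        return sum;
    }

    // offer[i] is how many items of type i the opponent proposes to give us.
    // Returns null to accept, or a counter-offer listing what we want to keep.
    int[] accept(int[] offer) {
        if (worth(offer) * 2 >= totalValue) {
            return null; // at least half of the total value: take the deal
        }
        int[] counter = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // Ask for everything we value and concede what is worthless to us.
            counter[i] = values[i] > 0 ? counts[i] : 0;
        }
        return counter;
    }
}
```

A player this naive would not get far in the tournament, but it shows the shape of the decision: evaluate the offer against private values, then either accept or answer with a complete split.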

Java Code Example:

[gist id= 8e870dad5baeec79cbda4be5f56617f6 file=HagglingCode.java]

Test Your Code and Submit Online:

Tournament And Scoreboard:

Practice tournaments ran continuously for two weeks, taking all submitted players into account and allowing developers to see their rank. During this time, competitors were able to edit their code, so there was plenty of time to learn and improve.

We also provided analytics for every player. Developers were able to analyze and improve their strategy.

At the end of the two weeks, we declared a code freeze and the real tournament took place. Players’ final score was determined only from the results of the real tournament, not the practice tournaments.

Game Execution And Score:

We executed the game tournament using multiple agents – each of the agents was reported to Kibana:

The Back-Stage:

Where did we store players’ code?
We decided to store all players’ code in AWS S3 to avoid revealing the code to other players.

What languages were supported?

We started with Java only, but players expressed interest in using Scala and Kotlin as well. So we gave these developers free rein to add support for those languages, which we then reviewed before integrating into the base code. Ultimately, developers were able to play in all three languages.

What was the scale of Haggling?

In the final tournament, 91 players competed in 164 million rounds in which 1.14 billion “accepts” were called. The tournament was executed on 45 servers, having 360 cores and using 225G of memory.

The greatest advantage of our approach was our decision to use Kubernetes, enabling us to add more nodes, as well as tune their cores and memory requirements. Needless to say, it was no problem to get rid of all these machines when the game period ended.

How did the tournament progress?

The tournament was tense, and we saw a lot of interaction with the game over the two weeks.
The player in the winning position changed every day, and the final winner was not apparent until very near the end (and even then we were surprised!).
We saw a variety of single-player strategies with sophisticated calculations and different approaches to gameplay.
Moreover, in contrast to the original game, we allowed gangs: groups of players belonging to a single team that can “help” each other to win.

So how do you win at haggling?

The winning strategy was collaborative – the winning team created two types of players: the “Overlord” which played to win, and several “Minions” whose job was to give points to the Overlord while blocking other players. The Overlord and Minions recognized each other using a triple handshake protocol, based on mathematical calculations of the game parameters. Beyond this, the team employed a human psychological strategy – hiding the strength of the Overlord by ensuring that for the majority of the development period the Overlord went no higher than third place. They populated the game with “sleeper cells” – players with basic strategies ready to turn into minions at the right moment. The upheaval occurred in the final hour of the game when all sleepers were converted to minions.

The graph shows the number of commits in the last hour before the code freeze:

Hats Off to the Hacker: who got the better of us?

During the two weeks, we noticed multiple hacking attempts. The hacker’s intent was not to crash the game, but rather to prove that it is possible and make a lesson of it.
Although it was not our initial intent, we decided to make hacking part of the challenge and to reward the hacker for demonstrated skills and creativity.

On the morning of November 7th, we arrived at the office and were faced with the following graph of the outcomes:

The game had been hacked! As can be seen in the graph, one player was achieving an impossible success rate. What we discovered was the following: the read-only hash map that we passed to players as a method argument was written in Kotlin, but when players converted the map to play in either Java or Scala, the conversion produced a mutable hash map, and this is how one of the players was able to modify it. We had failed to validate the preferences, that is, to check that the hash map values players handed back matched the original values.
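
The missing validation can be sketched in a few lines of Java. A read-only Kotlin Map is seen from Java as a plain java.util.Map, so nothing stops player code from calling put on it when the backing object is a HashMap. The guard below is a hypothetical illustration (the class and method names are assumptions, not the game's code): it takes an immutable snapshot before player code runs and compares it with the map afterwards.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard around player code; names are illustrative only.
final class PreferenceGuard {
    // Takes an immutable copy before the map is handed to player code.
    static Map<String, Integer> snapshot(Map<String, Integer> preferences) {
        return Collections.unmodifiableMap(new HashMap<>(preferences));
    }

    // True only if the map that came back still holds the original values.
    static boolean untouched(Map<String, Integer> snapshot, Map<String, Integer> afterTurn) {
        return snapshot.equals(afterTurn);
    }
}
```

Handing players the snapshot itself, rather than the original map, would also have worked: any attempt to modify it throws UnsupportedOperationException.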

In conclusion, this is exactly the sort of sandbox experience that makes us better, safer, and smarter developers. We embraced the challenge.

Want to play with us? Join Outbrain and challenge yourself.

November 8, 2017

Avi Youkhananov

Keep bugs out of production

Production bugs are painful and can severely impact a dev team’s velocity. My team at Outbrain has succeeded in implementing a work process that enables us to send new features to production free of bugs, a process that incorporates automated functions with team discipline.

Why should I even care?

Bugs happen all the time, and they will be found either locally or in production. The main difference between catching a bug in a pre-production environment and finding it in production is the cost: according to IBM’s research, fixing a bug in production can cost five times more than fixing it in a pre-production environment (during the design, local development, or test phase).

Here is one scenario that can unfold once a bug reaches production:

A customer finds the bug and alerts customer service.
The bug is logged by the production team.
The developer gets the description of the bug, opens the spec, and spends time reading it over.
The developer then spends time recreating the bug.
The developer must then become reacquainted with the code to debug it.
Next, the fix must undergo tests.
The fix is then built and deployed in other environments.
Finally, the fix goes through QA testing (requiring QA resources).

How to stop bugs from reaching production

To catch and fix bugs at the most time- and cost-efficient stage, we follow these steps, adhering to several core principles:

Stage 1 – Local Environment and CI

Step 1: Design well. Keep it simple.

Create the design before coding: try to divide difficult problems into smaller parts/steps/modules that you can tackle one by one, thinking of objects with well-defined responsibilities. Share the plans with your teammates at design-review meetings. Good design is a key to reducing bugs and improving code quality.

Step 2: Start Coding

The code should be readable and simple. Design and development principles are your best friends. Use SOLID, DRY, YAGNI, KISS and Polymorphism to implement your code.
Unit tests are part of the development process. We use them to test individual code units and ensure that the unit is logically correct.
Unit tests are written and executed by developers. Most of the time we use JUnit as our testing framework.

Step 3: Use code analysis tools

To help ensure and maintain the quality of our code, we use several automated code-analysis tools:
FindBugs – A static code analysis tool that detects possible bugs in Java programs, helping us to improve the correctness of our code.
Checkstyle – Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. It automates the process of checking Java code.

Step 4: Perform code reviews

We all know that code reviews are important. There are many best practices online (see 7 Ways to Up-Level Your Code Review Skills, Best Practices for Peer Code Review, and Effective Code Reviews), so let’s focus on the tools we use. All of our commits are posted to ReviewBoard, where developers can review the committed code, see the latest developments at any point in time, and share input.
For the more crucial teams, we have a build that makes sure every commit has passed a code review. If a review has not been done, the build alerts the team that there was an unreviewed change.
Regardless of whether you are performing a post-commit, a pull request, or a pre-commit review, you should always aim to check and review what’s being inserted into your codebase.

Step 5: CI

This is where all code is integrated. We use TeamCity to enforce our code standards and correctness by running unit tests, FindBugs validations, Checkstyle rules, and other types of policies.

Stage 2 – Testing Environment

Step 1: Run integration tests

Check whether the system as a whole works. Integration testing is also done by developers, but rather than testing individual components, it aims to test across components. A system consists of many separate components, such as code, databases, and web servers.
Integration tests are able to spot issues like wiring of components, network access, database issues, etc. We use Jenkins and TeamCity to run CI tests.

Step 2: Run functional tests

Check that each feature is implemented correctly by comparing the results for a given input with the specification. Typically, this is not done at the development level.
Test cases are written based on the specification, and the actual results are compared with the expected results. We run functional tests using Selenium and Protractor for UI testing and JUnit for API testing.

Stage 3 – Staging Environment

This environment is often referred to as a pre-production sandbox, a system testing area, or simply a staging area. Its purpose is to provide an environment that simulates your actual production environment as closely as possible so you can test your application in conjunction with other applications.
Move a small percentage of real production requests to the staging environment where QA tests the features.

Stage 4 – Production Environment

Step 1: Deploy gradually

Deployment is the process that delivers our code to production machines. If errors occur during deployment, our Continuous Delivery system pauses the deployment, preventing the problematic version from reaching all the machines and allowing us to roll back quickly.

Step 2: Incorporate feature flags

All our new components are released with feature flags, which basically serve to control the full lifecycle of our features. Feature flags allow us to manage components and compartmentalize risk.

Step 3: Release gradually

There are two ways to make our release gradual:

We test new features on a small set of users before releasing to everyone.
We open the feature initially to, say, 10% of our customers, then 30%, then 50%, and then 100%.

Both methods allow us to monitor and track problematic scenarios in our systems.
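
A percentage-based rollout behind a feature flag can be sketched as follows. This is a minimal illustration under stated assumptions, not our production code: the class name is hypothetical, users are identified by a string ID, and the bucket is derived from String.hashCode so that the same user always lands in the same bucket.

```java
// Hypothetical rollout check; the hashing choice is an assumption for this sketch.
final class GradualRollout {
    // Maps a user ID to a stable bucket in the range 0..99.
    static int bucket(String userId) {
        return Math.floorMod(userId.hashCode(), 100);
    }

    // A user sees the feature when their bucket falls below the rollout percentage.
    static boolean isEnabled(String userId, int rolloutPercent) {
        if (rolloutPercent < 0 || rolloutPercent > 100) {
            throw new IllegalArgumentException("rolloutPercent must be between 0 and 100");
        }
        return bucket(userId) < rolloutPercent;
    }
}
```

Because the bucket is stable, raising the percentage from 10 to 30 only adds users: everyone who already had the feature keeps it, which keeps the monitored population consistent between rollout steps.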

Step 4: Monitor and Alerts

We use the ELK stack consisting of Elasticsearch, Logstash, and Kibana to manage our logs and events data.
For Time Series Data we use Prometheus as the metric storage and alerting engine.
Each developer can set up their own metrics and build Grafana dashboards.
Setting the alerts is also part of the developer’s work, and it is the developer’s responsibility to tune the threshold that triggers the PagerDuty alert.
PagerDuty is an automated call, texting, and email service, which escalates notifications between responsible parties to ensure the issues are addressed by the right people at the right time.

All in all,
Don’t let the bugs fly out of control.
