Category: Algorithms

CodinGame Story One – The key to creativity and happiness in a developer’s life

Photo by Juan Gomez on Unsplash

“Keep a developer learning and they’ll be happy working in a windowless basement eating stale food pushed through a slot in the door. And they’ll never ask for a raise.” — Rob Walling (https://robwalling.com/2006/10/31/nine-things-developers-want-more-than-money/)

The past decade has produced substantial research verifying what may come as no surprise: developers want to have fun. While we also need our salaries, salaries alone will not incentivize us developers who, in most cases, entered the field to do what we love: engage in problem-solving. We like competition. We like winning. We like getting prizes for winning. To be productive, we need job satisfaction. And job satisfaction can be achieved only if we get to have fun using the skills we were hired to use.

 

We wanted to keep the backend developers challenged and entertained.
That’s why Guy Kobrinsky and I created our own version of Haggling, whose basic idea we adapted from Hola, a negotiation game.

The Negotiation Game:

Haggling consists of rounds of negotiation between pairs of players. Each player’s goal is to maximize their own score, in the following manner:

Let’s say there are a pair of sunglasses, two tickets, and three cups on the table. Both players have to agree on how to split these objects between them. To one player, the sunglasses may be worth $4, each cup $2, and the tickets worthless. The opponent might value the same objects differently; while the total worth of all the objects is the same for both players, each player’s valuation is kept secret.

Both players take turns making offers to each other about how to split the goods. A proposed split must distribute all objects between the partners such that no items are left on the table. On each turn, a player can either accept the offer or make a counter-offer. If an agreement is reached within 9 offers, each player receives the amount their portion of the goods is worth, according to the assigned values. If there is still no agreement after the last turn, both players receive no points.

The Object of the Game:

Write code that obtains the most valuable collection of items by negotiating with an opponent player.

User Experience:

We wanted it to be as easy as possible for players to submit, play and test their code.
Therefore, we decided to keep player code simple – not relying on any third-party libraries.
To do this, we built a simple web application for testing and submitting code, supplying a placeholder with an “accept” method – the code each participant needs to implement. The “accept” method describes a single iteration within the negotiation, in which the player must decide whether to accept the offer given to them (by returning null or the received offer) or to return a counter-offer.

 

To help players verify their strategy, we added a testing feature that lets them run their code against a random player. Developers were able to play around with it and rework their code before actual submission.

 

Java Code Example:

 
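As a rough illustration (the exact game API isn’t reproduced here, so the method signature and return conventions below are assumptions based on the description above, not the real interface), a submitted player might look something like this:

    import java.util.HashMap;
    import java.util.Map;

    // A minimal, hypothetical player. The framework is assumed (for illustration only)
    // to pass the item counts on the table, this player's secret valuations, the
    // opponent's latest offer, and the turn number.
    public class GreedyPlayer {

        private static final int TOTAL_VALUE = 10; // assumed total worth of all items

        // Returning the received offer (or null) accepts it; returning any other
        // map is a counter-offer describing the items we want to keep.
        public Map<String, Integer> accept(Map<String, Integer> itemCounts,
                                           Map<String, Integer> myValues,
                                           Map<String, Integer> offer,
                                           int turn) {
            if (offer != null && valueOf(offer, myValues) >= TOTAL_VALUE / 2) {
                return offer; // good enough for us - accept as-is
            }
            // Otherwise counter-offer: keep every item we value, concede the rest.
            Map<String, Integer> counter = new HashMap<>();
            for (Map.Entry<String, Integer> item : itemCounts.entrySet()) {
                int valuePerItem = myValues.getOrDefault(item.getKey(), 0);
                counter.put(item.getKey(), valuePerItem > 0 ? item.getValue() : 0);
            }
            return counter;
        }

        private int valueOf(Map<String, Integer> items, Map<String, Integer> myValues) {
            return items.entrySet().stream()
                    .mapToInt(e -> e.getValue() * myValues.getOrDefault(e.getKey(), 0))
                    .sum();
        }
    }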

Test Your Code and Submit Online:

 

Tournament And Scoreboard:

Practice tournaments ran continuously for two weeks, taking all submitted players into account and allowing developers to see their rank. During this time, competitors were able to edit their code, so there was plenty of time to learn and improve.

We also provided analytics for every player, so developers could analyze and improve their strategies.

 

At the end of the two weeks, we declared a code freeze and the real tournament took place. Players’ final score was determined only from the results of the real tournament, not the practice tournaments.

 

Game Execution And Score:

We executed the game tournament using multiple agents, each of which reported its results to Kibana:

     

The Back-Stage:

Where did we store players’ code?
We decided to store all players’ code in AWS S3 to avoid revealing it to other players.

What languages were supported?

We started with Java only, but players expressed interest in using Scala and Kotlin as well. So we gave these developers free rein to add support for those languages, which we then reviewed before integrating into the base code. Ultimately, developers were able to play in all three languages.  

What was the scale of Haggling?

In the final tournament, 91 players competed in 164 million rounds in which 1.14 billion “accepts” were called. The tournament ran on 45 servers with a total of 360 cores and 225 GB of memory.

The greatest advantage of our approach was our decision to use Kubernetes, enabling us to add more nodes, as well as tune their cores and memory requirements. Needless to say, it was no problem to get rid of all these machines when the game period ended.

                                

How did the tournament progress?
The tournament was tense, and we saw a lot of interaction with the game over the two weeks.
The player in the winning position changed every day, and the final winner was
not apparent until very near the end (and even then we were surprised!).
We saw a variety of single-player strategies with sophisticated calculations and different approaches to gameplay.
Moreover, in contrast to the original game, we allowed gangs: groups of players belonging to a single team that could “help” each other win.


So how do you win at haggling?

The winning strategy was collaborative – the winning team created two types of players: the “Overlord” which played to win, and several “Minions” whose job was to give points to the Overlord while blocking other players.  The Overlord and Minions recognized each other using a triple handshake protocol, based on mathematical calculations of the game parameters.  Beyond this, the team employed a human psychological strategy – hiding the strength of the Overlord by ensuring that for the majority of the development period the Overlord went no higher than third place.  They populated the game with “sleeper cells” – players with basic strategies ready to turn into minions at the right moment.  The upheaval occurred in the final hour of the game, when all sleepers were converted to minions.

The graph shows the number of commits in the last hour before code freeze:

 

Hats Off to the Hacker: who got the better of us?

During the two weeks, we noticed multiple hacking attempts. The hacker’s intent was not to crash the game, but rather to prove that it was possible and to turn it into a lesson.
Although it was not our initial intent, we decided to make hacking part of the challenge, and to reward the hacker for demonstrated skills and creativity.

On the morning of November 7th, we arrived at the office and were faced with the following graph of the outcomes:


The game had been hacked! As can be seen in the graph, one player was achieving an impossible success rate. What we discovered was the following: the read-only hash map that we provided as a method argument to players was written in Kotlin; but when players converted the map to play in either Java or Scala, the conversion yielded a mutable hash map, and this is how one of the players was able to modify it. We had failed to validate the preferences – that is, to ensure that the hash-map values a player handed back were the same as the original ones.
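As a rough illustration (a simplified, hypothetical reconstruction – not the actual exploit code), once the supposedly read-only map arrives on the JVM as a plain, mutable java.util.Map, a Java or Scala player can simply rewrite its own “secret” valuations:

    import java.util.Map;

    // Hypothetical reconstruction of the exploit, with a simplified signature:
    // the backing instance is a mutable HashMap, so nothing stops the player
    // from inflating its own preferences before the score is computed.
    public class SneakyPlayer {

        public Map<String, Integer> accept(Map<String, Integer> myValues,
                                           Map<String, Integer> offer) {
            myValues.replaceAll((item, value) -> 1_000_000); // every item is now "worth" a fortune
            return offer; // accept anything - any outcome now scores absurdly high
        }
    }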


In conclusion, this is exactly the sort of sandbox experience that makes us better, safer, and smarter developers. We embraced the challenge.


Want to play with us? Join Outbrain and challenge yourself.

 

Taking the pain out of Data Science – part 3

This is the third and last post on our machine learning framework.
Post #1 covers the challenges we face and gives an overview of our solution.
Post #2 focuses on how we handle our data and make it more accessible.
This part will focus on what we do once our dataset is ready and organized – a framework for building new models, and for deploying them to production.

Modeling challenges and boilerplate

Building a model contains many common parts.

First is handling the input. Even after we have sorted out the data with our Data Collection process, we still need to read it, split it correctly into train and test sets, and read the data for our simulation process.

Another common part is evaluating the test metrics: running the model on the test data and displaying metrics such as MSE and AUC to see how well the model performs.
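With Spark ML, for example, this kind of test evaluation boils down to a couple of evaluator calls. A minimal sketch, assuming a predictions dataframe with "label", "rawPrediction" and "prediction" columns:

    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
    import org.apache.spark.ml.evaluation.RegressionEvaluator;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class TestMetrics {

        // Computes AUC and MSE for a predictions dataframe produced by a fitted model.
        public static void report(Dataset<Row> predictions) {
            double auc = new BinaryClassificationEvaluator()
                    .setLabelCol("label")
                    .setRawPredictionCol("rawPrediction")
                    .setMetricName("areaUnderROC")
                    .evaluate(predictions);

            double mse = new RegressionEvaluator()
                    .setLabelCol("label")
                    .setPredictionCol("prediction")
                    .setMetricName("mse")
                    .evaluate(predictions);

            System.out.println("AUC = " + auc + ", MSE = " + mse);
        }
    }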

Third is checking the business metrics.
Before trying a model in production, we want to simulate how it behaves with regard to the business KPIs.
We evaluate a number of metrics that serve as good proxies for business performance.

Our goal with this framework is to make data scientists’ lives easier – letting them focus on the models rather than on writing time-consuming boilerplate code.


Model Framework

We wanted to create a framework that includes these parts out-of-the-box: it runs the fitting process, saves the model, tests the performance and runs the simulation.


All the data scientists should focus on is their model’s logic.
They can use any Spark ML packages, open-source implementations or their own in-house implementations; the rest they get “for free” from the framework.

The interface they need to implement is simple:

  • Preparing the dataset – extracting new features, transforming the data.
  • Fitting the model on the data – the actual logic of the algorithm.
  • Returning the column names (features) that are required for the model to operate.
  • Saving a representation of the model for later use.

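A minimal Java sketch of that contract might look like the following (the names and signatures are illustrative, not the exact production interface):

    import java.util.List;
    import org.apache.spark.ml.Transformer;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Illustrative sketch of the contract a data scientist implements.
    public interface ModelImplementation {

        // Extract new features and transform the raw data into a training-ready dataset.
        Dataset<Row> prepareDataset(Dataset<Row> rawData);

        // The actual algorithm logic: fit the model and return it as a Spark Transformer.
        Transformer fit(Dataset<Row> trainingData);

        // The column names (features) the model requires in order to operate.
        List<String> featureColumns();

        // Persist a representation of the fitted model for later use.
        void save(Transformer model, String path);
    }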

Models productization

The final part of the framework is to bridge between research and production.
We want this transition to be as simple and fast as possible, to allow us to reach conclusions quickly and keep improving.

First, we want to allow fast and easy A/B testing of new models.

A quick reminder of how A/B tests work: we split the population into two independent groups of similar users.
We serve one group with the treatment – the new model – and the other group with the control – the production baseline the system currently uses.
After running for a while, we analyze the data, evaluate engagement and monetization metrics using statistical tests, and conclude whether the treatment has provided a significant improvement.


To support this, we added a step at the end of the framework.
The step reads the model’s coefficients and updates the A/B test configurations with a variant that will serve a small portion of our users with the new model.
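As a hypothetical sketch of such a step (the configuration table, columns and connection details below are invented for illustration), publishing a new variant could be as simple as:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Hypothetical sketch: push a fitted model's coefficients into the A/B test configuration.
    public class AbTestPublisher {

        public void publishVariant(String modelId, String coefficientsJson, double trafficFraction)
                throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://config-db/ab_tests", "user", "password");
                 PreparedStatement stmt = conn.prepareStatement(
                         "INSERT INTO ab_variants (model_id, coefficients, traffic_fraction) VALUES (?, ?, ?)")) {
                stmt.setString(1, modelId);
                stmt.setString(2, coefficientsJson);
                stmt.setDouble(3, trafficFraction); // e.g. 0.05 = serve the new model to 5% of users
                stmt.executeUpdate();
            }
        }
    }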

In a similar fashion, once we have a model with proven value, we want to retune it on a regular basis so it keeps learning from the new data we collect.

  • We run the whole modeling flow on a regular basis, triggered by our ETL engine.
    A new model is created with updated variables.
  • On each run, we validate the business metrics to make sure the model keeps performing well and doesn’t deteriorate.
  • Finally, if the metrics were positive, we update the production configurations.

High level design

Here is a quick high level overview of this part of the system:

Our input data is stored in two Hive tables, after being prepared by the Data Collection process.
The model is created by the model generator, which initializes the implementation based on the job’s configuration.
The framework then runs all the common parts (sketched after this list):

  • Reads the data and splits it into test, train and simulation sets.
  • Calls the model’s fit implementation and saves the result.
  • Uses the model for test data predictions and stores the results for analysis.
  • Runs the model on the simulation data and calculates the business KPI simulation metrics.
  • Saves the model for later use on HDFS.

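In condensed form, the common flow around an implementation looks roughly like the sketch below (it reuses the illustrative ModelImplementation interface from above; table names and split ratios are assumptions):

    import org.apache.spark.ml.Transformer;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Condensed sketch of the common flow the framework runs around an implementation.
    public class ModelRunner {

        public void run(SparkSession spark, ModelImplementation impl, String hdfsModelPath) {
            // 1. Read the prepared input and split it into train / test / simulation.
            Dataset<Row> data = impl.prepareDataset(spark.table("model_input"));
            Dataset<Row>[] splits = data.randomSplit(new double[]{0.7, 0.2, 0.1}, 42L);
            Dataset<Row> train = splits[0], test = splits[1], simulation = splits[2];

            // 2. Fit the model - the only part the data scientist actually writes.
            Transformer model = impl.fit(train);

            // 3. Predict on the test set and store the results for metric analysis
            //    (the real system stores results in Cassandra; a table keeps this sketch simple).
            model.transform(test).write().mode("overwrite").saveAsTable("test_predictions");

            // 4. Run on the simulation set to compute the business-KPI proxy metrics.
            model.transform(simulation).write().mode("overwrite").saveAsTable("simulation_predictions");

            // 5. Save the fitted model to HDFS for later use.
            impl.save(model, hdfsModelPath);
        }
    }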
All the results are stored in Cassandra.

The last part is the productization step:
it gets the updated model variables from the fit output, validates the simulation metrics to verify the model’s performance, and updates the production configurations in MySQL.

Takeaways

To sum up, here are the key lessons we learned, which evolved into this framework:

  • Prepare your data well – this is crucial for enabling high-scale modeling!
    I cannot over-emphasize how important this is for avoiding drowning in data and spending much of the research time on endless queries.
  • Build an effective research cycle – invest the time to build a good big data machine learning framework. It will really pay off in the long run and keep your data scientists productive and happier.
  • Connect research and production – research results are worthless if it takes forever to apply them in your product. Shorter cycles will enable you to try out more models and implementations and keep improving.
    Aim to make this as quick and easy as possible.

Taking the pain out of Data Science – part 2

This is post #2 in a series of 3 posts covering our machine learning framework. We recommend reading post #1 first to understand the challenges we face and to get an overview of our solution.
This part will focus on how we handle our data and make it more accessible – using an on-going data collection process.

Data Collection

The first part of any data science task is getting a good dataset to work with. We have a lot of data, but preparing datasets can be very hard work – you really have to “get your hands dirty” to pull the data from all the sources and tables and convert it into an easy-to-use dataset.


Challenges:

What are the main challenges in creating a good dataset to use?

  • Many output tables – tables that store requests for recommendations, served recommendations, user clicks, user and document profiles, and more.
  • Multiple data stores – these tables live in a number of sources, due to their different nature. Some are Hive tables, some data is stored in MySQL, and some in Cassandra.
  • Long queries – some of these tables are very big. Querying them, especially for a long date range, can take a while.
  • Irrelevant data – we rarely want data from our entire traffic. Usually we want only the portion of it that is relevant to the current modeling task.

Silos and partitioning:

Beyond addressing these challenges, a good data collection process has additional advantages.

We want to have the ability to train models on different silos – different population groups, that may behave differently and require different models.

To achieve this easily, we add a number of columns and partitions to our output aggregation tables – such as platform, country, language and more.

This allows us to quickly try out our models on different groups.
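With Spark, for illustration, this amounts to little more than a partitioned write (the table and column names below are assumptions):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Write the aggregated output partitioned by the silo columns, so a model
    // for a specific silo can read only its slice of the data.
    public class SiloWriter {

        public static void write(Dataset<Row> aggregated) {
            aggregated.write()
                    .mode("overwrite")
                    .partitionBy("platform", "country", "language")
                    .saveAsTable("recommendations_dataset");
        }
        // Training on a silo then becomes a simple partition filter, e.g.:
        // spark.table("recommendations_dataset").where("platform = 'mobile' AND country = 'US'")
    }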

Output:

We decided to split our output into 2 main parts:

First, a dataset for building models: it contains only the served recommendations we want (from specific variants of the traffic), and it should contain all of the clicked recommendations plus a sample of the non-clicks, in order to have a balanced dataset for learning.

Second, a dataset that will be used for simulation of the business metrics.

Our recommendations are served in widgets, showing a small batch of recommendations.
For this use case, we take only recommendations from widgets that received at least one click.

With this dataset, we can apply our model only to these clicked widgets and see how well we graded the clicked recommendation compared to the other, non-clicked recommendations.
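A sketch of how the two outputs can be built with Spark (column names such as "clicked" and "widget_id" are assumptions):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.col;

    public class DatasetBuilder {

        // Modeling dataset: all clicked recommendations plus a small sample of the non-clicks.
        public static Dataset<Row> modelingDataset(Dataset<Row> served, double nonClickFraction) {
            Dataset<Row> clicks = served.filter(col("clicked").equalTo(1));
            Dataset<Row> nonClicks = served.filter(col("clicked").equalTo(0))
                    .sample(false, nonClickFraction, 42L);
            return clicks.unionByName(nonClicks);
        }

        // Simulation dataset: only recommendations from widgets that received at least one click.
        public static Dataset<Row> simulationDataset(Dataset<Row> served) {
            Dataset<Row> clickedWidgets = served.filter(col("clicked").equalTo(1))
                    .select("widget_id").distinct();
            return served.join(clickedWidgets, "widget_id");
        }
    }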


The solution – Automatic Spark job

Our solution to all these challenges was to create an automatic data collection job.

The job runs hourly, triggered by our ETL engine.
This Apache Spark job aggregates an hourly dataset with the relevant data, creates the needed partitions, and produces the two outputs described above.

Using Spark was very convenient for this use case. It allowed us to create a readable flow that queries different input sources and holds in memory the data common to both tables before writing the final output to Hive.

 

A quick note on how we monitor our Spark jobs:

It is somewhat of a challenge to understand how a Spark job behaves beyond the basic error messages and checking the job’s output.

To make the job’s progress and status more visible, we send metrics from the job’s driver, using HTTP, to our monitoring server, which aggregates them.

This allows us to create simple-to-use dashboards and easily understand what is going on.
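A minimal sketch of such a reporter running on the driver (the monitoring endpoint and payload format are assumptions):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Report a counter from the Spark driver to a monitoring endpoint over HTTP.
    public class DriverMetrics {

        public static void report(String metric, long value) {
            try {
                URL url = new URL("http://monitoring.internal/metrics");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                conn.setRequestMethod("POST");
                conn.setDoOutput(true);
                conn.setRequestProperty("Content-Type", "application/json");
                byte[] payload = String.format("{\"metric\":\"%s\",\"value\":%d}", metric, value)
                        .getBytes(StandardCharsets.UTF_8);
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(payload);
                }
                conn.getResponseCode(); // force the request to be sent; ignore the response body
                conn.disconnect();
            } catch (Exception e) {
                // Monitoring must never fail the job - log and move on.
                System.err.println("Failed to report metric " + metric + ": " + e.getMessage());
            }
        }
    }

The driver can then call, for example, DriverMetrics.report("output_rows", rowCount) after each stage of the job.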

In addition, we send textual logs to our monitoring server, which indexes them into an Elasticsearch cluster. We can then view the logs with ease using Kibana.

Below is a dashboard example for a demo job, collecting a portion of our data.


Stay tuned for our 3rd and last part of this blog post series where we will cover our Spark-based machine learning framework, that allows us to be highly agile with our research tasks, and dynamic as well as robust in pushing our models to production.

Taking the pain out of Data Science – RecSys machine learning framework over Spark, Part 1

Overview

The role of a Data Scientist, or Machine Learning engineer, is becoming more and more valuable in the tech industry. It is the fastest-growing job in the U.S. according to a LinkedIn study, and it was recently ranked the best job in America by Glassdoor.

However, the life of a Data Scientist isn’t easy – the job requires solid math and statistics knowledge, a programming background and experience, and “hacking” skills in order to get things done. This is especially true when handling huge amounts of data of different types.

We, the personalization team at Outbrain, decided to try to take the pain out of data science and make our lives easier, allowing us to perform effective research with immediate production impact.

In this post series I will describe an end-to-end machine learning framework we developed, over Apache Spark, in order to address the different challenges our Data Scientists and Algorithm Engineers face.

Outbrain’s recommendation system machine learning challenge:

Our goal is to recommend the stories the user is most likely to be interested in, given the user’s interests and the current context.
We need to rank the stories in our inventory by the probability that the user will click to read them, and display the top stories to the user.
Our supervision signal is past user actions – given a user and a document with a set of features, did the user click or not.

Our data and features:

Outbrain generates a lot of data.
We get over 550 million unique monthly users, generate over 275 billion recommendations monthly and have more than 35 million clicks a day.

The first step in computing quality recommendations is representing our key players well: the users and the documents.

We extract semantic features from each document that has our widget installed, using an NLP engine. The engine extracts semantic features at several levels of granularity, from high-level categories to very granular entities.
For example, from the ‘Westworld’ story below, we extract:

  • High-level categories, such as entertainment.
  • Lower level topics – such as TV or murder.
  • Entities – persons, locations or companies, that the document discusses.

We represent our readers with similar semantic features.

We create an anonymous profile for each reader, based on the content they have read.

Each profile is an aggregation of the semantic features of the documents the user has read.
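Conceptually, a profile can be as simple as summing the semantic-feature weights across the documents a reader has seen; a toy sketch (the feature names and weights are invented):

    import java.util.HashMap;
    import java.util.Map;

    // Toy sketch: a reader profile as the sum of the semantic-feature weights
    // of the documents they have read.
    public class ReaderProfile {

        private final Map<String, Double> featureWeights = new HashMap<>();

        // Fold one document's semantic features, e.g. {"entertainment": 0.9, "TV": 0.7},
        // into the profile.
        public void addDocument(Map<String, Double> documentFeatures) {
            for (Map.Entry<String, Double> feature : documentFeatures.entrySet()) {
                featureWeights.merge(feature.getKey(), feature.getValue(), Double::sum);
            }
        }

        public Map<String, Double> asFeatureVector() {
            return featureWeights;
        }
    }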

Predictive models:

We use a variety of models, from three main types:

Content based models –

These models assume that there is a semantic connection between the documents the user likes to read.
The model uses the user profile to find more documents whose semantic features are similar to those the user likes.
These could be stories within the same categories, from the same sites, or covering a specific person or location.

Behavioural models –

Rather than assuming the user will want to keep reading documents on similar topics, these models look for connections between user interests and other subjects that tend to go well together.
For example, they may find that users who previously showed interest in retirement investing will later be interested in articles about heart disease.

Collaborative models-

The third type, and potentially most powerful and interesting, are collaborative models.

These models use the wisdom of the crowd in order to recommend new content, potentially without the need to semantically understand it.
The basic idea is to find readers with similar reading patterns, discover what they like beyond what the current user has already read, and recommend those items.

Algorithms in this family use algebraic dimensionality-reduction methods, such as Matrix Factorization or Factorization Machines, or find a new, latent representation of the users and items using deep learning neural networks.
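As an illustration, a matrix-factorization model of this kind can be built directly with Spark ML’s ALS. A minimal sketch (column names and hyper-parameters are illustrative, not our production settings):

    import org.apache.spark.ml.recommendation.ALS;
    import org.apache.spark.ml.recommendation.ALSModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Sketch of a collaborative model using Spark ML's ALS matrix factorization.
    public class CollaborativeModel {

        public static ALSModel train(Dataset<Row> interactions) {
            ALS als = new ALS()
                    .setUserCol("user_id")
                    .setItemCol("document_id")
                    .setRatingCol("clicks")     // implicit feedback: click counts
                    .setImplicitPrefs(true)
                    .setRank(64)
                    .setRegParam(0.05)
                    .setColdStartStrategy("drop");
            return als.fit(interactions);
        }

        // Top-10 document recommendations per user, derived from the latent representations.
        public static Dataset<Row> recommend(ALSModel model) {
            return model.recommendForAllUsers(10);
        }
    }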

 

The process of data modeling consists of many common tasks.
To make this process efficient and enable agile research and productization, we have developed a general framework on top of Spark, containing 5 independent components.

Framework overview:

The system’s key components:

  1. Data collection
  2. Feature engineering
  3. Model training
  4. Offline evaluation and simulation
  5. Model deployment

The same system is used for both research and analysis of new algorithms, and also for running production models and updating them on a regular basis.

Next week, in part 2 of this blog post series, we will dive into the data collection flow which is a key ingredient to machine learning flows, and see how data is made more accessible using an automated Spark process.
Part 3 will cover our modeling framework, developed on top of Spark, that allows us to be highly agile with our research tasks, and dynamic as well as robust in pushing our models to production. Stay tuned!