Sharing Data Science Knowledge and Experience

Two years ago, Outbrain and Zemanta joined forces. Among many other benefits, this led to a substantial two-way knowledge exchange between Zemanta’s Data Science and Outbrain’s Recommendations teams. As a result, we have made important progress in both our algorithms and our understanding, and we still haven’t exhausted the huge backlog of new ideas sprouting from our discussions.

At Outbrain and Zemanta we know how important internal sharing of knowledge and experience is, but we also believe it is crucial to share knowledge with the wider community. To that end, among other projects, we started Zemanta’s Data Science Summer School.

The second annual Data Science Summer School

In July we hosted the second annual Data Science Summer School in Zemanta’s office in Ljubljana, Slovenia. From among many applicants, we selected a group of very promising young professionals and students and invited them to join us for a week of data science-flavored activities, where they learned how we apply data science and machine learning in this data-rich industry.

The structure of the summer school

The week-long curriculum was designed to be practical and hands-on, with theoretical lectures interwoven. Participants first learned about the tools and techniques we use in our day-to-day work as data scientists in the industry: how to use git for version control, how to correctly set up Python environments, and how to use Python libraries such as NumPy and pandas for crunching data, Matplotlib for visualization and scikit-learn for building some basic predictors.

Then, after setting up their environments, they got their feet wet by participating in a Kaggle challenge. Some participants had taken part in Kaggle challenges before, so they shared their experience and know-how; for others it was their first time, so they tried to soak up as much information as they could.

Next, we provided them with a massive real dataset extracted from production, on which they had a chance to build their own predictors for estimating click probabilities, i.e. predicting click-through rate (CTR). After careful examination and analysis of the 50+ provided features, they could use a tool of their choice to make predictions: some explored scikit-learn in more detail, while others chose libraries such as XGBoost for gradient boosted trees, xLearn for factorization machines or TensorFlow for neural networks. Finally, all teams presented their work and shared what they had learned.
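To give a flavor of the kind of baseline a team might start from, here is a minimal scikit-learn sketch of a click predictor. It is an illustration only, not the participants' solutions or our production setup: the data is synthetic, and the feature names (`device`, `hour`, `ad_position`) and the coefficients used to generate clicks are hypothetical stand-ins for the real features.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 40_000

# Synthetic impressions with hypothetical categorical features.
df = pd.DataFrame({
    "device": rng.choice(["mobile", "desktop", "tablet"], size=n),
    "hour": rng.integers(0, 24, size=n),
    "ad_position": rng.choice(["top", "side", "bottom"], size=n),
})

# Assumed ground truth: clicks are rare, and more likely on mobile
# and for the top ad position. "hour" carries no signal here.
logit = (
    -4.0
    + 1.2 * (df["device"] == "mobile")
    + 0.8 * (df["ad_position"] == "top")
)
click_prob = 1.0 / (1.0 + np.exp(-logit))
df["clicked"] = (rng.random(n) < click_prob).astype(int)

features = ["device", "hour", "ad_position"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["clicked"],
    test_size=0.25, random_state=0, stratify=df["clicked"],
)

# One-hot encode every feature, then fit a logistic regression.
model = make_pipeline(
    ColumnTransformer([("onehot", OneHotEncoder(handle_unknown="ignore"), features)]),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Predicted click probability for each held-out impression.
probs = model.predict_proba(X_test)[:, 1]

# Reference point: always predict the average CTR seen in training.
baseline_probs = np.full(len(y_test), y_train.mean())

model_loss = log_loss(y_test, probs)
baseline_loss = log_loss(y_test, baseline_probs)
print(f"log loss, constant CTR baseline: {baseline_loss:.4f}")
print(f"log loss, logistic regression:   {model_loss:.4f}")
```

Log loss is used here because the goal is a well-calibrated probability rather than a hard click/no-click label. Comparing against the constant-CTR baseline shows how much the features actually add, which is a useful sanity check before moving on to heavier models such as gradient boosted trees or factorization machines.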

Mixed in between the hands-on sessions were many interesting talks and discussions. Topics ranged from how programmatic advertising works, what real-time bidding is, the theory behind auctions and the kinds of algorithms and systems we are developing at Zemanta, all the way to data analysis, deploying machine learning models to production and some of our real-life scenarios and stories.

What the participants had to say

After successfully completing the week-long curriculum, participants received their certificates and filled out anonymous feedback forms, with comments such as “Great way to spend a week – the atmosphere was excellent!”, “The talks were especially interesting since they give a nice insight into the company” and “Working on real data gave me the opportunity to experience first-hand the problems data scientists are working on”. Judging by this feedback, the participants learned a lot and had tons of fun doing it.

Conclusion

This was the second iteration of Zemanta’s Data Science Summer School in our Ljubljana office. My fellow mentors Robert, Luka and Anže and I had a great time sharing knowledge with the students, who gained important insights into the processes behind applying data science and machine learning to real problems in the industry. We are very excited to host more such events in the future.

Davorin Kopic
Head of Data Science @ Zemanta, an Outbrain Company
