The Blog of Florian Dahms


A Blog about Mathematics, Optimization, Coding and More

Decomposition of Integer Programs with Matchability Structure

A core problem with NP-hard optimization problems is that the size of the problem will inevitably kill you performance-wise at some point. Modern integer programming solvers can get you a long way, but often there will be a relatively sharp line separating...


Using TensorFlow models from the JVM using TensorFlow Serving

When it comes to using a deep learning model in production, we face challenges which are quite different from those we encounter during training. In production we have much higher requirements regarding stability and monitoring. In many setups large...


Pokémon Go and the art of handling big geospatial data.

With all the hype surrounding Pokémon Go, a question came up for me: how does a company like Niantic handle data with geographic information? In fact, processing spatial data is not an easy task because databases are highly optimized when it comes...


Announcing upcoming data science course

Some time ago I decided to create a data science online course for beginners. This course is now in the middle of its production and I am planning to release it about a month from now. The goal of the course is to enable people without a computer science...


What is "overfitting"?

Overfitting happens when we learn some patterns which do not exist because we have too many sources to learn from. Our entire learning process consists of recognizing patterns and deriving rules from them. Usually these rules help us to plan ahead...


Connecting Scala microservices with ZeroMQ and Protocol Buffers

When thinking about software architectures, the new hot thing (while actually being quite an old concept) is microservices. The idea makes sense: cutting a large system into smaller building blocks with well-defined APIs makes each component easier...


Scala and the '@transient lazy val' pattern

Suppose you have a Scala object holding some data that you want to store or send around by serializing the object. It turns out that the object is also capable of performing some complex logic and it stores the results of these calculations in its...


Writing efficient Spark jobs

This article covers a multitude of levers that I have discovered so far for tuning Apache Spark jobs so that they use less memory and/or running time. For some time now, Apache Spark has been the shooting star among big data technologies, and rightfully so, as...


How the Simplex method works

One of, if not the, most important algorithms in mathematical optimization is the Simplex method. Its origin dates back to 1947, when it was introduced by George Dantzig. It is the most widely used method to solve linear programming problems...


How to define a good objective function

When modeling an optimization problem, one of the tasks you will face is to come up with an objective function. This will then serve as a measure for the quality of a solution and the ultimate goal of the optimization algorithm. In the classical problems...