In 2006, Netflix announced the Netflix Prize, a $1 million award for improving the accuracy of the company’s Cinematch movie-recommendation algorithm. Three years later, a team called BellKor’s Pragmatic Chaos developed an algorithm that was 10 percent more accurate than Cinematch, handed its work over to Netflix, and collected the prize.

But the company never implemented the algorithm, despite the improvement in accuracy. Why?

According to Netflix: “The additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.”

So, BellKor’s Pragmatic Chaos came up with an algorithm that works measurably better than Cinematch, but in the end, Netflix faced two hurdles:

  • Scaling up the algorithm. The winning algorithm works great on a small scale — and by small scale, we’re talking about roughly 100 million ratings to “train” the algorithm and 2.8 million user/movie pairs on which to apply it — but once you scale up to the billions, you face new challenges. One common scalability issue is speed of computation: if the time it takes to run an algorithm grows faster than the data set itself (quadratically, say), you eventually reach a point where the computation becomes too slow for practical purposes (see the sketch after this list).
  • Implementing the algorithm. Even once you address scalability issues, there is additional work required to actually integrate the algorithm into your existing code and then deploy it across however many servers you need.
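To see why computation speed becomes a problem at scale, here is a minimal sketch, in Python, of a naive item-to-item similarity computation — the kind of pairwise work many recommenders rely on. This is not Netflix’s algorithm; the function and data are invented for illustration. Because it compares every pair of items, doubling the catalog roughly quadruples the running time:

```python
import random
import time

def item_similarities(ratings):
    """Compare every pair of items: O(n^2) pairwise work in the number of items."""
    items = list(ratings)
    sims = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            users_a, users_b = ratings[a], ratings[b]
            # Jaccard similarity: shared raters over total raters.
            sims[(a, b)] = len(users_a & users_b) / (len(users_a | users_b) or 1)
    return sims

for n_items in (200, 400, 800):
    # Fabricated data: each item is rated by a random subset of 1,000 users.
    ratings = {item: {random.randrange(1000) for _ in range(50)}
               for item in range(n_items)}
    start = time.perf_counter()
    item_similarities(ratings)
    print(f"{n_items:>4} items: {time.perf_counter() - start:.2f}s")
```

At a few hundred items this finishes in moments; on a catalog with millions of titles and billions of ratings, the same quadratic approach is hopeless without a redesign.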

A change in the company’s business model also doomed the prize-winning algorithm. In 2007, Netflix began streaming movies directly to subscribers’ computers and other Internet-enabled devices, and that delivery method soon began to overshadow its original DVD-by-mail operations.

With the streaming feature, Netflix gets far more data points than it did with the DVD-by-mail system. Instead of basing recommendations on a user choosing one or two DVDs per week, it can draw on a user streaming 15 titles in one week, which means the company gets a lot more subscriber preference data. Netflix now knows, for instance, how much of a movie a subscriber actually watches before deciding it’s not for them — a data point it lacked in the DVD-by-mail system.
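To make that difference concrete, here is a minimal sketch of the two kinds of preference records. The field names are assumptions made purely for illustration; Netflix’s actual data model isn’t public:

```python
from dataclasses import dataclass

@dataclass
class DvdRating:
    # DVD-by-mail era: one explicit star rating per title, given days after viewing.
    user_id: int
    title: str
    stars: int  # 1-5, essentially the only preference signal

@dataclass
class StreamingEvent:
    # Streaming era: implicit behavior, captured as it happens.
    user_id: int
    title: str
    watched_minutes: float  # how far the viewer actually got
    runtime_minutes: float  # total length of the title
    abandoned: bool         # stopped before the end?

event = StreamingEvent(42, "Sobering Documentary",
                       watched_minutes=12.0, runtime_minutes=96.0, abandoned=True)
print(f"Watched {event.watched_minutes / event.runtime_minutes:.0%} before giving up")
```

A mailed-in rating arrives days after viewing and says only how much the subscriber liked a title; a streaming event arrives in real time and says exactly where the viewer gave up.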

A recommendations engine for a streaming service also differs from one for a DVD-by-mail service in the kind of prediction it must make: a subscriber’s immediate preference (for a title to stream right now) versus the subscriber’s estimate of a future preference (for a title that won’t arrive for a couple of days). If you’re asked whether you’re in the mood to watch a sobering documentary right now, you can say unequivocally yes or no — but if you’re asked to predict whether you’ll be in the mood to watch that same title a few days from now, your prediction may well turn out to be wrong. An algorithm that works well on one type of prediction may not work on the other.

The interesting thing here is that, academically speaking, the prize-winning algorithm is much better than the algorithm Netflix still uses. But not deploying it was a sound business decision on Netflix’s part.

We can sum it up in an algorithm implementation algorithm:

v = ua / c

To translate, the practical viability (v) of a recommendations algorithm is equal to the value to the user (u) times the accuracy (a) of the algorithm, divided by the cost (c) of implementation. In the case of the million-dollar algorithm, the variables in the numerator decreased as the business model changed, while the denominator was high to begin with and likely increased over time.
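As a worked example, here’s the formula in a few lines of Python. The scores are made-up numbers on an arbitrary 1-to-10 scale, purely for illustration:

```python
def viability(user_value, accuracy, cost):
    """Practical viability of a recommendations algorithm: v = u * a / c."""
    return user_value * accuracy / cost

# Hypothetical scores, invented purely for illustration.
# Prize winner: very accurate, but user value fell as the business moved to
# streaming, and the cost of productionizing it was high.
print(viability(user_value=3, accuracy=9, cost=8))  # 3.375
# Incumbent: less accurate, but already deployed and cheap to keep running.
print(viability(user_value=3, accuracy=7, cost=2))  # 10.5
```

Even a big edge in accuracy can’t rescue the viability score when the value to the user drops and the cost of implementation climbs.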

Bottom line: We can often improve our algorithms, but that doesn’t mean it’s always a good business decision to implement those improvements.