I still think we collectively learned the wrong lesson from the Netflix Prize contest

The collective conclusion from the Netflix Prize competition, launched in 2006, was this: you can’t beat massive model ensembles. Ten years on, I’m still convinced the larger lesson was completely overlooked: when modeling human behavior, there are sometimes better KPIs than RMSE.
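For reference, the Prize scored entries by root-mean-square error over a hidden set of ratings. A quick sketch of that metric (the `rmse` helper here is illustrative, not anyone’s API):

```python
import math

def rmse(pairs):
    """Root-mean-square error over (predicted, actual) rating pairs,
    the quantity Netflix Prize entrants were asked to minimize."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))
```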

In 2011, I began work on my own SaaS recommendation service, Selloscope. As part of that work, I evaluated the performance of a Slope One model (comparable at the time to Netflix’s Cinematch algorithm) against a spiffed-up object-to-object collaborative filter. The latter doesn’t produce an error score, so I devised a different goalpost: if I remove random items from each user’s profile, how well does the model fill in those gaps?
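Concretely, that gap-filling test is a leave-k-out hit rate. Here is a minimal sketch, assuming profiles are sets of item ids and the model is exposed as a `recommend(profile, n)` callable; the function and parameter names are hypothetical, not Selloscope’s actual code:

```python
import random

def holdout_hit_rate(user_items, recommend, k=1, n_recs=10, seed=42):
    """Hide k random items from each user's profile, then count how
    often the model recommends the hidden items back in its top n_recs.

    user_items: dict mapping user id -> set of item ids in the profile
    recommend:  callable(profile, n) -> ranked list of item ids
    """
    rng = random.Random(seed)
    hits, total = 0, 0
    for user, items in user_items.items():
        if len(items) <= k:
            continue  # too little history to hide anything
        held_out = set(rng.sample(sorted(items), k))
        recs = recommend(items - held_out, n_recs)
        hits += len(held_out & set(recs))
        total += k
    return hits / total if total else 0.0
```

Unlike RMSE, a metric like this rewards surfacing the hidden items near the top of a short list, which is much closer to what a recommender is actually asked to do.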
