Following in the footsteps of many wannabe geeks, I have been playing around with machine-learning algorithms for the Netflix Prize. The group that improves Netflix's recommendation algorithm (Cinematch) by 10% wins a cool million. I'm really impressed with some of these teams -- the top competitor has already beaten Cinematch by 5.77%.
My results have been pretty dismal, though not altogether unexpected. That's probably because undergrad prob & stats is the limit of my math expertise, and I am learning linear algebra as I go. The leading teams, by contrast, have multiple Ph.D.s working full-time on this.
I developed a correlation algorithm for item-based filtering, and it's achieving an RMSE (root mean squared error) of around 1.04. This is only slightly better than just predicting the average score for each movie, and it has a long way to go before it catches up with Cinematch (at 0.95; lower is better). I have a few tweaks in mind, but my cycle time is too high (I'm precomputing correlation tables, so every time I tweak I have something like 8 hours of table computation).
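For anyone curious what this kind of approach looks like, here is a minimal sketch of item-based filtering with Pearson correlation, plus the RMSE metric. It is not my actual code: the function names (`rmse`, `pearson`, `predict`), the nested-dict data layout, and the choice to ignore non-positive correlations are all assumptions made for illustration, and it computes correlations on the fly instead of precomputing tables.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two movies over the users who rated both.

    Each argument is a dict mapping user -> rating (assumed layout).
    Returns 0.0 when there is too little overlap or no variance.
    """
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[u] for u in common) / n
    mean_b = sum(ratings_b[u] for u in common) / n
    cov = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    var_a = sum((ratings_a[u] - mean_a) ** 2 for u in common)
    var_b = sum((ratings_b[u] - mean_b) ** 2 for u in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(user, movie, ratings):
    """Predict a user's rating for a movie from that user's other ratings.

    ratings is a dict mapping movie -> {user -> rating} (assumed layout).
    Starts from the movie's average and adds a correlation-weighted
    average of how far the user deviated from the mean on other movies.
    """
    target = ratings[movie]
    movie_mean = sum(target.values()) / len(target)
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == movie or user not in other_ratings:
            continue
        weight = pearson(target, other_ratings)
        if weight <= 0:
            continue  # assumption: only positively correlated movies count
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        weighted_sum += weight * (other_ratings[user] - other_mean)
        weight_total += weight
    if weight_total == 0:
        return movie_mean  # fall back to the per-movie average baseline
    return movie_mean + weighted_sum / weight_total
```

The fallback branch in `predict` is the "just predict the average score for each movie" baseline mentioned above. Calling `pearson` for every pair on every prediction gets expensive fast on a dataset the size of Netflix's, which is presumably why precomputing the correlation table is tempting despite the slow turnaround.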
I think I give up.
Comments (1)
Heh, I wondered if this was something you'd be interested in, but never asked. Should have known you'd try it. Eh, give you a month solid and you'd be at 15% improvement ;)
Posted by John | January 10, 2007 4:18 PM
Posted on January 10, 2007 16:18