archive for February, 2008

Visualizing Netflix Data

I’ve been playing around with the Netflix Prize data recently. Instead of attacking the prize directly, I’m trying to find better ways to visualize the data. Here’s what I have so far:

[Image: NetflixPrize Avg. vs. Avg.]

I think it’s pretty, and somewhat useful. Next I’ll try to make a predictor that classifies based on the four regions and see how well it does. My guess is that it’ll do only slightly better than always picking one average or the other as the prediction, since I’m pretty sure most of the data is clustered near the dark area in the middle, where the model is least effective.
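For a sense of what "always picking one average or the other" looks like, here is a minimal sketch of those two baselines scored by RMSE, the metric the Netflix Prize uses. The toy ratings and the helper names (`averages`, `rmse`) are made up for illustration and are not taken from the real data set or from any code behind the plot. It also scores on the same ratings it averages over, which flatters both baselines.

```python
from collections import defaultdict
from math import sqrt

# Toy (user, movie, rating) triples; invented for illustration, not Netflix data.
RATINGS = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m2", 2),
    ("u3", "m1", 3), ("u3", "m2", 1),
]

def averages(ratings, key_index):
    """Mean rating grouped by user (key_index=0) or by movie (key_index=1)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in ratings:
        key = row[key_index]
        sums[key] += row[2]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def rmse(ratings, predict):
    """Root mean squared error of predict(user, movie) over the given ratings."""
    squared_errors = [(predict(u, m) - r) ** 2 for u, m, r in ratings]
    return sqrt(sum(squared_errors) / len(squared_errors))

user_avg = averages(RATINGS, 0)
movie_avg = averages(RATINGS, 1)

# Baseline 1: always predict the user's average rating.
rmse_user = rmse(RATINGS, lambda u, m: user_avg[u])
# Baseline 2: always predict the movie's average rating.
rmse_movie = rmse(RATINGS, lambda u, m: movie_avg[m])

print(f"user-average RMSE:  {rmse_user:.4f}")
print(f"movie-average RMSE: {rmse_movie:.4f}")
```

A region-based predictor would replace the two lambdas with a function that picks between `user_avg[u]` and `movie_avg[m]` depending on where the pair falls in the plot. How the four regions are actually delimited is not specified here, so that part is left out.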

Update: I’ve added some analysis in the Flickr comments, so be sure to check that out if you’re interested.