Million Dollar Miner

  • Million Dollar Miner

    I'm sure a few of you have heard of this, but in case you haven't, put your data mining skills to work and earn US$1 million if you can build a better prediction engine. That's right, one million United States one dollar bills if you can mine movie rating data better than the next guy (or gal).

    It's an interesting challenge, made even more so by the fact that Netflix will give you 100 million data points on which to "train" or test your system. The dataset alone is a draw, and I'm tempted to run the thing through SSIS just for the knowledge gained by working with it.

    The crux of their issue is figuring out which movies to recommend to people so that they stay interested in, and perhaps upgrade, their Netflix service. The article is basically an interview with one of the guys in charge of recommendations at Netflix, and he gives some interesting examples of problems with comparing recommendations.

    In other words, this ain't no simple T-SQL statement to get the best recommendations.

    I'm guessing that some comp-sci grad student or professor with tons of statistics in their training will come up with an interesting way to mine existing data, clean it, and come up with extrapolations for other customers, but you never know. It might be some simple algorithm that Janet DBA comes up with while sneaking some research in between backup and restore jobs. However, for all of you data mining people out there, before you click your mouse and scoff about having more important things to do, remember what you tell your kids.

    Just try.

    Steve Jones

  • Steve,

    On the subject of your editorial 'The Million Dollar Miner', I realise that Factor Analysis is old hat, but as a way of predicting trends in a mass of loosely correlated Marketing data, it is hard to beat. A long time ago, I programmed a pocket calculator to do an orthogonal Factor Analysis, so it can't be too hard a trick to perform in SQL, particularly as one uses Matrix maths. For more information see the Wikipedia entry on the subject.

    There is no truth at all in the idea that the technique was invented by a great uncle of mine.

    Best wishes,
    Phil Factor
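The "matrix maths" mentioned above can be sketched in a few lines. What follows is a minimal illustration, not Phil's calculator program and not a full orthogonal factor analysis: it extracts only the first unrotated principal factor from a correlation matrix using power iteration. The `ratings` table is invented toy data (six hypothetical viewers, two action films and two romances), and the function names are made up for this example.

```python
import math

# Toy data (an assumption for illustration): each row is one viewer's ratings
# for [action_a, action_b, romance_a, romance_b] on a 1-5 scale.
ratings = [
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [5, 5, 1, 1],
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [1, 1, 5, 5],
]

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n = len(rows)
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    devs = [[r[j] - means[j] for j in range(k)] for r in rows]
    norms = [math.sqrt(sum(d[j] ** 2 for d in devs)) for j in range(k)]
    return [
        [sum(d[i] * d[j] for d in devs) / (norms[i] * norms[j]) for j in range(k)]
        for i in range(k)
    ]

def first_factor(corr, iterations=200):
    """Largest eigenvalue of `corr` and the matching factor loadings.

    Uses plain power iteration. The start vector is the first unit vector,
    so this assumes the first variable has a non-zero loading on the factor.
    """
    k = len(corr)
    v = [1.0] + [0.0] * (k - 1)
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    # Loadings are the unit eigenvector scaled by the square root of the eigenvalue.
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings

corr = correlation_matrix(ratings)
eigenvalue, loadings = first_factor(corr)
print(round(eigenvalue, 3), [round(x, 3) for x in loadings])
```

On this toy table the single factor accounts for about 3.78 of the 4 units of variance, with loadings of roughly +0.97 on the two action films and -0.97 on the two romances, which is the "action versus romance" taste axis you would expect. A real analysis would extract several factors, estimate communalities, and rotate; none of that is attempted here.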

  • This is not data mining.  It is marketing analysis. I used to work for a marketing research company.  You have to pull a sample of audiences of different ages, genders, educational backgrounds, geographic locations, and ethnicities.  Then you ask them what kind of movie they like, which star they like....  Then create a database of all the movies with rating and style (romantic, comedy...) and a database of all movie stars and directors.  Many people watch movies because they like a particular movie star or director.  Then the statistical analyst can figure out which group of people would like what kind of movie.

    What SQL Server can do - store all the data you collect.

     

  • I win.

    That shouldn't be too hard to figure out how to do.

    I just add up all the reviews of the popular "experts" and then pick the opposite of what they like. I am seldom disappointed.

     

  • What about recommendations for those strange artsy, Cannes-art-festival people? They like the movies the experts pick! (apologies Bob if you're one of those artsy folk).

    I'm not sure this isn't mining. Isn't this searching for series of patterns in a large dataset? Is this much different than looking for fraud in a series of credit card charges? I think it's close, but much more complex.

  • You forget I have limited exposure to those vast resources you have in the 48 contiguous states. None of that artsy junk allowed here. I just pick what Siskel and Ebert don't.

    What? One of them is no more? That explains a few things.

    I also have a project before me that sounds more like what you describe.  It is to look through thousands of pieces of disconnected data entry for a certain pattern that could take one or more different forms when all combined, and that must also be evaluated over time against the results produced yesterday.

    Maybe I'll rent a DVD instead.

    Now, which do I choose...
