• A large amount of garbage data is still garbage. Throwing more data at a flawed analysis does not make it better. With enough data you can find all sorts of correlations that are absolutely meaningless.

    You need to understand the problem first, then understand the weaknesses and reliability issues in all your sources. When you finally do get a result, you need to backtest it against independent data sets to see if it still holds up.
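
    To make that concrete, here is a rough sketch of the effect in Python. Everything in it is made up for illustration: the function names, the sample sizes and the seed are my own assumptions, and all the "data" is pure random noise. It picks whichever of many noise columns correlates best with a noise target, then checks that same column against an independent sample.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_noise_feature(n_features=2000, n_train=30, n_test=30, seed=0):
    """Search many pure-noise features for the one that best 'explains' a
    pure-noise target, then backtest that feature on independent data.

    Returns (r_train, r_test) for the winning feature. All parameter
    values are arbitrary choices for the illustration.
    """
    rng = random.Random(seed)
    target_train = [rng.gauss(0, 1) for _ in range(n_train)]
    target_test = [rng.gauss(0, 1) for _ in range(n_test)]

    best = None
    for _ in range(n_features):
        # Each feature is unrelated noise; the test half is drawn separately
        # and is never looked at while choosing the winner.
        feature_train = [rng.gauss(0, 1) for _ in range(n_train)]
        feature_test = [rng.gauss(0, 1) for _ in range(n_test)]
        r_train = pearson(feature_train, target_train)
        if best is None or abs(r_train) > abs(best[0]):
            best = (r_train, pearson(feature_test, target_test))
    return best


if __name__ == "__main__":
    r_train, r_test = best_noise_feature()
    print(f"best correlation found in the searched data: {r_train:+.2f}")
    print(f"same feature on independent data:            {r_test:+.2f}")
```

    The winning column should look impressive on the data it was picked from and then mostly fall apart on the independent sample, because there was never any real relationship to find. The more columns you search, the better the "best" one looks.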

    Unfortunately, many managers (as well as some IT people) are so excited by the prospect of magical extraction that they fail to take a hard, critical look at their processes.

    ...

    -- FORTRAN manual for Xerox Computers --