• Hugo Kornelis (12/15/2010)


    mtillman-921105 (12/15/2010)


    The opposite could also be asked... Why do you have missing data?

    Because incomplete data is a fact of life.

    Companies that only do business when all data their database has a column for is known, will soon not do business anymore. Many fields are mandatory, but not all are.

    If a company collects data about household situation and income of their customers, for statistical analysis and maybe some data mining and targeted advertising, and I leave the fields for number of children and monthly household income blank, would you really enter both as 0? I do have children. And I do have income. I just refuse to share that information with companies that have no need for this information, and no legal basis for asking me about it.

    Fair enough points and I agree.

    I also have a story about how I thought that NULLs would help, but in the end didn't matter. This is oversimplifying, but we had multiple managers responsible for multiple totals. I was resposible for loading the SQL tables and made sure that NULLs were entered if any of the manager's totals were missing. (I was proud of my error-catching design.) Then one day one or two of the managers couldn't provide their information. Upper management's solution? Ignore them and proceed anyway. So the code ended up changing the NULLs to zeros in the end.

    Now I know that this doesn't really prove anything in a way since at times NULLs are helpful. But it does show that at times, NULLs are nothing but an obstacle.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking