Let’s start by assuming that ANSI_NULLS are ON. If you aren’t sure what ANSI_NULLS are exactly, don’t worry, I’ll be going over that in some detail in a future post. However Microsoft tells us that ANSI_NULLS will always be ON in the near future. So we are not going to worry about that here.
So what does it mean to say that a value is NULL? Basically it means that the value is unknown. As an example, consider a binary variable. Its value can either be 1 or 0; but it can also be unknown (i.e. NULL). Pretty simple right? Well… on the surface maybe, but when you start thinking through the implications, it gets more and more complicated. For example:
DECLARE @Bin Binary DECLARE @Bin2 Binary SET @Bin = 1 SET @Bin2 = NULL
Here are the possible comparisons
- (@Bin = 1) returns True
- (@Bin = 0) returns False
- (@Bin = NULL) returns NULL
- (@Bin2 = 1) returns NULL
- (@Bin2 = 0) returns NULL
- (@Bin2 = NULL) returns NULL
Any value when compared to a NULL is NULL. Why? Well it’s kind of like Schrödinger’s Cat. Until a value is placed in it you don’t know what the value is. In fact, I like to mentally replace NULL with “unknown”:
- (@Bin = unknown) returns unknown
- (@Bin2 = 1) returns unknown
- (@Bin2 = 0) returns unknown
- (@Bin2 = unknown) returns unknown
Obviously, if a value is unknown, the result of any comparison is unknown. In other words NULL = NULL returns NULL.
CREATE TABLE #NullTest (MyVal binary) INSERT INTO #NullTest VALUES (1),(1),(0),(0),(NULL) SELECT COUNT(1) FROM #NullTest WHERE MyVal = 1 -- Returns 2 SELECT COUNT(1) FROM #NullTest WHERE MyVal = 0 -- Returns 2 SELECT COUNT(1) FROM #NullTest WHERE MyVal = NULL -- Returns 0
So we get two rows where MyVal = 1, two rows where MyVal = 0 and no rows where MyVal = NULL for a grand total of four. But wait, we inserted five rows!! So as we go farther down the rabbit hole we start to realize that aggregate queries can be heavily affected by NULLs. The implications go farther and farther down the rabbit hole as you think about it. So how do we work around this? Well there are a couple of options.
ISNULL(MyVal,0) will return a 0 if MyVal is a NULL and return the value of MyVal otherwise. This leads to queries that look like this:
SELECT COUNT(1) FROM #NullTest WHERE ISNULL(MyVal,1) = 1
I don’t recommend this for several reasons but the fact that it isn’t SARGable is probably sufficient.
Next we have the option “IS NULL”. Which is used like this
SELECT COUNT(1) FROM #NullTest WHERE MyVal = 1 OR MyVal IS NULL
This is generally the method I would use although you do have to watch out with your OR operator. I see a lot of logical mistakes when people use OR carelessly. Here is a simple example:
CREATE TABLE #NullTest (MyVal binary, DateVal datetime) INSERT INTO #NullTest VALUES (1, '1/1/1900'), (1, '1/1/1901'), (0, NULL), (0, '1/1/2001'), (NULL, '1/1/2000') -- What you meant SELECT * FROM #NullTest WHERE (MyVal = 1 OR MyVal IS NULL) AND (DateVal > '1/1/1950' OR DateVal IS NULL) -- What you entered SELECT * FROM #NullTest WHERE MyVal = 1 OR MyVal IS NULL AND DateVal > '1/1/1950' OR DateVal IS NULL -- How the computer saw it. SELECT * FROM #NullTest WHERE MyVal = 1 OR (MyVal IS NULL AND DateVal > '1/1/1950') OR DateVal IS NULL
I’ll let you run it yourself if you want to see the different results, but just looking at the code you should be able to tell how easy it is to mess yourself up.
There are other methods of course, and I’m sure more will show up over time. The important thing to remember is that if you are going to allow NULLs in your columns then you need to understand what a NULL is and plan for it accordingly. If you don’t Thomas LaRock is liable to come after you (not as bad as Grant Fritchey coming after you if you aren’t taking your backups but still.)