Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Man of Mystery

Wes Brown is a PASS chapter leader and SQL Server MVP. He writes for SQL Server Central and maintains his blog at http://www.sqlserverio.com. Wes is Currently serving as a Senior Lead Consultant at Catapult Systems. Previous experiences include Product Manager for SQL Litespeed by Quest software and consultant to fortune 500 companies. He specializes in high availability, disaster recovery and very large database performance tuning. He is a frequent speaker at local user groups and SQLSaturdays.

NULL is not = to anything

One of the fun facts about SQL Server and the relation model is the whole concept of three valued logic. Now I’m not breaking any new ground here I am just touching on something that trips people up when they expect a result and don’t get it due to the magic of NULL’s. To be honest, I’m no exception to falling into the unknown from time to time.

Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

This rule, above all others has probably caused me the most heartburn over the years.

I’ve spent more time that I like to admit reading papers and books from pioneers of the relational model, E.F. Codd, C.J. Date. One of the great debates that carried on until the passing of Codd was over NULL and three valued logic in general. To give you an idea of how big and problematic NULL’s are, It’s been put forth that maybe we need to get rid of them all together or move deeper into the rabbit hole and make it 4 valued logic breaking NULL into different kinds of place holders for unknown.

Understand that every piece of data falls into a domain of some sort. 0 is a integer type. ‘’ is a empty string. NULL isn’t in any domain, or value type. It doesn’t represent something at all it is a place older period.

I’ve heard 3 valued logic described as yes/no/maybe but that isn’t accurate, it is true/false/UNKNOWN.

So the only logical thing that can happen to UNKNOWN is unknown. What’s even worse is UNKNOWN technically can’t be equal to UNKNOWN or NULL = NULL.

How do you know they are equal if they are both unknown?

 

For example:

select 1 where NULL = NULL

returns nothing since NULL can’t be equal to anything including NULL we don’t get a NULL back or the 1 back we tried to select.

select 6227 * 453 / 238 + NULL 

returns NULL

which makes since on the surface to almost everyone I work with.

select NULL / 0

returns NULL

To some folks this is confusing in a traditional programming since anything divided by zero gives us an error of cannot divide by zero.

Since NULL is the place holder for UNKNOWN there is no way to evaluate the statement other than UNKNOWN or NULL!

This must also carry through for string manipulation as well.

For example:

select 'here' + 'is ' + NULL 

returns NULL.

Again it is the old how can you concatenate something to the unknown problem.

Now with all this in our little busy heads we finally think we understand the problem in it’s fullness, but we don’t (or I always don’t).

Where things can get really sticky is in any kind of aggregate situation SUM(), AVG(). Generally, all aggregations have a NULL elimination step built into them.

So lets say we have a table that looks like this:

Col001 Col002
100 100
200 200
300 300
NULL 0

(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)

create table myNumbers
(
Col001 int,
Col002 int
)
go
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers

select avg(Col002) from myNumbers

We get:

-----------
200
Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)

 

-----------
225

(1 row(s) affected)


What happens when the evaluation of the aggregation occurs there is no way to evaluate the NULL so that row is dropped and all the sudden you numbers look horribly wrong.

If I hadn’t put this to output to text we may have merrily trucked along and accepted that 200 was what we were expecting and not the 255 we really wanted due to the treatment of the NULL.

Since it is a warning and not an error our program won’t complain ether it will be more than happy to accept the result since it is valid one.

The only exception to this in general is using COUNT() since you can count a place holder just like you would a value type it isn’t evaluating the data held just that a row exists whether we know what is is in it or not.

 

I’ve only just scratched the surface of the unknown with this little post. I haven’t covered grouping or JOIN’s and I may in a later post.

 

My goal is simply to remind myself that dealing with NULL is never a trivial matter.


-Wes

Comments

Posted by Hugo Shebbeare on 10 April 2009

thank you Sir, nice post!

Posted by sene bauman on 13 April 2009

Thanks Wes, you made me smile:) Its always nice to know you are among friends who share your pain.  I can't count the number of times I have fallen into the NULL trap myself. But I am getting better, at least with aggregates.

I would however take exception to a minor point. NULL is not "unknown". In fact, it is precisely known. It simply means that the container contains no value, regardless of the data type.  If it were unknown then the IsNull() function would never work. :)

But the debate on the issue of whether 3 state logic true/false/empty is useful will continue.  Of course, regardless of its purpose, the pain associated with NULL can be relieved if default conditions are set on the server.  For example, SET ANSI_NULLS is OFF/ON.  For aggregates, I also always use sum(isNull(blah,0)).

Posted by Wesley Brown on 14 April 2009

Hey Sene,

Glad you enjoyed it! I used UNKNOWN because that is exactly the definition Codd gives NULL.

He stated it is unknown due to the fact it can represent any value including no value and when evaluating NULL that must be taken into consideration, according to ternary logic.

What it boils down to is anything evaluated = or <> to NULL, including NULL is NULL. This is clarified in the ANSI 92 standard and implemented in SQL Server in the 7.0 time frame if I’m not mistaken.

Before that NULL could be evaluated and return a result set which is a violation of the ternary logic model.

That is why we have the SET ANSI_NULLS option in SQL Server to begin with, so if you were upgrading from say 6.5 or had a ton of older code in place that expected NULL to behave the old way you didn’t have to re-write everything just to upgrade to the latest and greatest version of SQL Server.

To correctly handle NULL evaluation the key reserved phrases are IS NULL and IS NOT NULL this allows you to evaluate NULL to NULL and duplicate = or <> operations.

I almost covered the IsNull() function but kind of ran out of steam.

IsNull() is a T-SQL specific function and simply says if this field contains the NULL placeholder then give it a specific value, it doesn't assume empty. I would point you to COALESCE() which is the ANSI standard function for handling NULL's.

Don’t work around the problem of NULL work with it to yield it’s full potential!

I feel like I'm channeling Joe Celko....

Leave a Comment

Please register or log in to leave a comment.