Blog Post

NULL is not = to anything

,

One of the fun facts about SQL Server and the relation model is the

whole concept of three valued logic. Now I’m not breaking any new

ground here I am just touching on something that trips people up when

they expect a result and don’t get it due to the magic of NULL’s. To be

honest, I’m no exception to falling into the unknown from time to time.

Codd laid out 12 fundamental rules of what a relational database system should conform to if it is to be considered truly relational.

Rule 3: Systematic treatment of null values:

The

DBMS must allow each field to remain null (or empty). Specifically, it

must support a representation of "missing information and inapplicable

information" that is systematic,

distinct from all regular values (for example, "distinct from zero or

any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

This rule, above all others has probably caused me the most heartburn over the years.

I’ve

spent more time that I like to admit reading papers and books from

pioneers of the relational model, E.F. Codd, C.J. Date. One of the

great debates that carried on until the passing of Codd was over NULL

and three valued logic in general. To give you an idea of how big and

problematic NULL’s are, It’s been put forth that maybe we need to get

rid of them all together or move deeper into the rabbit hole and make

it 4 valued logic breaking NULL into different kinds of place holders

for unknown.

Understand that every piece of data falls into a

domain of some sort. 0 is a integer type. ‘’ is a empty string. NULL

isn’t in any domain, or value type. It doesn’t represent something at

all it is a place older period.

I’ve heard 3 valued logic described as yes/no/maybe but that isn’t accurate, it is true/false/UNKNOWN.

So

the only logical thing that can happen to UNKNOWN is unknown. What’s

even worse is UNKNOWN technically can’t be equal to UNKNOWN or NULL =

NULL.

How do you know they are equal if they are both unknown?

 

For example:

select 1 where NULL = NULL

returns nothing since NULL can’t be equal to anything including NULL we don’t get a NULL back or the 1 back we tried to select.

select 6227 * 453 / 238 + NULL 

returns NULL

which makes since on the surface to almost everyone I work with.

select NULL / 0

returns NULL

To some folks this is confusing in a traditional programming since

anything divided by zero gives us an error of cannot divide by zero.

Since NULL is the place holder for UNKNOWN there is no way to evaluate the statement other than UNKNOWN or NULL!

This must also carry through for string manipulation as well.

For example:

select 'here' + 'is ' + NULL 

returns NULL.

Again it is the old how can you concatenate something to the unknown problem.

Now with all this in our little busy heads we finally think we

understand the problem in it’s fullness, but we don’t (or I always

don’t).

Where things can get really sticky is in any kind of aggregate

situation SUM(), AVG(). Generally, all aggregations have a NULL

elimination step built into them.

So lets say we have a table that looks like this:

Col001Col002
100100
200200
300300
NULL0

(this isn’t an article on table design so don’t sweat the lack of a key or the duplicate data in both columns)

create table myNumbers
(
Col001 int,
Col002 int
)
go
insert into myNumbers (Col001,Col002) VALUES (100,100)
insert into myNumbers (Col001,Col002) VALUES (200,200)
insert into myNumbers (Col001,Col002) VALUES (300,300)
insert into myNumbers (Col001,Col002) VALUES (NULL,300)
select avg(Col001) from myNumbers

select avg(Col002) from myNumbers

We get:

-----------

200

Warning: Null value is eliminated by an aggregate or other SET operation.

(1 row(s) affected)

 

-----------

225

(1 row(s) affected)

What happens when the evaluation of the aggregation occurs there is

no way to evaluate the NULL so that row is dropped and all the sudden

you numbers look horribly wrong.

If I hadn’t put this to output to text we may have merrily trucked

along and accepted that 200 was what we were expecting and not the 255

we really wanted due to the treatment of the NULL.

Since it is a warning and not an error our program won’t complain

ether it will be more than happy to accept the result since it is valid

one.

The only exception to this in general is using COUNT() since you can

count a place holder just like you would a value type it isn’t

evaluating the data held just that a row exists whether we know what is

is in it or not.

 

I’ve only just scratched the surface of the unknown with this little

post. I haven’t covered grouping or JOIN’s and I may in a later post.

 

My goal is simply to remind myself that dealing with NULL is never a trivial matter.

-Wes

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating