SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


The Logic, Mathematics, and Utility of NULLs


The Logic, Mathematics, and Utility of NULLs

Author
Message
David Webb-CDS
David Webb-CDS
SSCarpal Tunnel
SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)

Group: General Forum Members
Points: 4012 Visits: 8586
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

I avoid using NULLable columns where practical, but I'm not religious about it.



And then again, I might be wrong ...
David Webb
mtillman-921105
mtillman-921105
SSCommitted
SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)

Group: General Forum Members
Points: 1502 Visits: 3852
David Webb-200187 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

I avoid using NULLable columns where practical, but I'm not religious about it.


David, I like the way you think. We may be going down a dead-end road if we try to precisely define what NULL means and that's part of the point of this post.

But what about this - if we can use a nonnumeric NULL in a numeric column, why can't we also show "N/A" or its equivalent? I see how that would be very useful. But I haven't carefully thought it through. What "Gotcha's" would we create if we could use "N/A" in numeric columns? The same problems we have with NULLs maybe? What does 2 + "N/A" = ?

______________________________________________________________________
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking
PaulB-TheOneAndOnly
PaulB-TheOneAndOnly
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13895 Visits: 4639
Steve Jones - Editor (8/18/2010)
It could be N/A, but that implies a specific meaning. Using NULL there as a placeholder works, but you are assigning a value to NULL. I prefer to keep it as "unknown" as in there will be a ship date, but right now it's unknown.


100% in agreement. That pretty much summarizes my position on the issue.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25737 Visits: 12494
David Webb-200187 (8/18/2010)
I avoid using NULLable columns where practical, but I'm not religious about it.


That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

mtillman-921105 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.


We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.

Tom

Matt Miller (4)
Matt Miller (4)
One Orange Chip
One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28955 Visits: 19002
Tom.Thomson (8/18/2010)
David Webb-200187 (8/18/2010)
I avoid using NULLable columns where practical, but I'm not religious about it.


That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

mtillman-921105 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.


We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.


True except that in practice, actually being able to make that distinction is an awfully big step.What do you do WHILE you actually decipher whether the machine got unplugged, or became self-aware and stopped measuring that data you were looking for. Do you then need 3 NULLS? (the original definition AND the two latter ones?)

Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is "is there data there" or "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....

I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Steve Jones
Steve Jones
SSC Guru
SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)SSC Guru (144K reputation)

Group: Administrators
Points: 144186 Visits: 19424
I see NULL mostly as a "we need to get this data" marker. We don't know why it's not there, don't know what's there, but we need something.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)One Orange Chip (25K reputation)

Group: General Forum Members
Points: 25737 Visits: 12494
Matt Miller (#4) (8/18/2010)
[quote]True except that in practice, actually being able to make that distinction is an awfully big step.What do you do WHILE you actually decipher whether the machine got unplugged, or became self-aware and stopped measuring that data you were looking for. Do you then need 3 NULLS? (the original definition AND the two latter ones?)

Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is "is there data there" or "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....


I agree that the step from 1 to 2 NULLs is a big one but there are possible reasons for wanting it. I see it as assistance for a particular performance engineering bodge (representing several distinct entity types either in a single relation, whether a base relation or a derived relation to save having to write a lot of code and/or run a lot of overcomplex queries - this is a bodge that we sometimes have to live with unless we can afford to write and run ridiculously large amounts of code) plus getting some extra functionality out of NULLs when they crop up in some of the aggregate functions (ideally NULLi contributes nothing to SUM, AVG,MIN, MAX, VAR, STDEV,...; while NULLa causes them to return NULLa) plus a bit of support for applications which care whether something is known to be inapplicable or just not available for some unspecified reason. But all this can be done by adding an applicability bit column for each column that could take NULLi without too much pain in most cases, so I would rarely choose to use two NULLs instead of just one even in a language that allowed me to do so (with SQL I can't use two nulls anyway).
I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.

In the two NULL system you have the option of using the one-NULL system, you simply define your schema so that NULLi is nowhere allowed. One of the "fun" things about Codd's two-null system is realising that the two NULLs do, in a sense, overlap. Nulla means the value is not here and the database doesn't say why; Nulli means the value is not here and the database says that it's not here because the column is inapplicable. So an inapplicable case could be represented by NULLa instead of NULLi - either because whoever entered it didn't know whether it was applicable or not or because the database designer decided to keep things simple and not represent that information about the reason for absence. The designer can choose not to use NULLi either in all cases (because of a preference for the single null system) or on a horses for courses basis (only adds NULLi when it's convenient and useful). So a schema designer can work in the two NULLs system and use only NULLa so that he gets the same result as he would in Codd's one NULL system, or use only NULLi (if he's so lucky that he doesn't find he needs NULLa, which rarely happens to me) and get a different single NULL system, or use the two-nulls system where appropriate. Of course in SQL the designer is stuck in a botched one-null system where NULL sometimes behaves as NULLa and sometimes as NULLi (when MAX or SUM is used on a nonempty set, for example) and is sometimes introduced instead of a correct result ("select sum(T.x) from T where T.x>T.x" should return 0 on any rational system, but returns NULL).

Tom

Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)SSC Guru (211K reputation)

Group: General Forum Members
Points: 211391 Visits: 41977
David Webb-200187 (8/17/2010)
Yep, that pretty well sums it up. I've been reading Hugh Darwen for a couple of days and working through Tutorial D and the Rel implementation. 6th normal form brings its own bag of headaches, and while I tend to avoid NULLs if I can help it, I'm not sure which set of headaches I'd prefer to deal with, if given an either-or choice.

I'm a little surprised that this thread hasn't generated more traffic. It could be that everyone has already staked out their positions and they feel no need to re-defend them.


It's because they hi-jacked a previous thead of mine. Everyone exhausted themselves there.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
jcrawf02
jcrawf02
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4842 Visits: 19324
@Matt - you're assuming pigs in earrings aren't pretty....

@Paul - "In theory, theory and practice are the same. In practice, they are not" ~ Einstein

---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Nadrek
Nadrek
SSCarpal Tunnel
SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)SSCarpal Tunnel (4.5K reputation)

Group: General Forum Members
Points: 4458 Visits: 2741
I'll simply agree with statements from two prior posts, slightly reworded:

1) In practice, with normal humans using the data, NULL must have an explicit operational definition to be meaningful. Essentially, some set of people must agree on what NULL means in a given column, and then document that, and others must use that document.

2) In practice, I try to avoid allowing NULLs as much as possible, because most of the time, they don't actually have an operational definition, and most of the time, the tri-valued logic serves only to generate incorrect results some of the time.

e.x. For a Quantity field, almost everyone assumes 0 means 0, and 1 means 1. Different people assume NULL means something different. Further, when you actually need more than one distinct non-numeric "meaning", then you still need to fall back on defining specific values, such as "-1 means UNKNOWN" and "-2 means N/A" and "-3 means Will Be Known Later" and so on.

The real danger of NULLs is that some people assume that their preferred definition must be the one that is being used, without asking. I would suggest that obviously invalid values raise questions more of the time.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search