Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

The Logic, Mathematics, and Utility of NULLs Expand / Collapse
Author
Message
Posted Wednesday, August 18, 2010 9:23 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: 2 days ago @ 5:44 PM
Points: 893, Visits: 6,894
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

I avoid using NULLable columns where practical, but I'm not religious about it.




And then again, I might be wrong ...
David Webb
Post #971220
Posted Wednesday, August 18, 2010 9:32 AM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Tuesday, February 25, 2014 8:59 AM
Points: 754, Visits: 3,813
David Webb-200187 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

I avoid using NULLable columns where practical, but I'm not religious about it.


David, I like the way you think. We may be going down a dead-end road if we try to precisely define what NULL means and that's part of the point of this post.

But what about this - if we can use a nonnumeric NULL in a numeric column, why can't we also show "N/A" or its equivalent? I see how that would be very useful. But I haven't carefully thought it through. What "Gotcha's" would we create if we could use "N/A" in numeric columns? The same problems we have with NULLs maybe? What does 2 + "N/A" = ?


______________________________________________________________________
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking
Post #971226
Posted Wednesday, August 18, 2010 9:44 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,068, Visits: 4,639
Steve Jones - Editor (8/18/2010)
It could be N/A, but that implies a specific meaning. Using NULL there as a placeholder works, but you are assigning a value to NULL. I prefer to keep it as "unknown" as in there will be a ship date, but right now it's unknown.


100% in agreement. That pretty much summarizes my position on the issue.


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
Post #971238
Posted Wednesday, August 18, 2010 5:59 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 9:56 AM
Points: 8,288, Visits: 8,739
David Webb-200187 (8/18/2010)
I avoid using NULLable columns where practical, but I'm not religious about it.


That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

mtillman-921105 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.


We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.


Tom
Post #971539
Posted Wednesday, August 18, 2010 9:16 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 2:12 PM
Points: 7,084, Visits: 14,685
Tom.Thomson (8/18/2010)
David Webb-200187 (8/18/2010)
I avoid using NULLable columns where practical, but I'm not religious about it.


That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

mtillman-921105 (8/18/2010)
Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.


We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.


True except that in practice, actually being able to make that distinction is an awfully big step.What do you do WHILE you actually decipher whether the machine got unplugged, or became self-aware and stopped measuring that data you were looking for. Do you then need 3 NULLS? (the original definition AND the two latter ones?)

Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is "is there data there" or "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....

I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #971576
Posted Wednesday, August 18, 2010 9:32 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Yesterday @ 11:24 AM
Points: 32,781, Visits: 14,942
I see NULL mostly as a "we need to get this data" marker. We don't know why it's not there, don't know what's there, but we need something.






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #971579
Posted Thursday, August 19, 2010 4:07 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 9:56 AM
Points: 8,288, Visits: 8,739
Matt Miller (#4) (8/18/2010)
[quote]True except that in practice, actually being able to make that distinction is an awfully big step.What do you do WHILE you actually decipher whether the machine got unplugged, or became self-aware and stopped measuring that data you were looking for. Do you then need 3 NULLS? (the original definition AND the two latter ones?)

Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is "is there data there" or "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....


I agree that the step from 1 to 2 NULLs is a big one but there are possible reasons for wanting it. I see it as assistance for a particular performance engineering bodge (representing several distinct entity types either in a single relation, whether a base relation or a derived relation to save having to write a lot of code and/or run a lot of overcomplex queries - this is a bodge that we sometimes have to live with unless we can afford to write and run ridiculously large amounts of code) plus getting some extra functionality out of NULLs when they crop up in some of the aggregate functions (ideally NULLi contributes nothing to SUM, AVG,MIN, MAX, VAR, STDEV,...; while NULLa causes them to return NULLa) plus a bit of support for applications which care whether something is known to be inapplicable or just not available for some unspecified reason. But all this can be done by adding an applicability bit column for each column that could take NULLi without too much pain in most cases, so I would rarely choose to use two NULLs instead of just one even in a language that allowed me to do so (with SQL I can't use two nulls anyway).
I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.

In the two NULL system you have the option of using the one-NULL system, you simply define your schema so that NULLi is nowhere allowed. One of the "fun" things about Codd's two-null system is realising that the two NULLs do, in a sense, overlap. Nulla means the value is not here and the database doesn't say why; Nulli means the value is not here and the database says that it's not here because the column is inapplicable. So an inapplicable case could be represented by NULLa instead of NULLi - either because whoever entered it didn't know whether it was applicable or not or because the database designer decided to keep things simple and not represent that information about the reason for absence. The designer can choose not to use NULLi either in all cases (because of a preference for the single null system) or on a horses for courses basis (only adds NULLi when it's convenient and useful). So a schema designer can work in the two NULLs system and use only NULLa so that he gets the same result as he would in Codd's one NULL system, or use only NULLi (if he's so lucky that he doesn't find he needs NULLa, which rarely happens to me) and get a different single NULL system, or use the two-nulls system where appropriate. Of course in SQL the designer is stuck in a botched one-null system where NULL sometimes behaves as NULLa and sometimes as NULLi (when MAX or SUM is used on a nonempty set, for example) and is sometimes introduced instead of a correct result ("select sum(T.x) from T where T.x>T.x" should return 0 on any rational system, but returns NULL).



Tom
Post #971726
Posted Thursday, August 19, 2010 6:26 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:29 PM
Points: 35,977, Visits: 30,266
David Webb-200187 (8/17/2010)
Yep, that pretty well sums it up. I've been reading Hugh Darwen for a couple of days and working through Tutorial D and the Rel implementation. 6th normal form brings its own bag of headaches, and while I tend to avoid NULLs if I can help it, I'm not sure which set of headaches I'd prefer to deal with, if given an either-or choice.

I'm a little surprised that this thread hasn't generated more traffic. It could be that everyone has already staked out their positions and they feel no need to re-defend them.


It's because they hi-jacked a previous thead of mine. Everyone exhausted themselves there.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #971803
Posted Thursday, August 19, 2010 6:48 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 17, 2014 8:12 AM
Points: 2,627, Visits: 19,093
@Matt - you're assuming pigs in earrings aren't pretty....

@Paul - "In theory, theory and practice are the same. In practice, they are not" ~ Einstein


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #971821
Posted Thursday, August 19, 2010 8:41 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 8:46 AM
Points: 845, Visits: 2,331
I'll simply agree with statements from two prior posts, slightly reworded:

1) In practice, with normal humans using the data, NULL must have an explicit operational definition to be meaningful. Essentially, some set of people must agree on what NULL means in a given column, and then document that, and others must use that document.

2) In practice, I try to avoid allowing NULLs as much as possible, because most of the time, they don't actually have an operational definition, and most of the time, the tri-valued logic serves only to generate incorrect results some of the time.

e.x. For a Quantity field, almost everyone assumes 0 means 0, and 1 means 1. Different people assume NULL means something different. Further, when you actually need more than one distinct non-numeric "meaning", then you still need to fall back on defining specific values, such as "-1 means UNKNOWN" and "-2 means N/A" and "-3 means Will Be Known Later" and so on.

The real danger of NULLs is that some people assume that their preferred definition must be the one that is being used, without asking. I would suggest that obviously invalid values raise questions more of the time.
Post #971937
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse