The Logic, Mathematics, and Utility of NULLs

  • These questions will be an ongoing project where some of the main points will be added here. The questions will be reformulated as we learn more. This is going to be a group project where everyone's input will count.

    What is a NULL? It represents an unknown. But can it also stand for "N/A"? As Jeff Moden implies below, it does not stand for "nothing" because "nothing" is a known quantity.

    Where did NULLs come from anyway? The origin of NULLs is important for a thorough examination of them. Were they merely invented as an ad hoc solution for example?

    Aren't the unknown values in algebra (like x and y), NULLs before they're solved? If we say yes, then "unknown" quantities have been in use in mathematics at least as long as algebra has been around. So the term "NULL" from Codd (in respect to RDBMS) may be new, but the basic concept may not be - at least not in mathematics in general.

    How are they used in Mathematics?

    How are they used in Logic?

    Over 2,000 years ago, Socrates appreciated the idea of NULL in its broadest sense. He proclaimed that he knew nothing. He was a breathing, talking NULL. But he had some terrific questions! (It's one thing to be ignorant and know it and quite another to be ignorant and not know it.)

    The Utility of NULLs

    Should NULLs be avoided?

    Some would argue that they're a mistake from the outset. For instance, one issue that they create is they take away the "relational model" of a RDBMS in that one table can be joined to another and yet there are no rows that correspond (in an OUTER JOIN). What kind of "relation" is that? In that case, a NULL becomes a stand-in for a non-relation? Simply an ad hoc solution? (This position is coming from a previous thread, but I thought it was a persuasive argument.)

    -David Portas-

    According to David Portas, NULLs do not behave like "unknown values" and this is an issue. If NULLs were applied consistently in SQL, then they would be less of a problem.

    But this view is more of an argument about how NULLs are used by the RDBMS rather than NULLs themselves.

    Would creating different Kinds of NULLs be worthwhile? For example, maybe "N/A" should be distinguished from NULL.

    Some would argue that a NULL already means that no data is available, so there is really nothing to add. Why complicate matters further? If the data is not there, it is not there - so what is it good for anyway? From a practical standpoint, there is no need to complicate the options.

    NULLs can have differing meanings, depending upon how they are used and in what context. How can we easily document them to prevent confusion? [Hit tip to Steve Jones on this question.]

    Some of this has been discussed in another thread, we can take some of that and add it here.

    My position is subject to change, but thus far, I will argue that sometimes NULLs are useful, sometimes they're just an obstacle and can cause errors. (Yes, it depends.)

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Various parts of mathematics use a NULL-like concept. The ones which to me seem to be the closest to the NULL of relational databases crop up in Chris Strachey and Dana Scott's denotational semantics for programmes where one can have values like "bottom of D" for each domain D, and of course "bottom" for "bottom of everything". Basically "bottom of D" is a value in D of which we know nothing but that it is in D, while "bottom" is a value of which we know nothing, not even what domain it is a member of. Essentially all values exist in a lattice, and each domain is a sublattice, with the < relation in the lattice meaning "is less well defined". There is also a top value, which is overdefined (ie is contradictory), and potentially "top of D" values which are also overdefined but slightly less so (non-contradictory information on which domain they are in is available). Of course bottom and top occur naturally in lattice theory, even before it is applied to the denotational semantics of programmmes. But if one considers the natural topology of the lattices used in denotational semantics you see that the meaning of a recursive programme is the limit of the meanings of its partial evaluations (eg the meaning of a programme that computes factorials is the limit of the series of meanings of the programmes F(N) that produce factorials for numbers up to N but don't deliver a result for numbers > N, as N increases), which is quite a strong constraint on the lattices.

    Despite the closeness of the NULLs of RDBMS with the bottoms of Scott-Strachey semantics, and the fact that both indicate an absence of knowledge about a value, they are conceptually quite different.

    edit:correct the English

    Tom

  • WOW, over my head on the first post. OK, I'll just grab the popcorn and sit back and absorb.

    Carry on.


    And then again, I might be wrong ...
    David Webb

  • David Webb-200187 (8/17/2010)


    WOW, over my head on the first post. OK, I'll just grab the popcorn and sit back and absorb.

    Carry on.

    Heh... I'll summarize. In fact, it's already on a T-Shirt... "NULLs don't mean 'Nothing'". 😉 At least not in an RDBMS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yep, that pretty well sums it up. I've been reading Hugh Darwen for a couple of days and working through Tutorial D and the Rel implementation. 6th normal form brings its own bag of headaches, and while I tend to avoid NULLs if I can help it, I'm not sure which set of headaches I'd prefer to deal with, if given an either-or choice.

    I'm a little surprised that this thread hasn't generated more traffic. It could be that everyone has already staked out their positions and they feel no need to re-defend them.


    And then again, I might be wrong ...
    David Webb

  • NULL seems like a variable to me, in an equation that can't be currently solved. Or maybe like "i". We know that it has some value, but we cannot pick a value to state, so we use a placeholder.

  • Steve Jones - Editor (8/18/2010)


    NULL seems like a variable to me, in an equation that can't be currently solved. Or maybe like "i". We know that it has some value, but we cannot pick a value to state, so we use a placeholder.

    I like this thought Steve, but I would also add that a NULL can represent an "N/A." For instance, if we have a table listing car models, then we might have a column for MPG. But what if the model is an electric car? What would we put in the column for MPG? Would it be wrong to use a NULL there since there really is no quantity that applies? We can't use "N/A" since it is a numeric field.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • It could be N/A, but that implies a specific meaning. Using NULL there as a placeholder works, but you are assigning a value to NULL. I prefer to keep it as "unknown" as in there will be a ship date, but right now it's unknown.

  • Steve Jones - Editor (8/18/2010)


    It could be N/A, but that implies a specific meaning. Using NULL there as a placeholder works, but you are assigning a value to NULL. I prefer to keep it as "unknown" as in there will be a ship date, but right now it's unknown.

    Good point - if we use NULL for miles per gallon for an electric car, it doesn't really mean "unknown" in that case, but "N/A" since the MPG doesn't even apply in this case. We're already seeing inconsistencies here. So what would all of you say to do in this case? Use a NULL anyway since it "works?" I would probably use the NULL there if it was a numeric field and couldn't change the field's type easily. But aren't we already bending the rules of NULLs? We can redefine a NULL on a whim?

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • If you can be sure that it will always mean N/A, you could use it. Alternatively, you could use a code, something like 99989 to mean electric car.

    It's how you define it, and how you handle it in the application. However, be sure it's documented somehow.

  • Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

    I avoid using NULLable columns where practical, but I'm not religious about it.


    And then again, I might be wrong ...
    David Webb

  • David Webb-200187 (8/18/2010)


    Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

    I avoid using NULLable columns where practical, but I'm not religious about it.

    David, I like the way you think. We may be going down a dead-end road if we try to precisely define what NULL means and that's part of the point of this post.

    But what about this - if we can use a nonnumeric NULL in a numeric column, why can't we also show "N/A" or its equivalent? I see how that would be very useful. But I haven't carefully thought it through. What "Gotcha's" would we create if we could use "N/A" in numeric columns? The same problems we have with NULLs maybe? What does 2 + "N/A" = ?

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Steve Jones - Editor (8/18/2010)


    It could be N/A, but that implies a specific meaning. Using NULL there as a placeholder works, but you are assigning a value to NULL. I prefer to keep it as "unknown" as in there will be a ship date, but right now it's unknown.

    100% in agreement. That pretty much summarizes my position on the issue.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • David Webb-200187 (8/18/2010)


    I avoid using NULLable columns where practical, but I'm not religious about it.

    That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

    mtillman-921105 (8/18/2010)


    Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

    We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.

    Tom

  • Tom.Thomson (8/18/2010)


    David Webb-200187 (8/18/2010)


    I avoid using NULLable columns where practical, but I'm not religious about it.

    That's something I agree with 100% - let's avoid using nullable columns when it is practical to do so, and use them where it is impossible or impractical to avoid them (and yes, I am happy to assert that there are real world situations where it is impossible to avoid nulls - except perhaps by deciding not to bother with that database and the apps that use it after all).

    mtillman-921105 (8/18/2010)


    Redefining NULL on a whim? You can't redefine something that doesn't have a good definition to start with. N/A, unknown, not yet entered, incomplete, etc. can all be valid interpretations of a NULL depending on the context. That's the problem. In order to use it, you have to have a potentially different contextual definition in every table, in some cases multiple definitions per table if you have multiple NULLable columns. Even those can be wrong. Imagine a system where a client's address was NULLable. It might mean that the address had not yet been collected, or it could mean the client was homeless and had no fixed address. The meaning of the NULL address would be ambiguous.

    We have to avoid it being ambiguous. There are several ways of doing this. The simplest way is to give it a single simple meaning - NULL means "the database contains no value for this datum" - and not pretend that it means anything else (like not yet entered, or not applicable, or too secret to be allowed in this DB, or anything else other than that the database doesn't have a value for the datum). That's what Codd advocated when he first proposed having NULL. But ANSI screwed up and it doesn't mean that in SQL. Another way of doing it is to have multiple NULLs; one for each possible reason for the value not being in the database is ridiculous (we can't put a reasonable bound on the number of possible reasons - but please note that does NOT mean that Date's failed attempt at reductio ad absurdum starting from the case of two NULLs is logical or sensible in any sense) but it's quite possible to adopt Codd's later proposal for a two null system where NULLa means "the database has no value for the datum and the reason is not specified" and NULLi means "the database has no value for the datum because the column is inapplicable in this row" - there's no ambiguity and it's perfectly straightforward to deal with the two distinct NULLs.

    True except that in practice, actually being able to make that distinction is an awfully big step.What do you do WHILE you actually decipher whether the machine got unplugged, or became self-aware and stopped measuring that data you were looking for. Do you then need 3 NULLS? (the original definition AND the two latter ones?)

    Perhaps I deal in messy environments, but I find a lot of settings where the only real distinction needed is "is there data there" or "there's no data available". Putting placeholders in data when you just plain don't know and can't know WHY, just to make it "look" prettier, is like putting earrings on a pig....

    I could see having the option for either definition, but if I had to pick just one, I'd go with the original definition for the reason above.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 67 total)

You must be logged in to reply to this topic. Login to reply