The Logic, Mathematics, and Utility of NULLs

  • Steve Jones - Editor (8/25/2010)


    I do agree that a qualifier of some sort, different for each domain, may be what is needed. In practicality, the issue is how do you document this and ensure the knowledge is spread. The lack of an easy way to document a column (extended properties haven't worked well for me, a bit cumbersome), is the main issue.

    Extended properties didn't work for me either. I sometimes have a heap of comments in table creation scripts, that works a bit better than extended properties, but is still far from perfect. Documentation seems to me to be one of the most difficult things in just about any branch of IT development.

    Tom

  • The problem is ensuring the data dictionary gets propagated to other people. Comments in table creation scripts are what I've done, since it's easier to maintain there, in a VCS, and then building a habit to have people check there if they are not sure of what a table does.

  • Steve Jones - Editor (8/25/2010)


    So, is NetFlix.VacationHoursTaken set to 0? 99999? or perhaps, Null 🙂

    I guess it has to be NULL - and then, if you have all the ANSI null-handling switched on, if someone takes another couple of days and some outdated legacy code adds 16 to it it comes out as still NULL. And if you ask "has X taken more than 25 days this year" you end up testing whether 200 > NULL, and the answer is UNKNOWN.

    And isn't this a beautiful example of how NULL is superior to a default value like 0 in the places where NULL is appropriate?

    Tom

  • Or maybe even "N/A"? :laugh:

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • Tom.Thomson (8/25/2010)


    mtillman-921105 (8/25/2010)


    Tom, I haven't researched this as much as I would like and will say more later, but even Charles Pierce, a highly respected American Pragmatist, was experimenting with three valued logic long before Codd, around the year 1910.

    Yes, Peirce worked with a 3-valued logic - truth values T (true), F (false), and V (unknown) - there's no NULL in there, because unknown as a truth value is not NULL. Peirce may have been the inspiration for Codd's use of 3VL in Codds original NULL proposal, but certainly not for two distinct NULL truth values which happened to match two distinct NULLs, as in Codd's later proposal.

    Of course Peirce wasn't first to use a multi-valued logic (Aristotle's logic, in at least one of his formulations, had three truth values: "necessarily true", "necessarily false", and "contingent"). Neither of course was Codd, and I didn't suggest he was - what I said was that he was the first to use NULLs as some of his truth values, which is a very different thing. Peirce may have been the first to use 3 valued logic in modern times (for logic, "modern times" is from 1847 on - that was the year in which De Morgan and Boole first published treatises on logic; and of course De Morgan also invented/discovered relational algebra, so should be better known to RDBMS people than he is); but Lukasiewicz published a paper on Aristotle's logic in 1910, and that may well have discussed a 3-valued logic in which case it's a toss-up whether he or Peirce was first. (Lukasiewicz was of course one of the most famous 20th century contributors to the theories multi-valued logics, and the whole subject of MV-Algebras was developed to provide a semantics for his logics, and we might not have acquired a formal theory of fuzzy logics without his work; but that was all a bit later.)

    Tom, let me try to state this in a different way. It may be true, strictly speaking, to say that "unknown as a truth value is not NULL." However, if we define NULL simply as unknown, then since Peirce and others were working with unknowns (no matter what they called them) in logic, then they were, in effect, working with NULLs, were they not?

    Also, if you follow my logic there, I would like to know what you think of algebra's use of x, y, etc. - are they not unknown values too until they are solved? Or am I taking this too far?

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • mtillman-921105 (8/25/2010)


    Tom, let me try to state this in a different way. It may be true, strictly speaking, to say that "unknown as a truth value is not NULL." However, if we define NULL simply as unknown, then since Peirce and others were working with unknowns (no matter what they called them) in logic, then they were, in effect, working with NULLs, were they not?

    Peirce called it "unknown", as have most people working with a 3-valued logic.

    The trouble with defining NULL as unknown is that there are cases where it doesn't work: that something is not entered in the database doesn't mean that it is unknown, it simply means that it isn't in the database (some people like "not yet entered, but will be entered tomorrow" as an example of this, others like "error when moving from staging table to real table, presumably typing error: we've sent a warning email but currently we don't have a value", and my personal preference is "we can't enter this value because it's too secret" (because it's very funny; if you've never been caught up in it, you won't know what sort of absolutely laughable situations clearance and its lack can lead to).

    Instead of coming at it from the database angle, we could come at it instead from the logic angle; when we go there what we see is that the truth value "unknown" is absolutely known: it's not "unknown" in the sense that we don't know what the truth value is, on the contrary we we know exactly what it is and it is "unknown". If we work in the simplest semantic model for "fuzzy logic" (with infinitely many truth values), truth values greater than 0.0 and less than 1.0 are sometimes called the unknown value set because only 0 is false and only 1 is true; it is senseless to say that a truth value which we know to be 0.5 is not known - after all, we know what it is. In this case, just as in three valued logics, "NULL" is not a synonym for the value "unknown" or for a value in the set of unknown values.

    I suppose someone could choose a different terminology for a logic, but as far as I'm aware no mathematicians (until Codd) or logicians ever used the term NULL for any of their truth values.

    Also, if you follow my logic there, I would like to know what you think of algebra's use of x, y, etc. - are they not unknown values too until they are solved? Or am I taking this too far?

    I need to think more about that one. They aren't much like a raw SQL NULL, but there have been suggestions that NULLs indicating "inapplicable" can change to real values when something in the real world changes (Codd gave at least one example, IIRC) and that as long as this is controlled by appropriate constraints (where in the relational system or in the app external world) that's OK - for example a "wife's maiden name" attribute ceases to be inapplicable when a man gets married. I can see that that might extend in an easy sort of a way to a NULL indicating "not yet resolved" or "not yet computed" or just "value absent" (in a single NULL system), so that variables in some sort of equation system that needs to be solved could be considered in such a way that their value is marked NULL until that solving takes place (and it doesn't matter at all whether that solving involves algebraic manipulations like Gaussian elimination, numerical methods like Runge-Kutta, horn clause logic unification, or sending someone out there with a tape measure - or indeed something completely different from all of those). But I have never thought about variables that way before, and am finding difficult to get my head round it well enough to decide whether it's a good analogy or not. That probably means you've done me a favour by suggesting it - (a) it's making me think, and that holds off the onset of senility! and (b) it may lead me into new and interesting territory. Thanks for that.

    Tom

  • Good God Almighty Tom, if you're battling the onset of senility, I might as well throw in the towel and just start drooling on myself...

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Tom.Thomson (8/25/2010)


    Steve Jones - Editor (8/25/2010)


    I do agree that a qualifier of some sort, different for each domain, may be what is needed. In practicality, the issue is how do you document this and ensure the knowledge is spread. The lack of an easy way to document a column (extended properties haven't worked well for me, a bit cumbersome), is the main issue.

    Extended properties didn't work for me either. I sometimes have a heap of comments in table creation scripts, that works a bit better than extended properties, but is still far from perfect. Documentation seems to me to be one of the most difficult things in just about any branch of IT development.

    I guess I'm the wild card here. Documentation in Extended Properties at both the table level and the column level have done wonders for me for the last 3 jobs. There's nothing like telling a newbie to look at the comments on a column before asking what the column does or is used for. Generation of a data dictionary is a breeze.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/25/2010)


    Tom.Thomson (8/25/2010)


    Steve Jones - Editor (8/25/2010)


    I do agree that a qualifier of some sort, different for each domain, may be what is needed. In practicality, the issue is how do you document this and ensure the knowledge is spread. The lack of an easy way to document a column (extended properties haven't worked well for me, a bit cumbersome), is the main issue.

    Extended properties didn't work for me either. I sometimes have a heap of comments in table creation scripts, that works a bit better than extended properties, but is still far from perfect. Documentation seems to me to be one of the most difficult things in just about any branch of IT development.

    I guess I'm the wild card here. Documentation in Extended Properties at both the table level and the column level have done wonders for me for the last 3 jobs. There's nothing like telling a newbie to look at the comments on a column before asking what the column does or is used for. Generation of a data dictionary is a breeze.

    ...and now I'm off to learn more, keep getting questions like Jeff mentioned for our one db we can write to. Thanks all!

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • OK, Just tried to get extended properties from SSMS and they are much better integrated. In 2005 they didn't work well at all and it was cumbersome to find meaning.

    Perhaps this is something I'll try as a data dictionary moving forward.

  • I found this argument against using NULLs from David Portas, who argues forcefully:

    "...null isn't a value. But in what sense does null "represent" the lack of a known value if the behaviour of nulls in expressions does not match the way unknown values behave? Why even bother to pretend that such a thing is being represented when it clearly is not?

    I think it's more accurate to say that Null is marker which causes a certain non-intuitive set of results in queries and expressions. That set of results does NOT accurately model unknown values, missing values or anything else based in reality or common sense. ..."

    [from http://www.sqlservercentral.com/Forums/Topic856197-373-2.aspx%5D

    I think that a distinction would be helpful here between how NULLs "behave" and just NULLs themselves. As Portas points out, he doesn't like the way they're implemented. So it isn't really the idea of allowing NULLs (or unknown values) into a column that causes the issues, but how they cause confusion when they're used. That is, NULLs, in and of themselves, aren't problematic. Only the confused application of them causes the issues with them.

    Is this fair to say? Do you agree?

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • mtillman-921105 (9/3/2010)


    I found this argument against using NULLs from David Portas, who argues forcefully:

    "...null isn't a value. But in what sense does null "represent" the lack of a known value if the behaviour of nulls in expressions does not match the way unknown values behave? Why even bother to pretend that such a thing is being represented when it clearly is not?

    I think it's more accurate to say that Null is marker which causes a certain non-intuitive set of results in queries and expressions. That set of results does NOT accurately model unknown values, missing values or anything else based in reality or common sense. ..."

    [from http://www.sqlservercentral.com/Forums/Topic856197-373-2.aspx%5D

    I think that a distinction would be helpful here between how NULLs "behave" and just NULLs themselves. As Portas points out, he doesn't like the way they're implemented. So it isn't really the idea of allowing NULLs (or unknown values) into a column that causes the issues, but how they cause confusion when they're used. That is, NULLs, in and of themselves, aren't problematic. Only the confused application of them causes the issues with them.

    Is this fair to say? Do you agree?

    First I'd say that I view David's remarks that you quoted as just what might be expected expected from one of Chris Date's followers. The problem with these people is that they are scared of logics that admit more than two truth values, so they don't learn how those logic works, so they view the results of caculi which incorporate such logics as illogical. And working with NULLs with a two-valued logic is extermely complex, so they shun that too. To do this, they have to shun NULL.

    Second, I think Matt Miller has a reasonable approach, described in the last paragraph of his post http://www.sqlservercentral.com/Forums/FindPost856864.aspx in the same topic.

    Third, NULLs are indeed problematic in SQL, because they sometimes appear when they shouldn't and sometimes don't appear where they should. But they aren't very problematic. But that is an implementation problem, not a problem with the concept of NULL as an indicator of the absence of a value where a value would otherwise be expected.

    Finally, Yes, I think its is fair to say, as you suggest, that it's the confused use of NULLs that causes trouble, not the NULLs themselves.

    There are two ways that NULLs cause problems real life databases (as opposed to in theory).

    The first way is that people struggle to avoid NULLs even in cases where they are in practise unavoidable. I can go over to a system in which no table has more than one column not contained in the primary key, and have no NULLs in my tables (I can just leave rows out) (this is an extreme case of null-free 6NF). This gets me nowhere becase I have to deal with this crazy schema. If I want the thing to behave like a database then (a) I now have to write inordinately large joins and (b) those joins have to be outer joins if they are to return any useful data, so they deliver lots of nice NULLs in their result sets. Or I can make the thing behave like an object-attribute store and allow only queries on single tables, so there's no relational database there at all, all the relational work is done in the application.

    The second problem is that people insist on assigning meanings to NULL other than its natural meaning (a marker for a missing value), and end up assigning different meanings for NULLs in different tables, or even for different columns in the same table, or (horror of horrors) different meanings wheich depend on which other things are also NULL (and it can get still worse than that). This is serious misuse of NULLs, and can lead to lots of problems. In SQL we only have one sort of NULL, so it has to cover all the cases where a value is absent - it's unknown, it's inapplicable in this row, it's known but hasn't been entered yet, it's known but its value in this row is top secret so you can't put it in this database, input validation rejected the information provided and it's marked null untill valid information is forthcoming, and so on and so on. To get good behaviour from your database you have to define NULL handling in such a way that it is acceptable for NULL whatever reason it is there for - so you either have to work out what all the possible reasons for NULL are in your database (including all the reasons that will crop up during its deployed life, not just the ones that are obvious to you today) or work out a generic strategy for null handling that will always be acceptable (the latter approach is generally safer - but you can sometimes define data input rules in such a way that NULL can only happen in very special circumstances, and have a matching special treatment for NULL, and if you can do that it's almost as good as being able to eleiminate all possibility of NULLs cropping up, which is of course the best way to go in the rare instances where it is possible).

    Tom

  • ...

    The first way is that people struggle to avoid NULLs even in cases where they are in practise unavoidable. I can go over to a system in which no table has more than one column not contained in the primary key, and have no NULLs in my tables (I can just leave rows out) (this is an extreme case of null-free 6NF). This gets me nowhere becase I have to deal with this crazy schema. If I want the thing to behave like a database then (a) I now have to write inordinately large joins and (b) those joins have to be outer joins if they are to return any useful data, so they deliver lots of nice NULLs in their result sets. Or I can make the thing behave like an object-attribute store and allow only queries on single tables, so there's no relational database there at all, all the relational work is done in the application.

    The second problem is that people insist on assigning meanings to NULL other than its natural meaning (a marker for a missing value), and end up assigning different meanings for NULLs in different tables, or even for different columns in the same table, or (horror of horrors) different meanings wheich depend on which other things are also NULL (and it can get still worse than that). This is serious misuse of NULLs, and can lead to lots of problems. In SQL we only have one sort of NULL, so it has to cover all the cases where a value is absent - it's unknown, it's inapplicable in this row, it's known but hasn't been entered yet, it's known but its value in this row is top secret so you can't put it in this database, input validation rejected the information provided and it's marked null untill valid information is forthcoming, and so on and so on. To get good behaviour from your database you have to define NULL handling in such a way that it is acceptable for NULL whatever reason it is there for - so you either have to work out what all the possible reasons for NULL are in your database (including all the reasons that will crop up during its deployed life, not just the ones that are obvious to you today) or work out a generic strategy for null handling that will always be acceptable (the latter approach is generally safer - but you can sometimes define data input rules in such a way that NULL can only happen in very special circumstances, and have a matching special treatment for NULL, and if you can do that it's almost as good as being able to eleiminate all possibility of NULLs cropping up, which is of course the best way to go in the rare instances where it is possible).

    Tom Thomson

    Na tog mi gun tuit mi ach ma thuiteas tog!

    Thig crìoch air an t-saoghal ach mairidh gaol is ceòl

    Yes, I agree Tom, there are hazards using NULLs and hazards when not using them too. Knowing when and where to use them is a key. I've also seen queries designed to avoid NULLs, and they're not for the faint of heart, are they? :w00t:

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

  • One important aspect of NULLs that has been overlooked until now on this thread is that in SQL Server, NULLs are used in almost every kind of data type (or all data types?).

    In the first post, it says that NULLs are, in effect, in logic and in algebra. But notice that we should be more careful about that since logic can only use BOOLEAN types. Algebra and mathematics only work with numeric types. I think that it's important to keep that in mind since the potential meanings of "NULL" are limited in each case.

    Actually, I've taken a math class that included logic, but most people do not subsume logic under mathematics or vice-versa. Actually however, Frege (the "father of modern logic") tried to make math a special case of logic, but didn't succeed by most accounts. And that's a long, complicated story and and a sideshow here anyway. I'll try not to bog us down into too much superfluous philosophy here.

    Anyway, it can be important to keep a NULL's column type in mind rather than talking about NULLs in general. In a BOOLEAN column, the NULL can only be replaced by a true or false. But in a numeric column, the NULLs behavior could be different - it can only be replaced with a number. In a character field, well anything can arguably go there except a .T. or .F.

    So viewing NULLs in light of each data type is imperative since we would expect different behaviors from different data types. This complicates matters on a different level (which I'm trying to avoid), but we have to categorize the NULLs this way to prevent confusion.

    The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking

Viewing 15 posts - 46 through 60 (of 67 total)

You must be logged in to reply to this topic. Login to reply