• Dave62 (8/22/2013)


    I can see why this creates confusion with concatenation. People in the "unknown" camp will have difficulty concatenating an unknown value with a known value. In contrast, it should be very easy to concatenate "no data present" with any known value. The result is simply the known value.

    I am sorry, but I (again) have to disagree.

    Let's use an analogy. You know my first name: Hugo. You also know my last name: Kornelis. Now imagine the next wanna-be Don Corleone breaking into your house, pointing the biggest gun you've ever seen to your forehead, and daring you to utter my full name - get it wrong, he pulls the trigger. What would you do? (Apart from breaking down and crying, and probably wetting youyself, that is) I imagine you'd say that you can do that, if he is willing to first disclose my middle name (if any). His response is silence. After a while, you gather all your courage and asks again - and now Mr Wannabe Corleone tells you that he heard the question, but deliberately refused to answer. (See what I did there? His not answering your question about my middle name represents NULL - you asked for a data value (my middle name), you got nothing (the absence of a data value - NULL).

    The big guy is still standing there, gun still pointed at your head. You're trying to think of a plan. Than, just as the guy clears his throat and repeats the question about my full name, the proverbial lightbulb above your head switches on. If he can do it, then so can you, right? So you, politely (of course - he's got a gun) inform the man that you heard the question - but deliberately refuse to answer. He pockets his guns, pats you on the shoulder (you cringe - even with the gun gone, he's still at least twice as big and thrice as strong as you), smiles and compliments you: "smart move - I never demanded that you HAD to answer". And then he leaves your room.

    Congratulations! You not only escaped a life-threatening situation that is so absurd that even the Monty Pythons would have ditched the idea, you also did what any proper DBMS should do when asked to concatenate a string to a missing value. Analog to the "garbage in garbage out" principle, you used the "nothing in, nothing out" principle. Also known as NULL propagation. And the reason that this works is that in a database, NULL is a valid "value" (I still prefer "marker") for every data type. So whatever the datatype of the return expression is, the database always has the option of not supplying an answer, if an input that it needs to ensure the answer is valid is missing. The database should not risk returning a value that could be incorrect - and since any assumption about missing values can be incorrect, the database should not return a value when the expression involves a NULL input.

    So where does all the confusion with "unknown" come from? (To my surprise, I see that even Tom, a man I normally agree with on most points, is confused by this. Understandable!)

    Let's stretch the already weird analogy even further. Next day, Mr Corleone returns. Again with that impressive looking and probably extremely dangerous gun. But with a different question. "Quick", he bellows, "you need to tell me if Hugo has 'John' as his middle name. You know the drill, answer wrong and I'll pull this trigger. And don't try that trick you pulled yesterday, this time I do want an answer - no answer, I also pull the trigger". Ouch! This gets scary. While spoiling yet another perfect pair of trousers, you quickly go through your options. You can take a gamble. Statistically, the chance of me, or anyone, having 'John' as middle name is below 50% - so the answer "no" gives you a better chance of surviving than the answer "yes". But it's your life at stake. I think 10% chance of losing is too much when gambling a fistful of dollars, for my life, I need more, much more margin than that!

    You keep pondering your options until Mr Big gets impatient and reminds you that no answer means death as well. Then, you break down.You cry like a baby, and between your tears you whimper "please, please, please don't shoot me. please spare me. You're asking the wrong man. I don't know. That Dutch idiot never told me his middle name, you didn't tell me, how can I know if it's John? I don't know, okay? Please don't shoot me!".

    He grimaces, pockets his gun, and pats your shoulder (you notice that you start to get used to this part). "You did it again. You gave me an answer. I never told you that only 'yes' or 'no' are valid answers. 'I don't know' is just as valid - and absolutely right. Congratulations, you live. Now go take a shower and find some clean clothes, you stink".

    This analogy represents a database evaulating a query with a WHERE clause. The clause here being WHERE MiddleName = 'John'. When coming across my data, the database does not have the data it needs to evaluate the question. But unlike before (when the problem was in the SELECT clause), the database now does not have the option to simply not answer at all. The query processor needs to decide whether or not to include this row, and it can't simply sit there and wait for someone to come tell it my middle name. The only options appear to be yes (true) or no (false). An incorrect answer is not an option - lifes could be at stake (unlikely with a query on middle name, much more likely with a query on, e.g., allergies - but work with me, okay?). What should the query processor do? And then -bingo!- you came along and supplied (with a bit of help from my maffia friend) the solution. A third option - "I don't know" - aka the truth value "unknown". Yes, there it is, finally, the U-word!

    Instead of the traditional boolean logic (based on the two truth values true and false), we now have defined a logic system that uses a third truth value - and is therefor called three-valued logic (3VL). In 3VL, the three possible truth values are true, unknown, and false. The logic operators AND, OR, and NOT operate as one would intuitively expect (e.g. true AND unknown yields unknown, but true OR unknown yields true - if I ask you "is my first name Hugo AND my middle name John", you wouldn't know, but if I ask you "is my first name Hugo OR my middle name John", you'd say yes).

    There's only one thing left to do. The poor query processor is still facing a problem. Okay, it now did get an answer when trying to evaluate "WHERE MiddleName = 'John'" - but that answer is unknown, so should the row be returned or not? To handle that issue, the designers of the relational model simply made a judgement call. They agreed among each other that rows would only be returned if the predicate evaluates to true; both [false and unknown would result in the row being rejected. They could just as well have decided to handle unknown differently (i.e., to include it in the results_ - but they didn't. Deal with it.

    So, bottom line:

    * NULL in a column means "no data value here". It does not mean "Unknown". In a specific context, it could mean "unknown" (just as in a specific context, 42 could mean "just large enough for only the smallest women") - but that is context. NULL itself means only "no data value here".

    * When using NULL in expressions (other than those specifically designed to handle NULLs), the result has to be NULL - you don't want to utter "Hugo Kornelis" as my full name if that could result in a bullet being inserted in your brain in case it's incorrect.

    * When using NULL in predicates (comparisons) (other than those specifically designed to handle NULLs), the result of the predicate is not a traditional boolean value, but the three-valued truth value ]unknown.

    * Please, please, please do not confuse the evaluation of a condition on a NULL value to unknown with the value itself being or representing "unknown". If you do, I'll have no other option but to send that very tall (and very armed) Corleone type your way yet another time (we still have your address). That would be the third time - and as we all know: third time, you're out.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/