SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Four Rules for NULLs


Four Rules for NULLs

Author
Message
Ninja's_RGR'us
Ninja's_RGR'us
One Orange Chip
One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28743 Visits: 9671
Do you have an exemple of that?
Mike C
Mike C
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2563 Visits: 1168

I haven't had a chance to thoroughly test this myself yet, but Adam Machanic (MS MVP) has posted a blog on the topic. He says that, according to his tests, ISNULL() outperforms COALESCE() by about 10-12 percent. I ran his test locally a few times, and found ISNULL() to be about 8% faster when the first column is NULL, but found COALESCE() performed about 4% faster when both columns were non-NULL on my local SQL Server. Of course all this might not be applicable to your situation; the results can change as fast as you can say "dual processor".

Here's a link to his article: http://sqljunkies.com/WebLog/amachanic/archive/2004/11/30.aspx

Adam doesn't test the COALESCE(col1, col2, col3, ...) form, which would require a lot of nesting to simulate with ISNULL() [i.e., ISNULL(ISNULL(col1, col2), col3)]. I haven't investigated this myself, but the fact that COALESCE() has to check for more than 2 column names/values might be part of the reason ISNULL(col1, col2) can outperform it in some circumstances.


Adam Machanic
Adam Machanic
SSCommitted
SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)

Group: General Forum Members
Points: 1547 Visits: 714
Great job on the article!


As for the questions on the blog post, I'd say use COALESCE even if it is sometimes a tiny bit slower... That post was just done because I was exceptionally bored -- really, if you have enough time on your hands that you can be concerned with performance between these two operators, you need to get outside more often (as, apparently, I do! )

--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
noeld
noeld
SSCrazy Eights
SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)SSCrazy Eights (9.5K reputation)

Group: General Forum Members
Points: 9522 Visits: 2048

for very large tables I can confirm that for the one-column case ISNULL is a bit faster but I would still prefer coalesce instead of that little bit of speed.




* Noel
DCPeterson
DCPeterson
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1555 Visits: 432

If I might borrow an idea from CJ Date...I'd add a 0th rule that would take precidence over the other 4 listed in the article. Rule 0 states "Do not use NULLs."

Missing values are not allowed in true sets, any more than duplicates are. As a data architect I am well aware of situations where we just don't have a piece of data, and in some cases I will include nullable columns, but this is the exception rather than the rule. And the only reason I do it is because the available DBMS's (and the SQL standard) don't provide an acceptable way of dealing with missing information.

The example of employee middle names is one where I would probably allow nulls because middle name is an incidental piece of data that isn't likely to be used except for fairly insignificant informational purposes. However, if we change our focus to salary, and there are some employees without a salary, the most likely scenario is that salary information would belong in a seperate table from the other employee information and only those employees who have a salary would have a record there...

This illustrates a major problem with SQL NULLS and the way they are used. There are essentially two reasons for missing information. The data is unknown, or the data is not applicable. SQL NULLs lump both reasons together. In reality, if an attribute is not applicable to a given record then that record (and all others to which the attribute is not applicable) is actually of a different type than the records to which the attribute does apply and should logically be placed in a different table.

In addition to the problems pointed out in article, NULLs are a logical nightmare in aggregate functions.



/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek



*****************/
Amit Lohia
Amit Lohia
SSC Eights!
SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)SSC Eights! (944 reputation)

Group: General Forum Members
Points: 944 Visits: 174

COALESCE is fair better than isnull and I make my DBAs use COALESCE.

Example

DECLARE @A TinyInt
SELECT ISNULL(@A,256)

This will lead into Arithmetic overflow error for data type tinyint, value = 256.

But not with

DECLARE @A TinyInt
SELECT COALESCE(@A,256)

but if you want your query to fail then obviously use isnull




Kindest Regards,

Amit Lohia
Mike C
Mike C
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2563 Visits: 1168

Thanks for the suggestion! You can't go wrong with Date, Celko and the rest of the gurus.

I am with you on eliminating NULLs as much as possible; but as you also point out, sometimes they are the best/only tool SQL gives us to deal with missing information. I think the key is to use NULLs with caution and keep in mind that they do affect your logic and overall results.

As for your suggestion about storing salary information in a separate table, it makes sense; but when you do a LEFT JOIN on the salary table, you'll still have to deal with NULLs popping up in the result set. One method I've used to deal with "Not Applicable" and other specific statuses is to use a CHAR(1) column as a status code flag to indicate the specific reason the data is missing. It requires a little more programming logic on the front end, but it tends to be well worth it if the specific status information is important to your situation.

I'm submitting a second piece to go along with this article that describes how NULLs affect results in ANSI aggregate functions - that seemed to be a big question on another discussion board here at SQL Server Central.

Thanks again!


Sql DBA
Sql DBA
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 24

Good points as presented. One might however also note that SQL as it exists is not an especially logically "clean" implementation of >2VL i.e.(arguably, it may be better to avoid nulls or go to a more comprehnsive implementaion of >2VL as for example firstsql attempts).

Specifically, SQL rather "mixes" A Marks and I Marks

I Marks - (Inapplicable - for the tuple in question - the column does not adhere to the table predicate). (Zero length string "user tokens" are suggested for I Marks in the article.)

A Marks - ("Applicable - value at present unknown")

Such mixing can mean lost data. A good example of this muddling from the article is the example of "John Jones":

In the case of John Jones, however, we have set the Middle_Name column to NULL. This indicates that we do not know whether John has a middle name or not. We may discover, in the future, that John has a middle name; likewise, we may discover he has no middle name. In either case, we will update his row accordingly. So, NULLs represent our "known unknowns"; the only thing we know for a fact is that we do *not* know whether John has a middle name or not."

Consider if it was known by the data entry technician that "John Jones" did have a middle name e.g.(but refused to divulge it) - basically such information is lost with a null - but not with an A Mark.

Also, some problems with the suggestion of implementing Zero-length strings, i.e.(instead of proper I Marks which are not available in SQL), include the issues of what to do about numeric columns; and the issue of distinguishing between when a zero length string "user token" - I Mark - is intended to be represented, and when a zero length string is intended to be represented. HTH





Mike C
Mike C
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2563 Visits: 1168

Thank you for the feedback!

ANSI SQL doesn't provide differentiation for 'states' of missing data; only a mark indicating that the data is indeed missing. In order to differentiate between different states of missing data, I would use another column (possibly a CHAR(1)) indicating the state/reason for the missing data.

The name table is a simplistic example, and the reason why a middle name is missing is probably not compelling enough to go through all the effort to track it (excepting possibly some very specialized businesses). A more likely scenario might be when tracking answers from a questionnaire from which you will be building statistical analyses. Some questions might not be applicable to your respondents; others might be disinclined to answer certain questions. For these types of situations, I find that an additional column indicating the reason for missing data works fairly well.

To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL' type 'mark' for each situation, in the same column that contains your actual data . At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc.

Thanks again!


DrChips
DrChips
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 3

"To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL' type 'mark' for each situation, in the same column that contains your actual data .  At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc."

Perhaps, though I Marks and A Marks were Codd's invention / suggestion, one point of which was presumably to suggest such different DBMS provided NULL marks or tokens specifically so that users would not have to resort to a proliferation of user contrived marks such as as in the Zero length string "user mark" suggested in the article, or with the '999999999' in a row for 'N/A' and '88888888'for 'Refusal to answer' of COBOL.

By this logic, supporting marks in the SQL standard may arguably be "better". Both the article and COBOL examples mentioned would seem to support implementing additional DBMS supported NULL marks rather than user implemented marks and single kind of NULL (to avoid heading back towards COBOL where many user encoded marks were the norm)?





Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search