Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Four Rules for NULLs Expand / Collapse
Author
Message
Posted Monday, June 27, 2005 11:34 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 2:45 AM
Points: 21,397, Visits: 9,610
Do you have an exemple of that?
Post #194444
Posted Monday, June 27, 2005 1:22 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

I haven't had a chance to thoroughly test this myself yet, but Adam Machanic (MS MVP) has posted a blog on the topic.  He says that, according to his tests, ISNULL() outperforms COALESCE() by about 10-12 percent.  I ran his test locally a few times, and found ISNULL() to be about 8% faster when the first column is NULL, but found COALESCE() performed about 4% faster when both columns were non-NULL on my local SQL Server.  Of course all this might not be applicable to your situation; the results can change as fast as you can say "dual processor".

Here's a link to his article:  http://sqljunkies.com/WebLog/amachanic/archive/2004/11/30.aspx

Adam doesn't test the COALESCE(col1, col2, col3, ...) form, which would require a lot of nesting to simulate with ISNULL() [i.e., ISNULL(ISNULL(col1, col2), col3)].  I haven't investigated this myself, but the fact that COALESCE() has to check for more than 2 column names/values might be part of the reason ISNULL(col1, col2) can outperform it in some circumstances.

Post #194482
Posted Monday, June 27, 2005 2:57 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 30, 2014 9:28 PM
Points: 1,140, Visits: 702
Great job on the article!


As for the questions on the blog post, I'd say use COALESCE even if it is sometimes a tiny bit slower... That post was just done because I was exceptionally bored -- really, if you have enough time on your hands that you can be concerned with performance between these two operators, you need to get outside more often (as, apparently, I do! )


--
Adam Machanic
SQL Server MVP
SQLblog.com: THE SQL Server Blog Spot on the Web
Post #194510
Posted Monday, June 27, 2005 3:00 PM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, May 6, 2014 5:51 AM
Points: 6,266, Visits: 2,028

for very large tables I can confirm that for the one-column case ISNULL is a bit faster but I would still prefer coalesce instead of that little bit of speed. 

 




* Noel
Post #194512
Posted Monday, June 27, 2005 3:14 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 8:16 AM
Points: 1,035, Visits: 410

If I might borrow an idea from CJ Date...I'd add a 0th rule that would take precidence over the other 4 listed in the article.  Rule 0 states "Do not use NULLs."

Missing values are not allowed in true sets, any more than duplicates are.  As a data architect I am well aware of situations where we just don't have a piece of data, and in some cases I will include nullable columns, but this is the exception rather than the rule.  And the only reason I do it is because the available DBMS's (and the SQL standard) don't provide an acceptable way of dealing with missing information.

The example of employee middle names is one where I would probably allow nulls because middle name is an incidental piece of data that isn't likely to be used except for fairly insignificant informational purposes.  However, if we change our focus to salary, and there are some employees without a salary, the most likely scenario is that salary information would belong in a seperate table from the other employee information and only those employees who have a salary would have a record there...

This illustrates a major problem with SQL NULLS and the way they are used.  There are essentially two reasons for missing information.  The data is unknown, or the data is not applicable.  SQL NULLs lump both reasons together.  In reality, if an attribute is not applicable to a given record then that record (and all others to which the attribute is not applicable) is actually of a different type than the records to which the attribute does apply and should logically be placed in a different table.

In addition to the problems pointed out in article, NULLs are a logical nightmare in aggregate functions.




/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek



*****************/
Post #194514
Posted Monday, June 27, 2005 3:55 PM
Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Saturday, January 22, 2011 12:01 PM
Points: 702, Visits: 174

COALESCE is fair better than isnull and I make my DBAs use COALESCE.

Example

DECLARE @A TinyInt
SELECT ISNULL(@A,256)

This will lead into Arithmetic overflow error for data type tinyint, value = 256.

But not with

DECLARE @A TinyInt
SELECT COALESCE(@A,256)

but if you want your query to fail then obviously use isnull

 

 




Kindest Regards,

Amit Lohia
Post #194523
Posted Monday, June 27, 2005 6:54 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

Thanks for the suggestion!  You can't go wrong with Date, Celko and the rest of the gurus.

I am with you on eliminating NULLs as much as possible; but as you also point out, sometimes they are the best/only tool SQL gives us to deal with missing information.  I think the key is to use NULLs with caution and keep in mind that they do affect your logic and overall results.

As for your suggestion about storing salary information in a separate table, it makes sense; but when you do a LEFT JOIN on the salary table, you'll still have to deal with NULLs popping up in the result set.  One method I've used to deal with "Not Applicable" and other specific statuses is to use a CHAR(1) column as a status code flag to indicate the specific reason the data is missing.  It requires a little more programming logic on the front end, but it tends to be well worth it if the specific status information is important to your situation.

I'm submitting a second piece to go along with this article that describes how NULLs affect results in ANSI aggregate functions - that seemed to be a big question on another discussion board here at SQL Server Central.

Thanks again!

Post #194542
Posted Monday, June 27, 2005 8:35 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, November 12, 2008 10:24 AM
Points: 31, Visits: 24

Good points as presented. One might however also note that SQL as it exists is not an especially logically "clean" implementation of  >2VL i.e.(arguably, it may be better to avoid nulls or go to a more comprehnsive implementaion of >2VL as for example firstsql attempts).

Specifically, SQL rather "mixes" A Marks and I Marks

I Marks - (Inapplicable - for the tuple in question - the column does not adhere to the table predicate). (Zero length string "user tokens" are suggested for I Marks in the article.) 

A Marks - ("Applicable - value at present unknown")

Such mixing can mean lost data. A good example of this muddling from the article is the example of "John Jones":

In the case of John Jones, however, we have set the Middle_Name column to NULL. This indicates that we do not know whether John has a middle name or not. We may discover, in the future, that John has a middle name; likewise, we may discover he has no middle name. In either case, we will update his row accordingly. So, NULLs represent our "known unknowns"; the only thing we know for a fact is that we do *not* know whether John has a middle name or not."

Consider if it was known by the data entry technician that "John Jones" did have a middle name e.g.(but refused to divulge it) - basically such information is lost with a null - but not with an A Mark.

Also, some problems with the suggestion of implementing Zero-length strings, i.e.(instead of proper I Marks which are not available in SQL), include the issues of what to do about numeric columns; and the issue of distinguishing between when a zero length string "user token" - I Mark - is intended to be represented, and when a zero length string is intended to be represented. HTH




Post #194546
Posted Monday, June 27, 2005 10:00 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

Thank you for the feedback!

ANSI SQL doesn't provide differentiation for 'states' of missing data; only a mark indicating that the data is indeed missing.  In order to differentiate between different states of missing data, I would use another column (possibly a CHAR(1)) indicating the state/reason for the missing data.

The name table is a simplistic example, and the reason why a middle name is missing is probably not compelling enough to go through all the effort to track it (excepting possibly some very specialized businesses).  A more likely scenario might be when tracking answers from a questionnaire from which you will be building statistical analyses.  Some questions might not be applicable to your respondents; others might be disinclined to answer certain questions.  For these types of situations, I find that an additional column indicating the reason for missing data works fairly well.

To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL' type 'mark' for each situation, in the same column that contains your actual data .  At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc.

Thanks again!

Post #194565
Posted Tuesday, June 28, 2005 6:11 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Sunday, April 6, 2008 5:44 PM
Points: 17, Visits: 3

"To my mind there are far too many possible reasons that data could be missing to try to implement a separate 'NULL' type 'mark' for each situation, in the same column that contains your actual data .  At that point, I think you're headed back towards COBOL, where you place '999999999' in a row for 'N/A' and '88888888' for 'Refusal to answer', etc."

Perhaps, though I Marks and A Marks were Codd's invention /  suggestion, one point of which was presumably to suggest such different DBMS provided NULL marks or tokens specifically so that users would not have to resort to a proliferation of user contrived marks such as as in the Zero length string "user mark" suggested in the article, or with the '999999999' in a row for 'N/A' and '88888888'for 'Refusal to answer' of COBOL.

By this logic, supporting marks in the SQL standard may arguably be "better". Both the article and COBOL examples mentioned would seem to support implementing additional DBMS supported NULL marks rather than user implemented marks and single kind of NULL (to avoid heading back towards COBOL where many user encoded marks were the norm)?




Post #194658
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse