Four Rules for NULLs

  • Mike C

    SSC-Insane

    Points: 23224

    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/mcoles/fourrulesfornulls.asp

  • Vishal Gamji-215308

    Mr or Mrs. 500

    Points: 599

    Would it matter, performance wise using

    var1 IS NULL

    instead of

    var1 = NULL

    ?

  • Tatsu

    SSCertifiable

    Points: 7824

    Performance doesn't matter if you're not returning the correct results. You can't sacrifice accuracy for performance or your program will be worthless. In the case you mention, "var1 IS NULL" is the only option that guarantees accuracy every time. I can't say for certain but I would expect that to perform better since it is the default but I doubt there is any siginificant different either way.

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • Mike C

    SSC-Insane

    Points: 23224

    Performance-wise it wouldn't make a difference.

    However, to get a TRUE/FALSE result from a where clause like "WHERE var1 = NULL" or a statement like "IF var1 = NULL", you would need to SET ANSI_NULLS OFF.

    This is not recommended, since you could end up turning ANSI_NULLS OFF in some situations, and ON in other situations; it will make your queries/applications non-portable; and it can cause you to return unexpected/incorrect results.

    I highly recommend leaving SET ANSI_NULLS ON and using ANSI Standard NULLs for these reasons.

  • J D-238096

    SSC Enthusiast

    Points: 158

    Here is the question I asked Michael

     

    > Dear Michael,

    >

    > I just finished reading your article on SQL NULLs and it was a

    > great break

    > down on how NULLS actually function.

    >

    > My question deals with the last rule.

    >

    > Rule #4: The ANSI Standard COALESCE() and CASE syntaxes are

    > preferred over

    > ISNULL() or other proprietary syntax

    >

    > I have always used ISNULL() for replacing my NULL values. Can you

    > providesome insight into why COALESCE() is the preferred? Is it

    > because ISNULL is

    > a Microsoft function only or does it provide some other benefit.

    >

    > Have a good day.

    >

    > Peace.

    >

    > John Dobson

     

    Here is the response so that everyone can see this post.  Peace.

     

    Hi John,

    Excellent question!  If you have a chance, you might want to post it to the discussion board so we can share the answer with others - I'm sure plenty of people have the same or similar questions!

    COALESCE() is preferred because it is ANSI Standard.  That means you can use COALESCE() on SQL Server, Oracle, or any other ANSI SQL-92 Standards compliant RDBMS.  The syntax will always be the same on any platform, and the results will be consistent across platforms.

    ISNULL() is specific to the MS SQL Server platform, which means that it is not necessarily implemented on other RDBMS systems; and even if it is implemented on other RDBMS systems, it might not have the same syntax or return the same result as the Microsoft version.  This even applies to future versions of the same product; i.e., SQL 2005 isn't guaranteed to have the same syntax as SQL 2000 when dealing with the non-ANSI Standard platform-specific functions.  That's not to say that MS will change the syntax or results of the ISNULL() function, or drop it completely from usage; but there is always a small chance that something like this could happen in a future version of SQL Server.

    These are the reasons I recommend using ANSI Standard functionality whenever possible.

    Thanks,

    Michael Coles, MCDBA

  • Yelena Varshal

    SSC-Dedicated

    Points: 34207

    Hello,

    This is a very good article. One has to read it once in a while. I am well familiar with the concept and these 4 rules. But  every now and then, especially in the ad hoc outer join queries worked on with a non-SQL speaking visitor I start automatically typing translating from his English to SQL "...where my column = NULL... oops...IS NULL"

    As for the stored code, it definitely should be as much language-independent as possibe. Listen to Miichael, use ANSI syntax.

    Yelena

    Regards,Yelena Varsha

  • Ninja's_RGR'us

    SSC Guru

    Points: 294069

    That'd be a nice syntaxe : @SomeVar = oppsWTFIsItNull(@SomeVar, 0)

    .

  • cs_troyk

    SSCertifiable

    Points: 5160

    Excellent article, Michael.

    There are a couple of other trade-offs to consider when deciding whether to use COALESCE() or ISNULL(). I'll leave it to you to decide which are the pro's and con's of each

    In the following repro script:

    CREATE TABLE dbo.MyTable(

      i int NOT NULL PRIMARY KEY,

      j int NULL

    )

    GO

    INSERT INTO dbo.MyTable(i, j)

    SELECT 1, 100 UNION

    SELECT 2, 200 UNION

    SELECT 3, NULL

    GO

    SELECT COALESCE(j, 'No Name Given')

    FROM dbo.MyTable

    SELECT ISNULL(j, 'No Name Given')

    FROM dbo.MyTable

    GO

    DROP TABLE dbo.MyTable

     

    We see that COALESCE() has weaker type checking than ISNULL, as it allows the selection of the 'j' column where the value is non-NULL. ISNULL(), on the other hand, rejects the statement from the outset. Score one for ISNULL(), in my opinion.

    But COALESCE() allows us to do this:

    DECLARE @MyVar1 int, @MyVar2 int, @MyVar3 int

    SET @MyVar1 = NULL

    SET @MyVar2 = NULL

    SET @MyVar3 = 3

    SELECT COALESCE(@MyVar1, @MyVar2, @MyVar3)

    So COALESCE() allows "n" number of expressions to be listed. Score one for COALESCE().

     

    This last one will getcha for sure if you're not careful:

    DECLARE @MyVC1 varchar(2), @MyVC2 varchar(4)

    SET @MyVC1 = NULL

    SET @MyVC2 = 'abcd'

    SELECT COALESCE(@MyVC1, @MyVC2)

    SELECT ISNULL(@MyVC1, @MyVC2)

    This seems to be related to the "weaker" type-checking that I pointed out earlier. COALESCE() is a bit looser with the implicit conversions, where ISNULL() will always convert to the datatype of the first argument. In this case, COALESCE() seems to be the better choice (trust me, I spent a half hour helping a developer track down what was going on with an ISNULL() construct once), but it may be only masking some sloppy coding which would come back to bite you at a later time.

     

    HTH,

    TroyK

  • J D-238096

    SSC Enthusiast

    Points: 158

    [Quote]

    We see that COALESCE() has weaker type checking than ISNULL, as it allows the selection of the 'j' column where the value is non-NULL. ISNULL(), on the other hand, rejects the statement from the outset. Score one for ISNULL(), in my opinion.

    But COALESCE() allows us to do this:

    DECLARE @MyVar1 int, @MyVar2 int, @MyVar3 int

    SET @MyVar1 = NULL

    SET @MyVar2 = NULL

    SET @MyVar3 = 3

    SELECT COALESCE(@MyVar1, @MyVar2, @MyVar3)

    So COALESCE() allows "n" number of expressions to be listed. Score one for COALESCE().

    [/End Quote]

    __________________________________________________________-

    The above can be a little misleading.  Even though you can list "n" number of expressions, only the first non-NULL will be displayed.  Just wanted to point that out so noone will try to stack several columns together and thinking this will handle all the NULL values.

    Peace.

  • Mike C

    SSC-Insane

    Points: 23224

    Thanks for the feedback!  Excellent points.  I've also read, but haven't yet had a chance to verify, that ISNULL() performs better in some specific situations in WHERE clauses.  That might be a consideration as well.

    Thanks again.

  • Ninja's_RGR'us

    SSC Guru

    Points: 294069

    Do you have an exemple of that?

  • Mike C

    SSC-Insane

    Points: 23224

    I haven't had a chance to thoroughly test this myself yet, but Adam Machanic (MS MVP) has posted a blog on the topic.  He says that, according to his tests, ISNULL() outperforms COALESCE() by about 10-12 percent.  I ran his test locally a few times, and found ISNULL() to be about 8% faster when the first column is NULL, but found COALESCE() performed about 4% faster when both columns were non-NULL on my local SQL Server.  Of course all this might not be applicable to your situation; the results can change as fast as you can say "dual processor".

    Here's a link to his article:  http://sqljunkies.com/WebLog/amachanic/archive/2004/11/30.aspx

    Adam doesn't test the COALESCE(col1, col2, col3, ...) form, which would require a lot of nesting to simulate with ISNULL() [i.e., ISNULL(ISNULL(col1, col2), col3)].  I haven't investigated this myself, but the fact that COALESCE() has to check for more than 2 column names/values might be part of the reason ISNULL(col1, col2) can outperform it in some circumstances.

  • Adam Machanic

    SSCoach

    Points: 15259

    Great job on the article!

    As for the questions on the blog post, I'd say use COALESCE even if it is sometimes a tiny bit slower... That post was just done because I was exceptionally bored -- really, if you have enough time on your hands that you can be concerned with performance between these two operators, you need to get outside more often (as, apparently, I do! )

    --
    Adam Machanic
    whoisactive

  • noeld

    SSC Guru

    Points: 96590

    for very large tables I can confirm that for the one-column case ISNULL is a bit faster but I would still prefer coalesce instead of that little bit of speed. 

     


    * Noel

  • DCPeterson

    SSCoach

    Points: 15145

    If I might borrow an idea from CJ Date...I'd add a 0th rule that would take precidence over the other 4 listed in the article.  Rule 0 states "Do not use NULLs."

    Missing values are not allowed in true sets, any more than duplicates are.  As a data architect I am well aware of situations where we just don't have a piece of data, and in some cases I will include nullable columns, but this is the exception rather than the rule.  And the only reason I do it is because the available DBMS's (and the SQL standard) don't provide an acceptable way of dealing with missing information.

    The example of employee middle names is one where I would probably allow nulls because middle name is an incidental piece of data that isn't likely to be used except for fairly insignificant informational purposes.  However, if we change our focus to salary, and there are some employees without a salary, the most likely scenario is that salary information would belong in a seperate table from the other employee information and only those employees who have a salary would have a record there...

    This illustrates a major problem with SQL NULLS and the way they are used.  There are essentially two reasons for missing information.  The data is unknown, or the data is not applicable.  SQL NULLs lump both reasons together.  In reality, if an attribute is not applicable to a given record then that record (and all others to which the attribute is not applicable) is actually of a different type than the records to which the attribute does apply and should logically be placed in a different table.

    In addition to the problems pointed out in article, NULLs are a logical nightmare in aggregate functions.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

Viewing 15 posts - 1 through 15 (of 153 total)

You must be logged in to reply to this topic. Login to reply