An interesting thing about isnull

  • richard.maw

    SSC Veteran

    Points: 226

    Comments posted to this topic are about the item An interesting thing about isnull

  • Fal

    SSCrazy

    Points: 2984

    This QotD raises a point I'm always having to remind developers about, and I guess SQL guys can also do with a reminder from time to time.

    However, my 2 cents is rather than...

    declare @S char(4)

    select ISNULL(convert(varchar(16),@s),'Unknown')

    ...I would use...

    declare @S char(4)

    select coalesce(@s,'Unknown')

    ...which also has the ability to take more than 2 arguments.

    S.

  • SQLRNNR

    SSC Guru

    Points: 281252

    Nice question

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • SQLRNNR

    SSC Guru

    Points: 281252

    Fal (3/30/2010)


    This QotD raises a point I'm always having to remind developers about, and I guess SQL guys can also do with a reminder from time to time.

    However, my 2 cents is rather than...

    declare @S char(4)

    select ISNULL(convert(varchar(16),@s),'Unknown')

    ...I would use...

    declare @S char(4)

    select coalesce(@s,'Unknown')

    ...which also has the ability to take more than 2 arguments.

    S.

    I, too, prefer COALESCE for this same reason. However, you will find that many prefer ISNULL for its performance advantage.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • richard.maw

    SSC Veteran

    Points: 226

    It's also interesting that coalesce behaves differently from isnull.

    My point in giving the isnull/convert formulation was to show how to make isnull behave in the way you might expect. I wasn't particularly recommending it.

    Can anyone give an account of why the semantics of isnull and coalesce (with two arguments) are different?
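A minimal sketch (not from the thread) of the difference being asked about, relying on SQL Server's documented typing rules: ISNULL returns the data type of its first argument, so the replacement string is truncated to char(4), while COALESCE returns the highest-precedence type among all its arguments.

```sql
declare @S char(4);                 -- @S is NULL
select ISNULL(@S, 'Unknown');       -- 'Unkn': result is typed char(4)
select COALESCE(@S, 'Unknown');     -- 'Unknown': result is typed varchar
```

This is why the convert(varchar(16), @s) in the earlier formulation is needed to make ISNULL behave "as expected".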

  • vk-kirov

    SSCertifiable

    Points: 7686

    richard.maw (3/30/2010)


    Can anyone give an account of why the semantics of isnull and coalesce (with two args) is different?

    Here is a nice article about this: http://blogs.msdn.com/sqltips/archive/2008/06/26/differences-between-isnull-and-coalesce.aspx

    COALESCE basically translates to CASE expression and ISNULL is a built-in implemented in the database engine.

    ...

    COALESCE is based on the ANSI SQL standard whereas ISNULL is a proprietary TSQL function
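As a sketch of what the quoted translation means in practice (the variable names here are illustrative), a COALESCE is treated as shorthand for a searched CASE expression:

```sql
declare @a varchar(16), @b varchar(16);

-- This...
select COALESCE(@a, @b, 'Unknown');

-- ...is defined as equivalent to:
select case
           when @a is not null then @a
           when @b is not null then @b
           else 'Unknown'
       end;
```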

  • James Lean

    SSCertifiable

    Points: 5697

    CirquedeSQLeil (3/30/2010)


    However, you will find that many prefer the isnull due to an increase in performance.

    Do you have a supporting link or further information on the performance difference between ISNULL and COALESCE? I was unaware of any significant difference between the two.

    -----
    JL

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    James Lean (3/31/2010)


    CirquedeSQLeil (3/30/2010)


    However, you will find that many prefer the isnull due to an increase in performance.

    Do you have a supporting link or further information on the performance difference between ISNULL and COALESCE? I was unaware of any significant difference between the two?

    COALESCE is implemented as a quite literal implementation of its definition in the ANSI standard (which is: COALESCE(a, b, ...) is equivalent to CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ... ELSE NULL END).

    So if any of the arguments is a subquery, the implementation of COALESCE((subquery), ...) becomes CASE WHEN (subquery) IS NOT NULL THEN (subquery) ... ELSE NULL END. And the optimizer that sits a bit further down the line apparently does not realize that the two subqueries are identical, so it happily produces a plan that will evaluate the same subquery twice.

    Without the subqueries, I have never witnessed a measurable performance difference.
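A hedged sketch of the expansion described above (the table and column names are hypothetical, purely for illustration):

```sql
-- Written with a subquery argument...
select COALESCE((select max(OrderDate) from dbo.Orders), '19000101');

-- ...COALESCE expands to a CASE in which the subquery appears twice,
-- so the plan may evaluate it twice:
select case
           when (select max(OrderDate) from dbo.Orders) is not null
               then (select max(OrderDate) from dbo.Orders)
           else '19000101'
       end;
```

ISNULL, by contrast, takes its first argument only once, so the subquery appears (and is evaluated) once.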


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • James Lean

    SSCertifiable

    Points: 5697

    Thanks Hugo. That would seem to be a very specific case; I wouldn't have thought it was a good reason to make a general case for using COALESCE over ISNULL.

    Personally I do tend to use COALESCE, even with only two arguments. That way, if you do need to add more arguments in future you don't have to worry about changing it. Unless I do need to take advantage of the implicit CASTing to the first argument's type, in which case I'll use ISNULL.

    As always, I suppose, "it depends". 😉

    -----
    JL

  • Paul White

    SSC Guru

    Points: 150442

    Hugo Kornelis (3/31/2010)


    So if any of the arguments is a subquery, the implementation of COALESCE((subquery), ...) becomes CASE WHEN (subquery) IS NOT NULL THEN (subquery) ... ELSE NULL END.

    That reminds me of a CASE construction like the following:

    SELECT CASE CHECKSUM(NEWID()) % 3 + 1
               WHEN 1 THEN 'One'
               WHEN 2 THEN 'Two'
               WHEN 3 THEN 'Three'
               ELSE 'WTF?'
           END

    ...which quite frequently hits the ELSE, and that surprises some people.

    And the optimizer that sits a bit further down the line apparently does not realize that the two subqueries are identical, so it happily produces a plan that will evaluate the same subquery twice.

    Once expanded by the CASE, is it true to say that they are identical? It seems to me that the results are non-deterministic, so it would not be safe to collapse these evaluations. (One might argue that it would be safe at the SERIALIZABLE isolation level (or one of the row-versioning isolation levels), but the plan would not then be reusable).

    I blame the expansion of the CASE - which is apparently a documented design decision. I can live with it, though.

    Last thing: the optimizer does a great job at avoiding the extra subquery evaluations where possible. The plan produced is quite likely to feature a PassThrough expression on the joins, so that the subquery is not re-evaluated if a prior term in the COALESCE already produced a non-NULL value.

  • Paul White

    SSC Guru

    Points: 150442

    James Lean (3/31/2010)


    That would seem to be a very specific case, I wouldn't have thought it was a good reason to make a general case for using COALESCE over ISNULL. Personally I do tend to use COALESCE, even with only two arguments. That way, if you do need to add more arguments in future you don't have to worry about changing it. Unless I do need to take advantage of the implicit CASTing to the first argument's type, in which case I'll use ISNULL. As always, I suppose, "it depends". 😉

    Indeed it does depend.

    I think it's a shame, in a way, that almost every QotD or thread that mentions ISNULL or COALESCE always attracts a certain element determined to prove that one is always better than the other. (I'm not including you in that, by the way)

    My own view is that the two functions are very different, and each has its merits. I use both regularly, depending on the query I am writing. I realise that such a balanced view lacks any sort of excitement factor, but there it is 🙂

  • sistemas 95572

    SSC Enthusiast

    Points: 135

    lol owned

    good question

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Paul White NZ (3/31/2010)


    Hugo Kornelis (3/31/2010)


    So if any of the arguments is a subquery, the implementation of COALESCE((subquery), ...) becomes CASE WHEN (subquery) IS NOT NULL THEN (subquery) ... ELSE NULL END.

    That reminds me of a CASE construction like the following:

    SELECT CASE CHECKSUM(NEWID()) % 3 + 1
               WHEN 1 THEN 'One'
               WHEN 2 THEN 'Two'
               WHEN 3 THEN 'Three'
               ELSE 'WTF?'
           END

    ...which quite frequently hits the ELSE, and that surprises some people.

    That is caused by a similar, related mechanism. This "simple" CASE is also defined by ANSI in terms of the "searched" CASE (and is apparently implemented that way):

    SELECT CASE
               WHEN CHECKSUM(NEWID()) % 3 + 1 = 1 THEN 'One'
               WHEN CHECKSUM(NEWID()) % 3 + 1 = 2 THEN 'Two'
               WHEN CHECKSUM(NEWID()) % 3 + 1 = 3 THEN 'Three'
               ELSE 'WTF?'
           END;

    If you had replaced the CHECKSUM expression with a complex subquery and added another 7 WHEN clauses, the end result would be a query that evaluates the same subquery ten times in a row! <shudder>

    And the optimizer that sits a bit further down the line apparently does not realize that the two subqueries are identical, so it happily produces a plan that will evaluate the same subquery twice.

    Once expanded by the CASE, is it true to say that they are identical? It seems to me that the results are non-deterministic, so it would not be safe to collapse these evaluations. (One might argue that it would be safe at the SERIALIZABLE isolation level (or one of the row-versioning isolation levels), but the plan would not then be reusable).

    I disagree. If the query includes some non-deterministic code (such as NEWID()), there might be room for argument. But if the query in itself is deterministic, then the results should be deterministic. A query is supposed to behave as if it is evaluated at a single moment in time; implementation choices should never influence the results.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Paul White

    SSC Guru

    Points: 150442

    Hugo Kornelis (3/31/2010)


    That is caused by a related and similar reason. This "simple" CASE is also defined in ANSI in terms of the "searched" CASE (and apparently implemented in this way):

    Yep, you can see that by examining the query plan produced - the CASE is always expanded to the searched form.

    I disagree. If the query includes some non-deterministic code (such as NEWID()), there might be room for argument. But if the query in itself is deterministic, then the results should be deterministic.

    I do see your point, but I'm not sure I agree. Once the decision has been taken to expand the CASE, the sub-queries must be re-evaluated. The fact that data access is involved makes the result non-deterministic - the result depends on the state of the database.

    We would not expect the following code to be 'collapsed' into a single sub-query execution:

    SELECT result1 = (SELECT COUNT(*) FROM sys.objects),
           result2 = (SELECT COUNT(*) FROM sys.objects),
           result3 = (SELECT COUNT(*) FROM sys.objects)

    A query is supposed to behave as if it is evaluated at a single moment in time; implementation choices should never influence the results.

    I think this is the crux of the matter. You are only guaranteed this sort of behaviour when running at SERIALIZABLE or one of the row-versioning isolation levels, as I mentioned before. (Also see the example above)

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Paul White NZ (3/31/2010)


    Hugo Kornelis (3/31/2010)


    I disagree. If the query includes some non-deterministic code (such as NEWID()), there might be room for argument. But if the query in itself is deterministic, than the results should be deterministic.

    I do see your point, but I'm not sure I agree. Once the decision has been taken to expand the CASE, the sub-queries must be re-evaluated. The fact that data access is involved makes the result non-deterministic - the result depends on the state of the database.

    We would not expect the following code to be 'collapsed' into a single sub-query execution:

    SELECT result1 = (SELECT COUNT(*) FROM sys.objects),
           result2 = (SELECT COUNT(*) FROM sys.objects),
           result3 = (SELECT COUNT(*) FROM sys.objects)

    Actually, I would expect exactly that to happen. This is a single query, supposed to return results as if it were executed at a single moment in time. Transaction isolation levels govern what happens if multiple statements are executed in succession. So if I change your code to

    SELECT @result1 = (SELECT COUNT(*) FROM sys.objects);
    SELECT @result2 = (SELECT COUNT(*) FROM sys.objects);
    SELECT @result3 = (SELECT COUNT(*) FROM sys.objects);

    then I would agree that, depending on transaction isolation level and @@TRANCOUNT, the results of the three statements might be different on a busy server. But for your code, with the single SELECT, I would consider anything but three equal values a bug.

    And in situations such as the scenario that brought us here, where the actual subquery occurs only once in the submitted query and it's SQL Server itself that expands it, I'd consider it even more buggy!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 1 through 15 (of 45 total)
