When NULL IS NOT NULL

  • To all my SQL friends out there.

    While trying to construct a 1M row test harness I ran across an interesting enigma shrouded in uncertainty. Eventually I was able to construct the test data I needed but the results below could use a bit of expert examination to explain them.

    3 queries - any takers?

    -- This query may return no rows (if it does just run it

    -- again until you get some) but when it does return something

    -- you gotta ask yourself "why NULL IS NOT NULL?"

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT n, [Type], value=(ABS(CHECKSUM(NEWID()))%10000)/100.

    FROM Tally a

    OUTER APPLY (

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' END

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' END

    ) c ([Type])

    WHERE [Type] IS NOT NULL

    ORDER BY n;

    -- This query always returns rows and I'm scratching my head

    -- asking myself "why NULL IS NOT NULL?"

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT m, [Type], value=(ABS(CHECKSUM(NEWID()))%10000)/100.

    FROM Tally a

    OUTER APPLY (

    SELECT n,CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' END

    UNION ALL

    SELECT n,CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' END

    ) c (m,[Type])

    WHERE [Type] IS NOT NULL;

    -- This query returns an interesting error and now I'm really befuddled

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT m, [Type], value=(ABS(CHECKSUM(NEWID()))%10000)/100.

    FROM Tally a

    OUTER APPLY (

    SELECT n,CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' END

    UNION ALL

    SELECT n,CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' END

    ) c (m,[Type])

    WHERE [Type] IS NOT NULL

    ORDER BY m;

    When you change UNION ALL to UNION in the first query, you also get interesting results when you run it multiple times.

    Edit: Note that OUTER APPLY or CROSS APPLY seems to make no difference.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • ... Note that OUTER APPLY or CROSS APPLY seems to make no difference.

    same goes for JOIN, check the third column values πŸ™‚

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT n, [Type], CASE WHEN [type] IS NULL THEN 'null' ELSE 'not null' END

    FROM Tally a

    join (

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' END

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' END

    ) c ([Type]) on 1=1

    WHERE [Type] IS NOT NULL

    ORDER BY n;

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • And (as ChrisM pointed out to me) it has nothing to do with NULLs:

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT n, [Type], CASE WHEN [type] = '#' THEN '#' ELSE 'not #' END

    FROM Tally a

    CROSS APPLY (

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' ELSE '#' END

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' ELSE '#' END

    ) c ([Type])

    WHERE [Type] <> '#'

    ORDER BY n;

    Something about the non-deterministic value returned by NEWID() I presume.

    All this and I'm still clueless. :w00t:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This sounds like a Paul White kinda thing!

    Smells like a bug in the execution plan to me.

    To distil the unexpected result down to it's simplest form, this query will (sometimes) returned rows that are NOT matching the predicate defined in the where clause, e.g. a row (or rows) with the value '#':

    SELECT c.[Type]

    FROM ( SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a'

    ELSE '#'

    END [Type]

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b'

    ELSE '#'

    END [Type]

    ) c

    WHERE c.[Type] <> '#'

    It specifically seems to be the UNION ALL that causes the inconsistency. If you remove that then the results always seem correct.

  • HowardW (7/26/2013)


    This sounds like a Paul White kinda thing!...

    Do we have to put on pointy hats, form a circle, and chant? That's ok, it's friday and I've already set Alice Morgan as my desktop this lunchtime.

    Here's a few more hints at what's going on. Notice in the last two queries - which show the effect - the number of rows filtered out is a multiple of 3: you end up with 0, 3 or 6 rows in the output.

    -- 1. Works as expected

    SELECT [Type]

    FROM (VALUES (1),(2),(3)) a (n)

    CROSS APPLY (VALUES

    (CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' ELSE '-' END)

    ) c ([Type])

    WHERE [Type] <> '-';

    -- 2. Works as expected

    SELECT [Type]

    FROM (VALUES (1),(2),(3)) a (n)

    CROSS APPLY (SELECT SRV = CHECKSUM(NEWID())) x

    CROSS APPLY (VALUES

    (CASE WHEN SRV > 0 THEN 'a' ELSE '-' END),

    (CASE WHEN SRV > 0 THEN 'b' ELSE '-' END)

    ) c ([Type])

    WHERE [Type] <> '-';

    -- 3. Incorrect

    SELECT [Type]

    FROM (VALUES (1),(2),(3)) a (n)

    CROSS APPLY (VALUES

    (CHECKSUM(NEWID())%2),

    (CHECKSUM(NEWID())%2)

    ) c ([Type])

    WHERE [Type] > 0;

    -- 4. Appears to generate a result set more often than q3,

    -- but still incorrect

    SELECT [Type]

    FROM (VALUES (1),(2),(3)) a (n)

    CROSS APPLY (VALUES

    (ABS(CHECKSUM(NEWID())%2)),

    (ABS(CHECKSUM(NEWID())%2))

    ) c ([Type])

    WHERE [Type] > 0;

    I wonder if it's related to this, which was fixed in 2k?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • To add a little more info, this is reproducible on 2012 and 2008 (and R2), but not 2005.

    In 2005, the execution plan shows two separate constant scans and two filter operators for each part of the UNION ALL, followed by a concatenation, whereas in 2008 onwards, it looks like it's doing a re-write to a single constant scan with two rows (using the new FROM (VALUES(),()) support), followed by a filter.

    Not sure there's much in the execution plan itself to explain the behaviour as other non-deterministic functions (e.g. RAND) don't create the same behaviour, but have identical plans e.g.:

    SELECT *

    FROM (SELECT CASE WHEN CONVERT(INT,RAND()*100,0) > 50 THEN 'a'

    ELSE '#'

    END [Type] UNION ALL

    SELECT CASE WHEN CONVERT(INT,RAND()*100) < 50 THEN 'a'

    ELSE '#'

    END [TYPE]

    ) A

    WHERE TYPE <> '#'

    You can also get around the original bug by modifying the query enough to avoid the rewrite. e.g.:

    SELECT c.[Type]

    FROM ( SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a'

    ELSE '#'

    END [Type] FROM (SELECT 1 a) a

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b'

    ELSE '#'

    END [Type] FROM (SELECT 1 a) a

    ) c

    WHERE c.[Type] <> '#'

    I'm guessing somewhere (maybe in the filter), it's actually re-calculating the result of the function and evaluating that, rather than using the rows produced by the constant scan.

  • Thanks to all for taking a look.

    I hesitate to suggest this is a SQL bug, but perhaps the Gods of SQL were asleep at the wheel on that query plan.

    Nobody's made a suggestion about the strange error delivered by my original query #3. πŸ™‚


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (7/28/2013)


    Nobody's made a suggestion about the strange error delivered by my original query #3. πŸ™‚

    I don't get an error, strange or otherwise from query 3. To be honest, I'm less interested by code that bombs out with an error than I am with code that returns incorrect results, as an error's obvious and the code can be adapted, but incorrect results can have real impacts. Often you find with these kind of bugs that it can throw an assertion failure message depending on the plan generated, I've seen that with a MERGE bug before.

  • No error on this platform:

    Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)

    Jun 28 2012 08:36:30 Copyright (c) Microsoft Corporation Standard Edition (64-bit)

    on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (7/29/2013)


    No error on this platform:

    Microsoft SQL Server 2008 R2 (SP2) - 10.50.4000.0 (X64)

    Jun 28 2012 08:36:30 Copyright (c) Microsoft Corporation Standard Edition (64-bit)

    on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    I will need to go back and check what version I got that error message on then.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Here is my version:

    Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64) Jul 9 2008 14:17:44 Copyright (c) 1988-2008 Microsoft Corporation Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )

    And this is the error returned by query #3:

    Msg 681, Level 16, State 3, Line 34

    Attempting to set a non-NULL-able column's value to NULL.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yeah I have the same error on the:

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)

    Mar 29 2009 10:11:52

    Copyright (c) 1988-2008 Microsoft Corporation

    Express Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

    Looks like all of these constitute a nasty SQL Server bug or a very special feature!

    :w00t:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Funnily enough I submitted a QotD recently covering a very similar problem, so you will all get your points on that one...

    As far as I understand it, the cause of these issues is that humans assume too much - in this case, that the value of a call to NEWID() will remain the same when referenced more than once.

    I know you know that, and are not expecting the value to be the same for each of the three times is it included in these queries, but you are making that assumption in the WHERE clause.

    If you examine the query plan, the NEWID() values in the outer apply are never materialised into a variable - at least not in mine - this is a clue that whenever the WHERE clause kicks in and the filter is applied, you are not filtering using the same values that will feed the column aliased as [Type] in the SELECT, but filtering on NEW values.

    This means it is possible for the filter to allow through unexpected values in the SELECT. It's like comparing [A] to null, then SELECTING .

    Of course, I could be wrong and will skulk away with my tail between my legs when Paul tells me I am wrong πŸ˜›

    I do not know anything about the error for query 3 - I would say a bug as it does not happen in SQL2012...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Blimey MM if I was a snake oil brewer I'd want you leading my sales force! Whether you're on the right track or not, it's a great story!

    Have a look at Node ID = 2 in the plan for this jobbie:

    SELECT [Type]

    FROM (VALUES (1),(2),(3)) a (n)

    CROSS APPLY (VALUES

    (CHECKSUM(NEWID())%2),

    (CHECKSUM(NEWID())%2)

    ) c ([Type])

    WHERE [Type] > 0;

    The "constant scan" generating the two rows. It references CHECKSUM, NEWID and MOD, and they don't appear to be referenced anywhere else in the plan. There's no Compute Scalar to kid you into thinking the calculations might be performed in an unexpected place either. I don't think there's anything in the plan which really describes what's going on here, marking it as different to this thread.

    Stumped.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • mister.magoo (7/29/2013)

    Of course, I could be wrong and will skulk away with my tail between my legs when Paul tells me I am wrong πŸ˜›

    I guess Paul will tell you you're right! http://sqlblog.com/blogs/paul_white/archive/2012/09/05/compute-scalars-expressions-and-execution-plan-performance.aspx

    To illustrate it, let's rewrite the first query:

    WITH Tally (n) AS (

    SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    SELECT n, [Type], value=(ABS(CHECKSUM(NEWID()))%10000)/100.

    FROM Tally a

    CROSS APPLY ( /* OUTER APPLY changed to CROSS APPLY */

    (

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'a' END

    UNION ALL

    SELECT CASE WHEN CHECKSUM(NEWID()) > 0 THEN 'b' END

    )

    EXCEPT

    SELECT NULL /* Added */

    ) c ([Type])

    --WHERE [Type] IS NOT NULL /* Removed */

    ORDER BY n;

    Looking at the query plan, the UNION ALL/EXCEPT subquery is evaluated only once and all rows in the outer query are all joined with exactly 0, 1 or 2 rows.

    Edit: poor example, because it does not really illustrate that the expression (Scalar Operator(CASE WHEN checksum(newid())>(0) THEN 'a' ELSE NULL END)); (Scalar Operator(CASE WHEN checksum(newid())>(0) THEN 'b' ELSE NULL END)) is not recomputed when checking for NULL. Anyway, Paul White's blog really seems to justify mister.magoo's presumption.

  • Viewing 15 posts - 1 through 15 (of 17 total)

    You must be logged in to reply to this topic. Login to reply