Handle NULLs

  • opc.three (7/8/2011)


    Edit 7/8/2011 7:20 Mtn: corrected property of Jeff's rendition, it "cannot be ordered by column name"

    Any bets? πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/8/2011)


    opc.three (7/8/2011)


    Edit 7/8/2011 7:20 Mtn: corrected property of Jeff's rendition, it "cannot be ordered by column name"

    Any bets? πŸ˜‰

    Oh no (secures wallet), don't get me wrong, not saying it can't be modded to do honor column order, just that it does not do them as it stands. I was hoping you would throw your hat back into the ring now that the requirements had finally settled πŸ˜€

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • SQLkiwi (7/8/2011)


    opc.three (7/8/2011)


    Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

    16-core test machine:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1046 ms, elapsed time = 1138 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2994 ms, elapsed time = 849 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1294 ms, elapsed time = 1329 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1716 ms, elapsed time = 1792 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2795 ms, elapsed time = 269 ms.

    Thanks! I expected a change in results, especially for the parallel plans, but what a drastic difference when you add additional cores! This is why I stick around some of these threads πŸ™‚

    BTW my last post results were run on 8-cores.

    When moving from <= 8 cores to 16 cores the solution with the lowest CPU ticks changed from "Union All/Cross tab" to tommyh's original "String manipulation". Your parallel version of tommyh's string manip algo takes the cake on elapsed time across the board. I think you're spoiled with the # of cores you're used to working on πŸ˜‰

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Since the requirements are all over hell's half-acre, let's review.

    1. We have two tables which will ALWAYS have a corresponding row in the other table.

    2. If there are no hobbies (just NULLs) for a given row-pair across the two tables, show 4 nulls.

    3. Hobbies must be listed in the order they appear within the columns with Table 1 taking precedence in the order of columns. Any non-null data must be "slammed to the left" in the result of 4 columns with any nulls bringing up the rear to the right.

    4. Dupes are not only allowed but, if they exist, they are required to be displayed in the order identified by #3 above.

    Now... do we have any rules about oddities like it having to be done in (ugh!) all one query?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • opc.three (7/8/2011)


    SQLkiwi (7/8/2011)


    opc.three (7/8/2011)


    Would you mind grabbing the whole chunk of code from my last post and posting the results from running it on your machine?

    16-core test machine:

    ---------------------------------------------------------------------------

    build temp tables

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1046 ms, elapsed time = 1138 ms.

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 2994 ms, elapsed time = 849 ms.

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1294 ms, elapsed time = 1329 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1716 ms, elapsed time = 1792 ms.

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2795 ms, elapsed time = 269 ms.

    Thanks! I expected a change in results, especially for the parallel plans, but what a drastic difference when you add additional cores! This is why I stick around some of these threads πŸ™‚

    BTW my last post results were run on 8-cores.

    When moving from <= 8 cores to 16 cores the solution with the lowest CPU ticks changed from "Union All/Cross tab" to tommyh's original "String manipulation". Your parallel version of tommyh's string manip algo takes the cake on elapsed time across the board. I think you're spoiled with the # of cores you're used to working on πŸ˜‰

    Yeah I actually found it very enlightening. It makes sense anyway, but it looks like PIVOT and UNPIVOT don't slow actual data lookup, but are massive CPU hogs in formatting the returned data. So when he threw 16 cores at it, pivot/unpivot sped up nicely.

  • Jeff Moden (7/8/2011)


    Nevyn (7/8/2011)


    Cool stuff, Jeff!

    So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?

    I don't use PIVOT because Cross-Tabs are (usually) both faster and easier to read especially if multiple values need to be aggregated and then pivoted. Since Paul taught me the neat trick with Cross-Apply, I feel the same way about UNPIVOT even though the differences in performance are less pronounced. Heh... I also try to avoid words like "always" and "never". πŸ˜€

    Jeff,

    Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.

    Todd Fifield

  • Nevyn (7/8/2011)


    ...but it looks like PIVOT and UNPIVOT don't slow actual data lookup, but are massive CPU hogs in formatting the returned data. So when he threw 16 cores at it, pivot/unpivot sped up nicely.

    PIVOT is less efficient than writing out MAX & CASE statements because the optimizer does not produce code that is as tight as the hand-crafted version. The difference between UNPIVOT and a manual APPLY is much less significant. There are those that argue that manual APPLY is more flexible in both cases, and it is hard to deny that. Brad Schulz has some excellent posts on this on his blog.

    As far as the PIVOT/UNPIVOT speed-up is concerned, that query form happens to be the only one that 'naturally' goes parallel without any tweaking.

  • tfifield (7/8/2011)


    Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.

    Let me link to someone else's blog for a change...http://bradsruminations.blogspot.com/2010/02/spotlight-on-unpivot-part-1.html

  • Jeff Moden (7/8/2011)


    Heh... I also try to avoid words like "always" and "never". πŸ˜€

    Amen to that. I keep a thesaurus handy for synonyms of 'generally', 'usually'...and so on. It is almost never possible to say never, ever.

    LOL

  • Heh, my first time bit of practice using UNPIVOT, and I discover that I should never seldom use it.

    Now I have to find an excuse to practice using CROSS APPLY.

  • SQLkiwi (7/8/2011)


    tfifield (7/8/2011)


    Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.

    Let me link to someone else's blog for a change...http://bradsruminations.blogspot.com/2010/02/spotlight-on-unpivot-part-1.html

    Paul,

    Thanks for the link. It's in my tool box now.

    Todd Fifield

  • SQLkiwi (7/8/2011)


    tfifield (7/8/2011)


    Is there a link to Paul's CROSS APPLY method for UNPIVOT? I'd love to see it.

    Let me link to someone else's blog for a change...http://bradsruminations.blogspot.com/2010/02/spotlight-on-unpivot-part-1.html

    Hilarious :laugh:

    You know, something just occurred to me…

    Here I am writing a blog post about UNPIVOT, giving it some time in the spotlight, and then I end up slapping it in the face and turning my back on it and showing you a better, more flexible way to accomplish the same thing.

    Poor UNPIVOT.

    Oh well, c’est la vie.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Jeff Moden (7/8/2011)


    Since the requirements are all over hell's half-acre, let's review.

    1. We have two tables which will ALWAYS have a corresponding row in the other table.

    2. If there are no hobbies (just NULLs) for a given row-pair across the two tables, show 4 nulls.

    3. Hobbies must be listed in the order they appear within the columns with Table 1 taking precedence in the order of columns. Any non-null data must be "slammed to the left" in the result of 4 columns with any nulls bringing up the rear to the right.

    4. Dupes are not only allowed but, if they exist, they are required to be displayed in the order identified by #3 above.

    Now... do we have any rules about oddities like it having to be done in (ugh!) all one query?

    Yes. That is all correct.

    For #3, are you saying the order in which they're defined in the table or order in which they're selected from the table? It just so happens that table1.hobby1 is defined in table1 before table1.hobby2,and the same holds for table2.hobby1 and table2.hobby2, but it shouldn't necessarily be the order in which they are defined within each table so much as the order in which they are selected. That's probably what you mean, but I just wanted to make sure. And in this case, it's a moot point since the order in which I'm selecting them is the same as the order in which they are defined in the tables (t1.h1, t1.h2, t2.h1, t2.h2). And yes, table1's hobbies take precedence in the order over table2's hobbies.

    Also, it seems like it should be able to be done in one query, but it doesn't have to be.

    Table1

    1,'BasketBall','Tennis'

    Table2

    1,'Tennis','BasketBall'

    For that example, it would be:

    1, 'BasketBall', 'Tennis', 'Tennis', 'BasketBall'

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Fun problem. πŸ˜€

    This solution might be a little less CPU intensive because the only string manipulation is done with ISNULL or CASE tests.

    The trick is to squeeze out the nulls from right-to-left, instead of left-to-right.

    ;with

    cte1 as (select C1,C2,ISNULL(C3,C4) as C3

    , CASE WHEN C3 IS NULL THEN NULL ELSE C4 END as C4 from #temp)

    ,cte2 as (select C1, ISNULL(C2,C3) as C2

    , CASE WHEN C2 IS NULL THEN C4 ELSE C3 end as C3

    , CASE WHEN C2 IS NULL THEN NULL ELSE C4 END as C4 from cte1)

    ,cte3 as (select ISNULL(C1,C2) as C1

    , CASE WHEN C1 IS NULL then C3 ELSE C2 end as C2

    , CASE WHEN C1 IS NULL THEN C4 ELSE C3 end as C3

    , CASE WHEN C1 IS NULL THEN NULL ELSE C4 END as C4 from cte2)

    select * from cte3

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • My latest submission is based on the string manipulation, but simplified... on my QUAD core PC I like the results, but let's see how it does on yours.

    I borrowed the base string manipulation one (Tommy I think) and Paul's JOIN hint in part...

    SET CONCAT_NULL_YIELDS_NULL OFF;

    SELECT h1.id,

    RTRIM(SUBSTRING(CONVERT(CHAR(36),h1.hobby1) +

    CONVERT(CHAR(36),h1.hobby2) +

    CONVERT(CHAR(36),h2.hobby1) +

    CONVERT(CHAR(36),h2.hobby2), 1, 36)) AS h1,

    RTRIM(SUBSTRING(CONVERT(CHAR(36),h1.hobby1) +

    CONVERT(CHAR(36),h1.hobby2) +

    CONVERT(CHAR(36),h2.hobby1) +

    CONVERT(CHAR(36),h2.hobby2), 37, 36)) AS h2,

    RTRIM(SUBSTRING(CONVERT(CHAR(36),h1.hobby1) +

    CONVERT(CHAR(36),h1.hobby2) +

    CONVERT(CHAR(36),h2.hobby1) +

    CONVERT(CHAR(36),h2.hobby2), 73, 36)) AS h3,

    RTRIM(SUBSTRING(CONVERT(CHAR(36),h1.hobby1) +

    CONVERT(CHAR(36),h1.hobby2) +

    CONVERT(CHAR(36),h2.hobby1) +

    CONVERT(CHAR(36),h2.hobby2), 109, 36)) AS h4

    INTO

    #dummy

    FROM

    dbo.hobby1 h1

    LEFT JOIN

    dbo.hobby2 h2

    ON

    h1.id = h2.id

    OPTION

    (HASH JOIN);

    I wouldn't offer it up if it didn't measure well (this one is at the end)

    ---------------------------------------------------------------------------

    String manipulation (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1794 ms, elapsed time = 1861 ms.

    End of batch 2, completed in 1.872 seconds.

    -------------------------------------------

    ---------------------------------------------------------------------------

    Unpivot/Pivot (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 3351 ms, elapsed time = 1239 ms.

    End of batch 4, completed in 1.328 seconds.

    -------------------------------------------

    ---------------------------------------------------------------------------

    Union All/Cross tab (dups, ordered by column)

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1981 ms, elapsed time = 1969 ms.

    End of batch 6, completed in 2.009 seconds.

    -------------------------------------------

    ---------------------------------------------------------------------------

    Parsename (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 6021 ms, elapsed time = 6069 ms.

    End of batch 8, completed in 6.079 seconds.

    -------------------------------------------

    ---------------------------------------------------------------------------

    String manipulation SQLKiwi (dups, ordered by column) - modded by opc to allow dups

    SQL Server Execution Times:

    CPU time = 2059 ms, elapsed time = 668 ms.

    End of batch 10, completed in 0.713 seconds.

    --------------------------------------------

    ---------------------------------------------------------------------------

    String manipulation MisterMagoo (via Tommy) (dups, ordered by column)

    SQL Server Execution Times:

    CPU time = 1139 ms, elapsed time = 730 ms.

    End of batch 13, completed in 0.756 seconds.

    --------------------------------------------

    The hidden gem here is that IO is low for this new version as well:

    Table 'hobby2'. Scan count 5, logical reads 797, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'hobby1'. Scan count 5, logical reads 798, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

Viewing 15 posts - 61 through 75 (of 92 total)

You must be logged in to reply to this topic. Login to reply