An Alternative (Better?) Method to UNPIVOT (SQL Spackle)

  • Comments posted to this topic are about the item An Alternative (Better?) Method to UNPIVOT (SQL Spackle)


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Very nicely done, Dwain.

    You can also do similar with SELECT/UNION ALL in 2005.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/2/2012)


    Very nicely done, Dwain.

    You can also do similar with SELECT/UNION ALL in 2005.

    I love it when I attract attention from the big boys!

    Thanks for the praise :blush: and for stopping by.:hehe:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I was going to send the link to this to a colleague who's been learning unpivot BUT he'd probably be offended withthe mickey take at the top since his friend, another colleague, really is named Philip McCracken!

  • Very nice article.

    Regards,

    Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

    http://basitaalishan.com
  • Nice article.

    I see Itzik using this approach a lot as well. Personally I like it but is stuck due to a third party program in an lower compatibility level quite often :crying:

  • Thanks for the article, very interresting.

    Basically, your "other Unpivot" technique is a brillant way to use the new (from 2008) Table Value Constructor

    Using the same method, from your first example you could replace :

    INSERT INTO #Orders

    SELECT 1, NULL, 3 UNION ALL SELECT 2, 5, 4 UNION ALL SELECT 1, 3, 10

    By the following :

    INSERT INTO #Orders VALUES

    (1, NULL, 3), (2, 5, 4), (1, 3, 10)

    I don't think it's a real gain but maybe it's more readable.

  • Very cool. Will have to dig into this and see whether or not it's something that is viable for use in my environment.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Another cracking good article Dwain!

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Great article Dwain. I hadn't thought of trying the value constructor in the apply operator before... something new to play with.

    Thanks!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Interesting read. Much appreciated.

    Cheers

  • Very interesting Dwain!

    A very nice read.

    Thanks for sharing.

    -- Gianluca Sartori

  • There is a way to get NULL values back in an unpivot. It's not pretty but it works.

    IF OBJECT_ID('tempdb..#Orders','U') IS NOT NULL

    DROP TABLE #Orders

    -- DDL and sample data for UNPIVOT Example 1

    CREATE TABLE #Orders

    (Orderid int identity, GiftCard int, TShirt int, Shipping int)

    INSERT INTO #Orders

    SELECT 1, NULL, 3 UNION ALL SELECT 2, 5, 4 UNION ALL SELECT 1, 3, 10

    SELECT * FROM #Orders

    DECLARE @Start DATETIME

    DECLARE @End DATETIME

    SET @Start = GETDATE()

    -- Traditional UNPIVOT

    SELECT OrderID, convert(varchar(15), ProductName) [ProductName], ProductQty

    FROM (

    SELECT OrderID, GiftCard, TShirt, Shipping

    FROM #Orders) p

    UNPIVOT

    (ProductQty FOR ProductName IN ([GiftCard], [TShirt], [Shipping])) as unpvt

    SET @End = GETDATE()

    SELECT DATEDIFF(ms,@Start,@End) AS ElapsedTime

    -- Set a placeholder for NULL values.

    -- Let's assume, the integers are always greater than zero.

    -- if this is the case, we can use 0 to represent a NULL

    DECLARE @NULL INT

    SET @NULL = 0

    SET @Start = GETDATE()

    SELECT OrderID, convert(varchar(15), ProductName) [ProductName], NULLIF(ProductQty,0) AS ProductQty

    FROM (

    SELECT OrderID

    , COALESCE(GiftCard,@NULL) AS GiftCard

    , COALESCE(TShirt,@NULL) AS TShirt

    , COALESCE(Shipping,@NULL) AS Shipping

    FROM #Orders) p

    UNPIVOT

    (ProductQty FOR ProductName IN ([GiftCard], [TShirt], [Shipping])) as unpvt

    SET @End=GETDATE()

    SELECT DATEDIFF(ms,@Start,@End) AS ElapsedTime

    Like i said, it's not pretty but will return your NULL values and in this case is faster than the traditional method on my laptop.

  • Good to know ... I was unawares.

    Here's the last example done with pivot/unpivot ... a bit more code, performance not as good ... but much more maintainable/flexible (IMO) given the problem setup ... generalizes the computations and can adjust for the source month columns coming in and out of use. In such a real world situation Users would likely accept a performance goal of tolerable ...

    Those of you who continue to profess a belief in the Users will receive the standard substandard training which will result in your eventual elimination. Tron (1982)

    ;WITH [CTE_DETAIL]([ID],[ProductLine],[JanRev],[JanExp],[FebRev],[FebExp],[MarRev],[MarExp],[AprRev],[AprExp],[MayRev],[MayExp],[JunRev],[JunExp],[JulRev],[JulExp],[AugRev],[AugExp],[SepRev],[SepExp],[OctRev],[OctExp],[NovRev],[NovExp],[DecRev],[DecExp])

    AS

    (

    SELECT[ID],

    [ProductLine],

    --$0 AS

    [JanRev],

    --$0 AS

    [JanExp],

    --$0 AS

    [FebRev],

    --$0 AS

    [FebExp],

    --$0 AS

    [MarRev],

    --$0 AS

    [MarExp],

    $0 AS

    [AprRev],

    $0 AS

    [AprExp],

    $0 AS

    [MayRev],

    $0 AS

    [MayExp],

    $0 AS

    [JunRev],

    $0 AS

    [JunExp],

    $0 AS

    [JulRev],

    $0 AS

    [JulExp],

    $0 AS

    [AugRev],

    $0 AS

    [AugExp],

    $0 AS

    [SepRev],

    $0 AS

    [SepExp],

    $0 AS

    [OctRev],

    $0 AS

    [OctExp],

    $0 AS

    [NovRev],

    $0 AS

    [NovExp]

    ,

    $0 AS

    [DecRev],

    $0 AS

    [DecExp]

    FROM#ProfitLoss

    ),

    [CTE_DETAIL_UNPIVOTED]([ID],[ProductLine],[sMONTH],[iMONTH],[sACCT],[AMOUNT])

    AS

    (

    SELECT[ID],

    [ProductLine],

    LEFT([PERIOD_ITEM], 3) AS [sMONTH],

    MONTH(LEFT([PERIOD_ITEM], 3) + '1, 2000') AS [iMONTH],

    RIGHT([PERIOD_ITEM], LEN([PERIOD_ITEM]) - 3) AS [sACCT],

    [AMOUNT]

    FROM(

    SELECT[ID],

    [ProductLine],

    [JanRev],

    [JanExp],

    [JanRev]-[JanExp] AS [JanMargin],

    [FebRev],

    [FebExp],

    [FebRev]-[FebExp] AS [FebMargin],

    [MarRev],

    [MarExp],

    [MarRev]-[MarExp] AS [MarMargin],

    [AprRev],

    [AprExp],

    [AprRev]-[AprExp] AS [AprMargin],

    [MayRev],

    [MayExp],

    [MayRev]-[MayExp] AS [MayMargin],

    [JunRev],

    [JunExp],

    [JunRev]-[JunExp] AS [JunMargin],

    [JulRev],

    [JulExp],

    [JulRev]-[JulExp] AS [JulMargin],

    [AugRev],

    [AugExp],

    [AugRev]-[AugExp] AS [AugMargin],

    [SepRev],

    [SepExp],

    [SepRev]-[SepExp] AS [SepMargin],

    [OctRev],

    [OctExp],

    [OctRev]-[OctExp] AS [OctMargin],

    [NovRev],

    [NovExp],

    [NovRev]-[NovExp] AS [NovMargin],

    [DecRev],

    [DecExp],

    [DecRev]-[DecExp] AS [DecMargin]

    FROM[CTE_DETAIL]

    ) AS [PIVOTED] UNPIVOT

    (

    [AMOUNT] FOR [PERIOD_ITEM] IN

    (

    [JanRev],

    [JanExp],

    [JanMargin],

    [FebRev],

    [FebExp],

    [FebMargin],

    [MarRev],

    [MarExp],

    [MarMargin],

    [AprRev],

    [AprExp],

    [AprMargin],

    [MayRev],

    [MayExp],

    [MayMargin],

    [JunRev],

    [JunExp],

    [JunMargin],

    [JulRev],

    [JulExp],

    [JulMargin],

    [AugRev],

    [AugExp],

    [AugMargin],

    [SepRev],

    [SepExp],

    [SepMargin],

    [OctRev],

    [OctExp],

    [OctMargin],

    [NovExp],

    [NovRev],

    [NovMargin],

    [DecRev],

    [DecExp],

    [DecMargin]

    )

    ) AS [UNPIVIOTED]

    ),

    [CTE_DETAIL_CALC]

    (

    [ID],

    [ProductLine],

    [sMONTH],

    [sACCT],

    [AMOUNT],

    [ $ O/(U) Prev Mo],

    [ % O/(U) Prev Mo]

    )

    AS

    (

    SELECTA.[ID],

    A.[ProductLine],

    A.[sMONTH],

    A.[sACCT],

    A.[AMOUNT],

    COALESCE(A.[AMOUNT] - B.[AMOUNT], $0) AS [ $ O/(U) Prev Mo],

    ROUND(100*COALESCE((A.[AMOUNT] - B.[AMOUNT])/NULLIF(B.[AMOUNT],$0),$0), 1) AS [ % O/(U) Prev Mo]

    FROM[CTE_DETAIL_UNPIVOTED] A LEFT JOIN

    [CTE_DETAIL_UNPIVOTED] B

    ONA.[ID] = B.[ID]

    ANDA.[sACCT] = B.[sACCT]

    ANDA.[iMONTH] = B.[iMONTH] + 1

    )

    SELECT[ID],

    [ProductLine],

    [LineItem],

    MAX([Jan]) AS [Jan],

    MAX([Feb]) AS [Feb],

    MAX([Mar]) AS [Mar],

    MAX([Apr]) AS [Apr],

    MAX([May]) AS [May],

    MAX([Jun]) AS [Jun],

    MAX([Jul]) AS [Jul],

    MAX([Aug]) AS [Aug],

    MAX([Sep]) AS [Sep],

    MAX([Oct]) AS [Oct],

    MAX([Nov]) AS [Nov],

    MAX([Dec]) AS [Dec]

    FROM(

    SELECT[ID],

    [ProductLine],

    [sACCT],

    CASE [sACCT]

    WHEN 'Rev' THEN 'Revenues'

    WHEN 'Exp' THEN 'Expenses'

    WHEN 'Margin' THEN 'Margin'

    ELSE 'UNKNOWN'

    END AS [LineItem],

    [Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]

    FROM(SELECT[ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo]FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([AMOUNT]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]

    UNION

    SELECT[ID],

    [ProductLine],

    [sACCT],

    ' $ O/(U) Prev Mo' AS [LineItem],

    [Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]

    FROM(SELECT[ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo]FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([ $ O/(U) Prev Mo]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]

    UNION

    SELECT[ID],

    [ProductLine],

    [sACCT],

    ' % O/(U) Prev Mo' AS [LineItem],

    [Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec]

    FROM(SELECT[ID], [ProductLine], [sMONTH], [sACCT], [AMOUNT], [ $ O/(U) Prev Mo], [ % O/(U) Prev Mo]FROM [CTE_DETAIL_CALC]) AS [UNPIVOTED] PIVOT

    (MAX([ % O/(U) Prev Mo]) FOR [sMONTH] IN ([Jan],[Feb],[Mar],[Apr],[May],[Jun],[Jul],[Aug],[Sep],[Oct],[Nov],[Dec])) AS [PIVOTED_AMOUNT]

    ) A

    GROUP BY [ID],[ProductLine],[LineItem],[sACCT]

    ORDER BY

    [ID],

    [ProductLine],

    CASE [sACCT]

    WHEN 'Rev' THEN 10

    WHEN 'Exp' THEN 20

    WHEN 'Mar' THEN 30

    ELSE 1000

    END,

    CASE [LineItem]

    WHEN 'Revenues' THEN 10

    WHEN 'Expenses' THEN 10

    WHEN 'Margin' THEN 10

    WHEN ' $ O/(U) Prev Mo' THEN 20

    WHEN ' % O/(U) Prev Mo' THEN 30

    ELSE 1000

    END

  • Before I get into thanking everybody for their comments, last weekend I looked into the performance gains achieved by the "other UNPIVOT" approach because I believe there's no such thing as a free lunch. I got some interesting results that I thought worthy of an update.

    I reran the single UNPIVOT example with larger record sets and averaged results over 5 runs appear below.

    I have highlighted in red the slower of the two approaches in case. In this test harness, I dumped the display results to local variables to suppress them.

    From the results, we see the CROSS APPLY VALUES consistently beats the elapsed time of UNPIVOT up through 5,000,000 rows. Two specific points though:

    1. There seems to be a breakeven point where CPU time for CAV begins to exceed that of UNPIVOT (at perhaps 800K rows).

    2. The CAV results that show faster elapsed times than CPU (starting at around 500K rows) seemed odd to me, until I realized that I'm running these tests on a Core i5 processor machine, so this would imply that SQL is parallelizing the query. I tested this theory by adding OPTION (MAXDOP 1) and found that CPU/Elapsed time results were more normal again (i.e., elapsed time slightly greater than CPU time).

    However I got these results at 5,000,000 rows (single run):

    (5000000 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4072 ms, elapsed time = 4304 ms.

    SQL Server Execution Times:

    CPU time = 2652 ms, elapsed time = 2688 ms.

    So in other words, for CAV CPU time dropped from about 6989ms (this was the 75% sparseness case) to 2652ms. I am not sure why UNPIVOT was also reduced (4539ms to 4072ms) but obviously its a lesser drop. In any event, now CAV is beating UNPIVOT in CPU as well and also runs in lesser elapsed time! Seems to be a double benefit.

    While this does seem to confirm my theory about parallelism, I'm not sure exactly what to make of the overall drop in CPU usage. Perhaps with lower parallelism the CPU needs to do less work?

    As usual, the answer to "which approach should I use in my query?" is "it depends!"


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 46 total)

You must be logged in to reply to this topic. Login to reply