T-SQL Pivot Question

  • Hello Everyone.

    This should be an easy one. It's been a long time since I have used PIVOT and think I forgot something. Take a look at these two queries. The first query provides the desired results. I am trying to get the same results using PIVOT but seem to be missing something.

    IF OBJECT_ID('tempdb..#p') IS NOT NULL DROP TABLE #p;

    CREATE TABLE #p(id int, qtr tinyint, sales int);

    DECLARE @rows int = 10000;

    WITH iTally(n) AS

    (SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT ($)))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT #p

    SELECTn,

    ceiling(4*rand(convert(varbinary, newid()))),

    ceiling(80*rand(convert(varbinary, newid())))+20

    FROM iTally;

    -- DESIRED RESULTS

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM #p;

    -- I AM STUCK HERE

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM #p AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    Thanks!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hi Alan,

    You need to change your table to a subquery involving only the grouping columns and the ones to be aggregated.

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM (SELECT qtr, sales FROM #p) AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    Why wouldn't you want to use a nice pre-aggregated cross tab approach?

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr)p;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (2/4/2014)


    Hi Alan,

    You need to change your table to a subquery involving only the grouping columns and the ones to be aggregated.

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM (SELECT qtr, sales FROM #p) AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    Why wouldn't you want to use a nice pre-aggregated cross tab approach?

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr)p;

    Duh! (slapping self on head!) You rock as always Luis!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Luis Cazares (2/4/2014)


    Why wouldn't you want to use a nice [font="Arial Black"]pre-aggregated [/font]cross tab approach?

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr)p;

    +1000

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Luis Cazares (2/4/2014)


    ... Why wouldn't you want to use a nice pre-aggregated cross tab approach?

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr)p;

    I'm sorry Luis that I missed this question... I was in a hurry to leave work last night, saw that you had replied to my post and did not finish reading it before thanking you (I knew from the first sentence you wrote what I was doing wrong). I finished your post last night.

    Thank you Jeff for jumping into the thread, I would have likely missed Luis' question otherwise. As a token of my thanks I sent you some snow.

    Ironically, a conversation started last night when I, again, asked my friend/co-worker, Rob, if he read those Jeff Moden Articles I emailed him. Blah, blah... I was showing him why I use the technique in Jeff's article (Cross Tabs and Pivots, Part 1 – Converting Rows to Columns[/url]) instead of PIVOT. I was going to walk him through a the diferent techniques (including the pre-aggregations), do some tests, look at query plans but got stuck at the Pivot. We actually had the table #p filled with 1,000,000 rows when I got stuck on PIVOT (I never use PIVOT). I did not post the complete code...

    IF OBJECT_ID('tempdb..#p') IS NOT NULL DROP TABLE #p;

    CREATE TABLE #p(id int primary key, qtr tinyint, sales int);

    DECLARE @rows int = 1000000;

    WITH iTally(n) AS

    (SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT ($)))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT #p

    SELECTn,

    ceiling(4*rand(convert(varbinary, newid()))),

    ceiling(80*rand(convert(varbinary, newid())))+20

    FROM iTally;

    --DBCC FREEPROCCACHE

    SET NOCOUNT ON;

    SET STATISTICS TIME ON;

    --what I need

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM #p;

    --where I was stuck

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM

    (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr) AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    SET STATISTICS TIME OFF;

    I have to run (it has not stopped snowing and we're being kicked out). I am going to chime back in later when I'm at home. If you get a moment (I know it's not snowing where you are Luis) run this code above. I have some more questions, such as, "why no parallelism with the cross tab" ? What does the query optimized assume that a serial version of this query will be faster?

    Again, I'll chime in again later. Thanks to both of you!

    Edit: Grammar.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • It seems that I get very similar results with pivot and cross tabs. The parallelism is created with the pre-aggregation because the normal pivot and cross tabs won't use it. Both pre-aggregated versions return the exact same plan and similar results every time I ran the test. Maybe I'm missing something. I'll try to come back if I get the time to add an extra column to pivot.

    If you have any comments about the test, please let me know.

    Code used for the test:

    IF OBJECT_ID('tempdb..#p') IS NOT NULL DROP TABLE #p;

    CREATE TABLE #p(id int primary key, qtr tinyint, sales int);

    DECLARE @rows int = 1000000;

    WITH iTally(n) AS

    (SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT ($)))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT #p

    SELECTn,

    ceiling(4*rand(convert(varbinary, newid()))),

    ceiling(80*rand(convert(varbinary, newid())))+20

    FROM iTally;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    SET NOCOUNT ON;

    PRINT 'Normal Cross-tab'

    SET STATISTICS TIME ON;

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM #p;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Normal Pivot'

    SET STATISTICS TIME ON;

    --where I was stuck

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM

    (SELECT qtr, sales

    FROM #p) AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Pre-aggregated Cross-tab'

    SET STATISTICS TIME ON;

    --where I was stuck

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr)p;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Pre-aggregated Pivot'

    SET STATISTICS TIME ON;

    --where I was stuck

    SELECT [1] AS q1, [2] AS q2, [3] AS q3, [4] AS q4

    FROM

    (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr) AS p

    PIVOT(

    SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])

    ) AS pivottbl;

    SET STATISTICS TIME OFF;

    Results

    Normal Cross-tab

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1350 ms, elapsed time = 1352 ms.

    Normal Pivot

    SQL Server Execution Times:

    CPU time = 1340 ms, elapsed time = 1345 ms.

    Pre-aggregated Cross-tab

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1060 ms, elapsed time = 292 ms.

    Pre-aggregated Pivot

    SQL Server Execution Times:

    CPU time = 1070 ms, elapsed time = 320 ms.

    PS. Take care, I hope it doesn't get too terrible with all that snow. I sure miss it but I don't need as much as you might be getting.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • And this is why I really love cross-tabs. Pivot becomes completely annoying with multiple columns or calculations being pivoted. I struggled to get right the pivot approach.

    Again, any comments are welcome.

    Code used for the test:

    IF OBJECT_ID('tempdb..#p') IS NOT NULL DROP TABLE #p;

    CREATE TABLE #p(id int primary key, qtr tinyint, sales int);

    DECLARE @rows int = 1000000;

    WITH iTally(n) AS

    (SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT ($)))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b

    )

    INSERT #p

    SELECTn,

    ceiling(4*rand(convert(varbinary, newid()))),

    ceiling(80*rand(convert(varbinary, newid())))+20

    FROM iTally;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    SET NOCOUNT ON;

    PRINT 'Normal Cross-tab'

    SET STATISTICS TIME ON;

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4,

    COUNT(CASE qtr WHEN 1 THEN sales END) AS Count_q1,

    COUNT(CASE qtr WHEN 2 THEN sales END) AS Count_q2,

    COUNT(CASE qtr WHEN 3 THEN sales END) AS Count_q3,

    COUNT(CASE qtr WHEN 4 THEN sales END) AS Count_q4

    FROM #p;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Normal Pivot'

    SET STATISTICS TIME ON;

    SELECT pivottbl.[1] AS q1, pivottbl.[2] AS q2, pivottbl.[3] AS q3, pivottbl.[4] AS q4,

    pivottbl2.[1] AS Count_q1, pivottbl2.[2] AS Count_q2, pivottbl2.[3] AS Count_q3, pivottbl2.[4] AS Count_q4

    FROM (SELECT qtr, sales FROM #p) AS p

    PIVOT(SUM(p.sales) FOR p.qtr IN ([1],[2],[3],[4])) AS pivottbl

    JOIN (SELECT qtr, sales FROM #p) AS p2

    PIVOT(COUNT(p2.sales) FOR p2.qtr IN ([1],[2],[3],[4])) AS pivottbl2 ON 1 = 1;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Pre-aggregated Cross-tab'

    SET STATISTICS TIME ON;

    --where I was stuck

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4,

    SUM(CASE qtr WHEN 1 THEN Cnt END) AS Count_q1,

    SUM(CASE qtr WHEN 2 THEN Cnt END) AS Count_q2,

    SUM(CASE qtr WHEN 3 THEN Cnt END) AS Count_q3,

    SUM(CASE qtr WHEN 4 THEN Cnt END) AS Count_q4

    FROM (SELECT qtr, SUM(sales) sales, COUNT(*) Cnt

    FROM #p

    GROUP BY qtr)p;

    SET STATISTICS TIME OFF;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    PRINT 'Pre-aggregated Pivot'

    SET STATISTICS TIME ON;

    --where I was stuck

    SELECT pivottbl.[1] AS q1, pivottbl.[2] AS q2, pivottbl.[3] AS q3, pivottbl.[4] AS q4,

    pivottbl2.[1] AS Count_q1, pivottbl2.[2] AS Count_q2, pivottbl2.[3] AS Count_q3, pivottbl2.[4] AS Count_q4

    FROM (SELECT qtr, SUM(sales) sales FROM #p GROUP BY qtr) AS p

    PIVOT(SUM(p.sales) FOR p.qtr IN ([1],[2],[3],[4])) AS pivottbl

    JOIN (SELECT qtr, COUNT(*) Cnt FROM #p GROUP BY qtr) AS p2

    PIVOT(SUM(Cnt) FOR p2.qtr IN ([1],[2],[3],[4])) AS pivottbl2 ON 1 = 1;

    SET STATISTICS TIME OFF;

    Results:

    Normal Cross-tab

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 2160 ms, elapsed time = 2120 ms.

    Normal Pivot

    SQL Server Execution Times:

    CPU time = 2540 ms, elapsed time = 1741 ms.

    Pre-aggregated Cross-tab

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    CPU time = 1160 ms, elapsed time = 475 ms.

    Pre-aggregated Pivot

    SQL Server Execution Times:

    CPU time = 1740 ms, elapsed time = 1271 ms.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis. Thanks for putting all this together; I wanted to reply soon but it's been a busy couple days.

    I have been playing around with the code you posted and re-visited Jeff's article about cross-tabs and pivots. I never read it in it's entirety and never really played with pre-aggregated cross-tabs and pre-aggregated pivots. This was an extremely helpful and informative thread.

    I always used the cross-tab approach when pivoting because it was easier to read/write. PIVOT is a real drag when you need to do multiple aggregations; I am now learning that through experience.

    What I find interesting is that, on my system (08R2, 64bit, 16gb RAM) & when dealing with one aggregation, the cross-tab approach shows a faster estimated plan but PIVOT performs better. Using the tests below: at a million rows PIVOT generates a parallel plan and is like 4X faster. At 5million they both produce parallel plans and PIVOT is still twice as fast. I did not expect that.

    -- (1) Sample data

    IF OBJECT_ID('tempdb..#p') IS NOT NULL DROP TABLE #p;

    CREATE TABLE #p

    (id int primary key,

    office int not null,

    qtr int not null,

    sales int not null);

    GO

    DECLARE @rows int = 2000000;

    WITH iTally(n) AS

    (SELECT TOP(@rows) ROW_NUMBER() OVER (ORDER BY (SELECT ($)))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    INSERT #p

    SELECTn,

    ceiling(60*rand(convert(varbinary, newid()))),

    ceiling(4*rand(convert(varbinary, newid()))),

    ceiling(80*rand(convert(varbinary, newid())))+20

    FROM iTally;

    GO

    CREATE INDEX c_oqs ON #p(office) INCLUDE (qtr, sales);

    CREATE INDEX c_qs ON #p(qtr) INCLUDE (sales);

    GO

    -- single aggregate: 1m rows=no paralellism for CROSS-TAB, but yes for piv

    -- both get paralell plan at 5m rows

    ---------------------------------------------------------------------------

    -- quarters only

    SET NOCOUNT ON;

    SET STATISTICS TIME ON;

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM #p;

    WITH p AS

    (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY office, qtr)

    SELECT

    [1] AS q1,

    [2] AS q2,

    [3] AS q3,

    [4] AS q4

    FROM p

    PIVOT

    (SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])) AS pivottbl;

    SET STATISTICS TIME OFF;

    GO

    -- office & quarters only

    SET STATISTICS TIME ON;

    SELECT

    office,

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM #p

    GROUP BY office;

    --piv

    WITH p AS

    (SELECT office, qtr, SUM(sales) sales

    FROM #p

    GROUP BY office, qtr)

    SELECT

    office,

    [1] AS q1,

    [2] AS q2,

    [3] AS q3,

    [4] AS q4

    FROM p

    PIVOT

    (SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])) AS pivottbl;

    SET STATISTICS TIME OFF;

    GO

    Results:

    5M Rows (both create Parelell Plans)

    crossTab

    SQL Server Execution Times:

    CPU time = 3120 ms, elapsed time = 783 ms.

    pivot

    SQL Server Execution Times:

    CPU time = 1560 ms, elapsed time = 396 ms.

    1M Rows (Only PIVOT creates a Parelell Plan)

    crossTab

    SQL Server Execution Times:

    CPU time = 952 ms, elapsed time = 946 ms.

    pivot

    SQL Server Execution Times:

    CPU time = 841 ms, elapsed time = 224 ms.

    Comments questions welcome.

    Thanks again Luis!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan,

    I was getting worried about you, it's good to know that you haven't been around because you were busy.

    Back to the problem, you're cheating with your test. You didn't clean up the buffer after the cross tabs and before the pivot. At least not in the code you posted. That will give the second method (pivot in this case) a great advantage.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (2/7/2014)


    ...you're cheating with your test. You didn't clean up the buffer after the cross tabs and before the pivot. At least not in the code you posted. That will give the second method (pivot in this case) a great advantage.

    Did not know to cleanup the buffer. Learned to do that in this thread. I have been doing so since.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Have you tested again?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Alan.B (2/7/2014)


    Using the tests below: at a million rows PIVOT generates a parallel plan and is like 4X faster. At 5million they both produce parallel plans and PIVOT is still twice as fast. I did not expect that.

    I know that this post is several months old but there's a reason for your findings... you used pre-aggregation for the PIVOT but not for the CROSSTAB. The two tests aren't the same.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/17/2014)


    Alan.B (2/7/2014)


    Using the tests below: at a million rows PIVOT generates a parallel plan and is like 4X faster. At 5million they both produce parallel plans and PIVOT is still twice as fast. I did not expect that.

    I know that this post is several months old but there's a reason for your findings... you used pre-aggregation for the PIVOT but not for the CROSSTAB. The two tests aren't the same.

    Hi Jeff, sorry for the later reply here - I have been off the grid dealing with health issues.

    I fixed my code - please let me know if this is correct:

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    -- grouped by quarters only

    SET NOCOUNT ON;

    SET STATISTICS TIME ON;

    PRINT 'crosstab, quarters only'+char(13);

    SELECT

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM

    (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY qtr) p;

    PRINT 'pivot, quarters only'+char(13);

    WITH p AS

    (SELECT qtr, SUM(sales) sales

    FROM #p

    GROUP BY office, qtr)

    SELECT

    [1] AS q1,

    [2] AS q2,

    [3] AS q3,

    [4] AS q4

    FROM p

    PIVOT

    (SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])) AS pivottbl;

    SET STATISTICS TIME OFF;

    GO

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    -- office & quarters

    SET STATISTICS TIME ON;

    PRINT 'crosstab office and quarters'+char(13);

    SELECT

    Office,

    SUM(CASE qtr WHEN 1 THEN sales END) AS q1,

    SUM(CASE qtr WHEN 2 THEN sales END) AS q2,

    SUM(CASE qtr WHEN 3 THEN sales END) AS q3,

    SUM(CASE qtr WHEN 4 THEN sales END) AS q4

    FROM

    (SELECT Office, qtr, SUM(sales) sales

    FROM #p

    GROUP BY office, qtr) p

    GROUP BY office;

    PRINT 'pivot office and quarters'+char(13);

    WITH p AS

    (SELECT office, qtr, SUM(sales) sales

    FROM #p

    GROUP BY office, qtr)

    SELECT

    office,

    [1] AS q1,

    [2] AS q2,

    [3] AS q3,

    [4] AS q4

    FROM p

    PIVOT

    (SUM(sales)

    FOR qtr IN ([1],[2],[3],[4])) AS pivottbl;

    SET STATISTICS TIME OFF;

    GO

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Luis Cazares (2/11/2014)


    Have you tested again?

    Luis - I completely flaked out and missed this message. I am truly sorry and feel like a jerk. I fixed the my code and re-tested it. The server I originally tested this on does not exist. On my laptop (4 cpu, 16gb ram, SQL 2014) using the code that I posted above I get the following results against 10M rows:

    crosstab, quarters only

    SQL Server Execution Times:

    CPU time = 3796 ms, elapsed time = 1106 ms.

    pivot, quarters only

    SQL Server Execution Times:

    CPU time = 6344 ms, elapsed time = 1703 ms.

    crosstab office and quarters

    SQL Server Execution Times:

    CPU time = 6297 ms, elapsed time = 1684 ms.

    pivot office and quarters

    SQL Server Execution Times:

    CPU time = 6374 ms, elapsed time = 1751 ms.

    I did 10M because the results are pretty close.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (7/24/2014)


    Luis Cazares (2/11/2014)


    Have you tested again?

    Luis - I completely flaked out and missed this message. I am truly sorry and feel like a jerk. I fixed the my code and re-tested it. The server I originally tested this on does not exist. On my laptop (4 cpu, 16gb ram, SQL 2014) using the code that I posted above I get the following results against 10M rows:

    Hey, don't feel that way. It's completely understandable that we can't be here all the time and miss some questions.

    Thank you for sharing.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply