Cross Tabs and Pivots, Part 1 – Converting Rows to Columns

  • Ah... the fly in the ointment with the double pivot shown above is that it still returns NULLs for missing entries and row totals are still a pain.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The other fly in the ointment is that it's also coming up with the wrong sums.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • david.leyden - Sunday, October 1, 2017 11:16 PM

    I've come across this article and a few others that suggest it is a limitation of PIVOT that you can't PIVOT more than one column. I am not sure I understand the need for creating two pivots and joining them. I would just alias the column we are pivoting on for each PIVOT clause.


    SELECT [Company],
       [Year],
       [1_Qty]=MAX([1_Qty]),
       [2_Qty]=MAX([2_Qty]),
       [3_Qty]=MAX([3_Qty]),
       [4_Qty]=MAX([4_Qty]),
       [1_Amt]=MAX([1_Amt]),
       [2_Amt]=MAX([2_Amt]),
       [3_Amt]=MAX([3_Amt]),
       [4_Amt]=MAX([4_Amt])
    FROM
    (
    SELECT [Company],
       Year,
       [Qtr_Qty]=CONVERT( CHAR(1), Quarter)+'_Qty',
       [Qtr_Amt]=CONVERT( CHAR(1), Quarter)+'_Amt',
       [Quantity],
       [Amount]
    FROM @testtable
    ) [pt] PIVOT(SUM([Quantity]) FOR [Qtr_Qty] IN([1_Qty],
                    [2_Qty],
                    [3_Qty],
                    [4_Qty]))
    AS [qty] PIVOT(SUM([Amount]) FOR [Qtr_Amt] IN([1_Amt],
                    [2_Amt],
                    [3_Amt],
                    [4_Amt]))
    AS [amt]
    GROUP BY [Company],
       [Year];

    Yowch!  Pivot breaks when used as a double pivot in that manner.  Either that or I'm seriously missing something that has gone wrong with your code.

    If we run the million row data generator from the article...


    --===== Create and populate a 1,000,000 row test table.
         -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
         -- Column "Company" has a range of "AAA" to "BBB" non-unique 3 character strings
         -- Column "Amount has a range of 0.0000 to 9999.9900 non-unique numbers
         -- Column "Quantity" has a range of 1 to 50,000 non-unique numbers
         -- Column "Date" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Columns Year and Quarter are the similarly named components of Date
         -- Jeff Moden

     SELECT TOP 1000000 --<<Look!  Change this number for testing different size tables
            RowNum       = IDENTITY(INT,1,1),
            Company      = CHAR(ABS(CHECKSUM(NEWID()))%2+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%2+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%2+65),
            Amount       = CAST(ABS(CHECKSUM(NEWID()))%1000000/100.0 AS MONEY),
            Quantity     = ABS(CHECKSUM(NEWID()))%50000+1,
            Date         = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            Year         = CAST(NULL AS SMALLINT),
            Quarter      = CAST(NULL AS TINYINT)
       INTO #SomeTable3
       FROM Master.sys.SysColumns t1
      CROSS JOIN
            Master.sys.SysColumns t2
    --===== Fill in the Year and Quarter columns from the Date column
     UPDATE #SomeTable3
        SET Year    = DATEPART(yy,Date),
            Quarter = DATEPART(qq,Date)
    --===== A table is not properly formed unless a Primary Key has been assigned
         -- Takes about 1 second to execute.
      ALTER TABLE #SomeTable3
            ADD PRIMARY KEY CLUSTERED (RowNum)
    CREATE NONCLUSTERED INDEX IX_#SomeTable3_Cover1
        ON dbo.#SomeTable3 (Company, Year)
           INCLUDE (Amount, Quantity, Quarter)
    GO
        SET STATISTICS TIME OFF
        SET STATISTICS IO OFF
    GO

    It does, in fact, create a million rows...


    SELECT COUNT(*) FROM #SomeTable3

    -----------
    1000000
    (1 row affected)

    If we take out the aggregation in the first SELECT and replace it with just a "*" and remove the GROUP BY, it should return a million rows...


    SELECT *
    --[COMPANY],
    --   [YEAR],
    --   [1_QTY]=MAX([1_QTY]),
    --   [2_QTY]=MAX([2_QTY]),
    --   [3_QTY]=MAX([3_QTY]),
    --   [4_QTY]=MAX([4_QTY]),
    --   [1_AMT]=MAX([1_AMT]),
    --   [2_AMT]=MAX([2_AMT]),
    --   [3_AMT]=MAX([3_AMT]),
    --   [4_AMT]=MAX([4_AMT])
    FROM
    (
      SELECT [Company],
       Year,
       [Qtr_Qty]=CONVERT( CHAR(1), Quarter)+'_Qty',
       [Qtr_Amt]=CONVERT( CHAR(1), Quarter)+'_Amt',
       [Quantity],
       [Amount]
      FROM #SomeTable3

    ) [pt] PIVOT(SUM([Quantity]) FOR [Qtr_Qty] IN([1_Qty],
                    [2_Qty],
                    [3_Qty],
                    [4_Qty]))
    AS [qty] PIVOT(SUM([Amount]) FOR [Qtr_Amt] IN([1_Amt],
                    [2_Amt],
                    [3_Amt],
                    [4_Amt]))
    AS [amt]
    --GROUP BY [Company],
    --   [Year];

    BUT, IT DOESN"T RETURN THE MILLION ROWS!  It's actually missing 10's of thousands of rows...


    (968031 rows affected)

    Although I'm not sure that Microsoft supports your method of doing a multi-column pivot, it looks like the PIVOT operator cause a "break" and doesn't consider all of the rows.  Either that, or the optimizer doesn't handle the way the column names were dynamically created.

    Since it's also more complex than a CROSS TAB, I'll never even consider using PIVOT especially since we just proved this method "breaks" and produces the wrong answers because it ignores 10's of thousand of rows.  And, yes, I checked.  The derived table in the FROM clause produces a million rows and even when saving that as a Temp Table and using that Temp Table in place of the derived table, you still end up with the missing rows and the wrong answers.

    p.s.  I also understand why you used MAX but since the PIVOT is also NOT doing an actual PIVOT on this, you need to use a SUM but it still doesn't come up with the correct answer because it's still missing rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... now I'm sure.  If you run the code to generate the test table as we did above and then run the code below, you'll see what I mean.  Details are in the comments.


    --===== This is a single column pivot and it works just fine (for a PIVOT).
         -- It produces the expected 80 rows from a million.
     SELECT  Company
            ,YEAR
            ,Q1Amt = amt.[1]
            ,Q2Amt = amt.[2]
            ,Q3Amt = amt.[3]
            ,Q4Amt = amt.[4]
       FROM (SELECT Company,[Year],[Quarter],Amount FROM #SomeTable3) AS src
              PIVOT (SUM(Amount) FOR [Quarter] IN ([1],[2],[3],[4])
            ) amt
      ORDER BY Company, Year
    ;
    --===== This demonstrates that you can't have more in the src query than what the PIVOT will use
         -- which also means that you MUST have a source query for each column to be pivoted.
         -- All we did was add the Quantity column to the "src" aliased derived table and Pivot no
         -- longer works correctly.  Try it and see. And check out the row count.
    SELECT Company
            ,YEAR
            ,Q1Amt = amt.[1]
            ,Q2Amt = amt.[2]
            ,Q3Amt = amt.[3]
            ,Q4Amt = amt.[4]
       FROM (SELECT Company,[Year],[Quarter],Amount,Quantity FROM #SomeTable3) AS src
              PIVOT (SUM(Amount) FOR [Quarter] IN ([1],[2],[3],[4])
            ) amt
      ORDER BY Company, Year
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 241 through 244 (of 244 total)

You must be logged in to reply to this topic. Login to reply