Table join / pivoting question

  • I have a table that contains 96 status columns, one for every 15-minute period of the data. I also have another table with priorities for each status. I need to run a query that will return the status and priority side by side. I can do this with sub query, but performance is terrible. Here's a sample with three columns:

    declare @tbl table

    (

    IndexID int identity(1,1),

    Status1 int,

    Status2 int,

    Status3 int

    )

    declare @spTable table

    (

    StatusID int,

    PriorityID int

    )

    insert@spTable ( StatusID, PriorityID )

    values( 1, 20 ),

    ( 2, 15 ),

    ( 3, 30 )

    insert@tbl ( Status1, Status2, Status3 )

    values( 1, 3, 1 ),

    ( 2, 1, 3 ),

    ( 1, 3, 3 )

    How can I efficiently get the following results?

    IndexID Status1 Priority1 Status2 Priority2 Status3 Priority3

    ----------- ----------- ----------- ----------- ----------- ----------- -----------

    1 1 20 3 30 1 20

    2 2 15 1 20 3 30

    3 1 20 3 30 3 30

  • I don't know how efficient you can get with this, but here is my version

    select

    t.IndexID

    , Status1

    , max(case Status1 when StatusId then PriorityID end) as Priority1

    , Status2

    , max(case Status2 when StatusId then PriorityID end) as Priority2

    , Status3

    , max(case Status3 when StatusId then PriorityID end) as Priority3

    from @tbl t

    cross join @spTable s

    group by t.IndexID

    , Status1

    , Status2

    , Status3

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I doubt that you're actually using Table Variables for the real thing but we'll continue with that example. In the following, I've added indexes to the Table Variables in the form of PKs that I believe that you should have on your real tables. That will get rid of a pretty nasty sort that's showing up in Magoo's fine code.

    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛

    Details are in the code...

    --===== Declare the test tables

    DECLARE @Tbl TABLE

    (

    IndexID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED --added PK

    ,Status1 INT

    ,Status2 INT

    ,Status3 INT

    )

    ;

    DECLARE @spTable TABLE

    (

    StatusID INT NOT NULL --added NOT NULL

    ,PriorityID INT NOT NULL --added NOT NULL

    ,PRIMARY KEY CLUSTERED(StatusID,PriorityID) --added PK

    )

    ;

    --===== Populate the test tables

    INSERT @Tbl (Status1, Status2, Status3)

    SELECT 1, 3, 1 UNION ALL --Converted to 2005 so others can play

    SELECT 2, 1, 3 UNION ALL

    SELECT 1, 3, 3

    ;

    INSERT INTO @spTable

    (StatusID, PriorityID)

    SELECT 1, 20 UNION ALL --Converted to 2005 so others can play

    SELECT 2, 15 UNION ALL

    SELECT 3, 30

    ;

    --===== Solve the problem like Magoo did but with the MAX trick to limit the GROUP BY.

    -- I also extended the CASE statements to prevent the "NULL AGGREGATE" message.

    SELECT t.IndexID

    ,Status1 = MAX(t.Status1)

    ,Priority1 = MAX(CASE WHEN t.Status1 = s.StatusId THEN s.PriorityID ELSE -1 END)

    ,Status2 = MAX(t.Status2)

    ,Priority2 = MAX(CASE WHEN t.Status2 = s.StatusId THEN s.PriorityID ELSE -1 END)

    ,Status3 = MAX(t.Status3)

    ,Priority3 = MAX(CASE WHEN t.Status3 = s.StatusId THEN s.PriorityID ELSE -1 END)

    FROM @Tbl t

    CROSS JOIN @spTable s

    GROUP BY t.IndexID

    ORDER BY t.IndexID

    ;

    Of course, if your column names really are numeric in nature, you could do some pretty amazing and simple stuff to "auto-magically" generate all of the code instead of using CPR (Copy, Paste, Replace). 😉 It would also be a snap to change the column names in the output to Time-of-Day to make it easier for humans to read.

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:

    selectX.IndexID,

    X.Status1,

    Priority1 = case when PX.ColID = 1 then S.PriorityID end,

    X.Status2,

    Priority2 = case when PX.ColID = 2 then S.PriorityID end,

    X.Status3,

    Priority3 = case when PX.ColID = 3 then S.PriorityID end

    from@tbl X cross apply

    (

    values( 1, X.Status1 ),

    ( 2, X.Status2 ),

    ( 3, X.Status3 )

    ) PX ( ColID, StatusID ) inner join

    @spTable S on

    S.StatusID = PX.StatusID

    group by X.IndexID, X.Status1, X.Status2, X.Status3

  • Jeff Moden (12/13/2013)


    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛

    ...

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".

    First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words 😀

    There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...

    As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!

    I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way! 😎

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • N_Muller (12/14/2013)


    Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:

    selectX.IndexID,

    X.Status1,

    Priority1 = case when PX.ColID = 1 then S.PriorityID end,

    X.Status2,

    Priority2 = case when PX.ColID = 2 then S.PriorityID end,

    X.Status3,

    Priority3 = case when PX.ColID = 3 then S.PriorityID end

    from@tbl X cross apply

    (

    values( 1, X.Status1 ),

    ( 2, X.Status2 ),

    ( 3, X.Status3 )

    ) PX ( ColID, StatusID ) inner join

    @spTable S on

    S.StatusID = PX.StatusID

    group by X.IndexID, X.Status1, X.Status2, X.Status3

    You said you used cross apply instead of case? You have both :w00t:

    Your query, as it is, won't work without some aggregate functions on the PriorityX columns to coordinate with that group by...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (12/14/2013)


    Jeff Moden (12/13/2013)


    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛

    ...

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".

    First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words 😀

    There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...

    As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!

    I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way! 😎

    Thanks for catching the CPR mistake. I've fixed that in the code where you found it.

    As for the typing thing, I wouldn't actually type any of the SELECT list more than once. I'd type out a little dynamic SQL and let it generate the 96 (in this case) paired items including more meaningful labels. Since I wouldn't have do the same thing for the GROUP BY, I've save myself a little typing. 😀

    I'm also in awe of the programmers that worked on the execution plan. In fact, I'm totally in awe of the optimizer that decides how it's going to put all of this stuff together and whether or not to use an index based on statistics, etc, ad infinitum. Despite some of it's faults, it's an amazing piece of software where we don't have to spend half our waking hours writing loops to retrieve data from files and then loops to join and then loops to select and loops to... you get the idea.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It appears that Monday morning without coffee isn't particularly conduciveto understanding requirements, so post removed.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.

    Cross apply method:

    selectX.IndexID,

    X.Status1,

    Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),

    X.Status2,

    Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),

    X.Status3,

    Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)

    from@tbl X cross apply

    (

    values( 1, X.Status1 ),

    ( 2, X.Status2 ),

    ( 3, X.Status3 )

    ) PX ( ColID, StatusID ) inner join

    @spTable S on

    S.StatusID = PX.StatusID

    group by X.IndexID, X.Status1, X.Status2, X.Status3

    Sub-query method:

    selectX.IndexID,

    X.Status1,

    Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),

    X.Status2,

    Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),

    X.Status3,

    Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )

    from@tbl X

  • N_Muller (12/16/2013)


    You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.

    Cross apply method:

    selectX.IndexID,

    X.Status1,

    Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),

    X.Status2,

    Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),

    X.Status3,

    Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)

    from@tbl X cross apply

    (

    values( 1, X.Status1 ),

    ( 2, X.Status2 ),

    ( 3, X.Status3 )

    ) PX ( ColID, StatusID ) inner join

    @spTable S on

    S.StatusID = PX.StatusID

    group by X.IndexID, X.Status1, X.Status2, X.Status3

    Sub-query method:

    selectX.IndexID,

    X.Status1,

    Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),

    X.Status2,

    Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),

    X.Status3,

    Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )

    from@tbl X

    Actually, you have 4 options and the right one is neither of the two above. Go back and look at Magoo's code for one of the right ways to do it and then look at mine for a shorter alternative. Both should blow the doors off the two examples you have above.

    Subqueries are out of the question for this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't know what was going on in my head. Thanks very much.

  • Viewing 11 posts - 1 through 10 (of 10 total)

    You must be logged in to reply to this topic. Login to reply