Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Table join / pivoting question Expand / Collapse
Author
Message
Posted Friday, December 13, 2013 4:05 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 2:58 PM
Points: 52, Visits: 171
I have a table that contains 96 status columns, one for every 15-minute period of the data. I also have another table with priorities for each status. I need to run a query that will return the status and priority side by side. I can do this with sub query, but performance is terrible. Here's a sample with three columns:

declare @tbl table
(
IndexID int identity(1,1),
Status1 int,
Status2 int,
Status3 int
)
declare @spTable table
(
StatusID int,
PriorityID int
)

insert @spTable ( StatusID, PriorityID )
values ( 1, 20 ),
( 2, 15 ),
( 3, 30 )

insert @tbl ( Status1, Status2, Status3 )
values ( 1, 3, 1 ),
( 2, 1, 3 ),
( 1, 3, 3 )

How can I efficiently get the following results?

IndexID Status1 Priority1 Status2 Priority2 Status3 Priority3
----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 20 3 30 1 20
2 2 15 1 20 3 30
3 1 20 3 30 3 30
Post #1522885
Posted Friday, December 13, 2013 4:32 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 4:19 PM
Points: 1,786, Visits: 5,682
I don't know how efficient you can get with this, but here is my version

select 
t.IndexID
, Status1
, max(case Status1 when StatusId then PriorityID end) as Priority1
, Status2
, max(case Status2 when StatusId then PriorityID end) as Priority2
, Status3
, max(case Status3 when StatusId then PriorityID end) as Priority3
from @tbl t
cross join @spTable s
group by t.IndexID
, Status1
, Status2
, Status3



MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1522889
    Posted Friday, December 13, 2013 9:15 PM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Today @ 4:53 PM
    Points: 36,795, Visits: 31,257
    I doubt that you're actually using Table Variables for the real thing but we'll continue with that example. In the following, I've added indexes to the Table Variables in the form of PKs that I believe that you should have on your real tables. That will get rid of a pretty nasty sort that's showing up in Magoo's fine code.

    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick.

    Details are in the code...
    --===== Declare the test tables
    DECLARE @Tbl TABLE
    (
    IndexID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED --added PK
    ,Status1 INT
    ,Status2 INT
    ,Status3 INT
    )
    ;
    DECLARE @spTable TABLE
    (
    StatusID INT NOT NULL --added NOT NULL
    ,PriorityID INT NOT NULL --added NOT NULL
    ,PRIMARY KEY CLUSTERED(StatusID,PriorityID) --added PK
    )
    ;
    --===== Populate the test tables
    INSERT @Tbl (Status1, Status2, Status3)
    SELECT 1, 3, 1 UNION ALL --Converted to 2005 so others can play
    SELECT 2, 1, 3 UNION ALL
    SELECT 1, 3, 3
    ;
    INSERT INTO @spTable
    (StatusID, PriorityID)
    SELECT 1, 20 UNION ALL --Converted to 2005 so others can play
    SELECT 2, 15 UNION ALL
    SELECT 3, 30
    ;
    --===== Solve the problem like Magoo did but with the MAX trick to limit the GROUP BY.
    -- I also extended the CASE statements to prevent the "NULL AGGREGATE" message.
    SELECT t.IndexID
    ,Status1 = MAX(t.Status1)
    ,Priority1 = MAX(CASE WHEN t.Status1 = s.StatusId THEN s.PriorityID ELSE -1 END)
    ,Status2 = MAX(t.Status2)
    ,Priority2 = MAX(CASE WHEN t.Status2 = s.StatusId THEN s.PriorityID ELSE -1 END)
    ,Status3 = MAX(t.Status3)
    ,Priority3 = MAX(CASE WHEN t.Status3 = s.StatusId THEN s.PriorityID ELSE -1 END)
    FROM @Tbl t
    CROSS JOIN @spTable s
    GROUP BY t.IndexID
    ORDER BY t.IndexID
    ;


    Of course, if your column names really are numeric in nature, you could do some pretty amazing and simple stuff to "auto-magically" generate all of the code instead of using CPR (Copy, Paste, Replace). It would also be a snap to change the column names in the output to Time-of-Day to make it easier for humans to read.

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1522926
    Posted Saturday, December 14, 2013 5:31 PM
    Valued Member

    Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

    Group: General Forum Members
    Last Login: Wednesday, July 23, 2014 2:58 PM
    Points: 52, Visits: 171
    Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:

    select X.IndexID,
    X.Status1,
    Priority1 = case when PX.ColID = 1 then S.PriorityID end,
    X.Status2,
    Priority2 = case when PX.ColID = 2 then S.PriorityID end,
    X.Status3,
    Priority3 = case when PX.ColID = 3 then S.PriorityID end
    from @tbl X cross apply
    (
    values ( 1, X.Status1 ),
    ( 2, X.Status2 ),
    ( 3, X.Status3 )
    ) PX ( ColID, StatusID ) inner join
    @spTable S on
    S.StatusID = PX.StatusID
    group by X.IndexID, X.Status1, X.Status2, X.Status3
    Post #1522997
    Posted Saturday, December 14, 2013 5:55 PM


    SSCommitted

    SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

    Group: General Forum Members
    Last Login: Yesterday @ 4:19 PM
    Points: 1,786, Visits: 5,682
    Jeff Moden (12/13/2013)


    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick.

    ...

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".


    First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words

    There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...

    As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!

    I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way!


    MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1522999
    Posted Saturday, December 14, 2013 6:00 PM


    SSCommitted

    SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

    Group: General Forum Members
    Last Login: Yesterday @ 4:19 PM
    Points: 1,786, Visits: 5,682
    N_Muller (12/14/2013)
    Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:

    select X.IndexID,
    X.Status1,
    Priority1 = case when PX.ColID = 1 then S.PriorityID end,
    X.Status2,
    Priority2 = case when PX.ColID = 2 then S.PriorityID end,
    X.Status3,
    Priority3 = case when PX.ColID = 3 then S.PriorityID end
    from @tbl X cross apply
    (
    values ( 1, X.Status1 ),
    ( 2, X.Status2 ),
    ( 3, X.Status3 )
    ) PX ( ColID, StatusID ) inner join
    @spTable S on
    S.StatusID = PX.StatusID
    group by X.IndexID, X.Status1, X.Status2, X.Status3


    You said you used cross apply instead of case? You have both
    Your query, as it is, won't work without some aggregate functions on the PriorityX columns to coordinate with that group by...


    MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1523000
    Posted Sunday, December 15, 2013 7:02 AM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Today @ 4:53 PM
    Points: 36,795, Visits: 31,257
    mister.magoo (12/14/2013)
    Jeff Moden (12/13/2013)


    You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick.

    ...

    Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".


    First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words

    There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...

    As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!

    I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way!



    Thanks for catching the CPR mistake. I've fixed that in the code where you found it.

    As for the typing thing, I wouldn't actually type any of the SELECT list more than once. I'd type out a little dynamic SQL and let it generate the 96 (in this case) paired items including more meaningful labels. Since I wouldn't have do the same thing for the GROUP BY, I've save myself a little typing.

    I'm also in awe of the programmers that worked on the execution plan. In fact, I'm totally in awe of the optimizer that decides how it's going to put all of this stuff together and whether or not to use an index based on statistics, etc, ad infinitum. Despite some of it's faults, it's an amazing piece of software where we don't have to spend half our waking hours writing loops to retrieve data from files and then loops to join and then loops to select and loops to... you get the idea.


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1523023
    Posted Sunday, December 15, 2013 5:36 PM


    Hall of Fame

    Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

    Group: General Forum Members
    Last Login: Yesterday @ 5:45 PM
    Points: 3,617, Visits: 5,237
    It appears that Monday morning without coffee isn't particularly conduciveto understanding requirements, so post removed.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
    Splitting strings based on patterns can be fast!
    Post #1523070
    Posted Monday, December 16, 2013 8:54 AM
    Valued Member

    Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

    Group: General Forum Members
    Last Login: Wednesday, July 23, 2014 2:58 PM
    Points: 52, Visits: 171
    You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.

    Cross apply method:

    select X.IndexID,
    X.Status1,
    Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),
    X.Status2,
    Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),
    X.Status3,
    Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)
    from @tbl X cross apply
    (
    values ( 1, X.Status1 ),
    ( 2, X.Status2 ),
    ( 3, X.Status3 )
    ) PX ( ColID, StatusID ) inner join
    @spTable S on
    S.StatusID = PX.StatusID
    group by X.IndexID, X.Status1, X.Status2, X.Status3

    Sub-query method:

    select X.IndexID,
    X.Status1,
    Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),
    X.Status2,
    Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),
    X.Status3,
    Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )
    from @tbl X
    Post #1523282
    Posted Monday, December 16, 2013 1:44 PM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Today @ 4:53 PM
    Points: 36,795, Visits: 31,257
    N_Muller (12/16/2013)
    You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.

    Cross apply method:

    select X.IndexID,
    X.Status1,
    Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),
    X.Status2,
    Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),
    X.Status3,
    Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)
    from @tbl X cross apply
    (
    values ( 1, X.Status1 ),
    ( 2, X.Status2 ),
    ( 3, X.Status3 )
    ) PX ( ColID, StatusID ) inner join
    @spTable S on
    S.StatusID = PX.StatusID
    group by X.IndexID, X.Status1, X.Status2, X.Status3

    Sub-query method:

    select X.IndexID,
    X.Status1,
    Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),
    X.Status2,
    Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),
    X.Status3,
    Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )
    from @tbl X


    Actually, you have 4 options and the right one is neither of the two above. Go back and look at Magoo's code for one of the right ways to do it and then look at mine for a shorter alternative. Both should blow the doors off the two examples you have above.

    Subqueries are out of the question for this.


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1523419
    « Prev Topic | Next Topic »

    Add to briefcase 12»»

    Permissions Expand / Collapse