Table join / pivoting question

Question

Post reply

Table join / pivoting question

N_Muller

SSCarpal Tunnel

Points: 4037
More actions
December 13, 2013 at 4:05 pm

#301380

I have a table that contains 96 status columns, one for every 15-minute period of the data. I also have another table with priorities for each status. I need to run a query that will return the status and priority side by side. I can do this with sub query, but performance is terrible. Here's a sample with three columns:
declare @tbl table
(
IndexID int identity(1,1),
Status1 int,
Status2 int,
Status3 int
)
declare @spTable table
(
StatusID int,
PriorityID int
)
insert@spTable ( StatusID, PriorityID )
values( 1, 20 ),
( 2, 15 ),
( 3, 30 )
insert@tbl ( Status1, Status2, Status3 )
values( 1, 3, 1 ),
( 2, 1, 3 ),
( 1, 3, 3 )
How can I efficiently get the following results?
IndexID Status1 Priority1 Status2 Priority2 Status3 Priority3
----------- ----------- ----------- ----------- ----------- ----------- -----------
1 1 20 3 30 1 20
2 2 15 1 20 3 30
3 1 20 3 30 3 30
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

mister.magoo SSC-Forever Points: 47068 More actions · Answer 1

I don't know how efficient you can get with this, but here is my version

select

t.IndexID

, Status1

, max(case Status1 when StatusId then PriorityID end) as Priority1

, Status2

, max(case Status2 when StatusId then PriorityID end) as Priority2

, Status3

, max(case Status3 when StatusId then PriorityID end) as Priority3

from @tbl t

cross join @spTable s

group by t.IndexID

, Status1

, Status2

, Status3

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 2

I doubt that you're actually using Table Variables for the real thing but we'll continue with that example. In the following, I've added indexes to the Table Variables in the form of PKs that I believe that you should have on your real tables. That will get rid of a pretty nasty sort that's showing up in Magoo's fine code.

You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛

Details are in the code...

--===== Declare the test tables

DECLARE @Tbl TABLE

(

IndexID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED --added PK

,Status1 INT

,Status2 INT

,Status3 INT

)

;

DECLARE @spTable TABLE

(

StatusID INT NOT NULL --added NOT NULL

,PriorityID INT NOT NULL --added NOT NULL

,PRIMARY KEY CLUSTERED(StatusID,PriorityID) --added PK

)

;

--===== Populate the test tables

INSERT @Tbl (Status1, Status2, Status3)

SELECT 1, 3, 1 UNION ALL --Converted to 2005 so others can play

SELECT 2, 1, 3 UNION ALL

SELECT 1, 3, 3

;

INSERT INTO @spTable

(StatusID, PriorityID)

SELECT 1, 20 UNION ALL --Converted to 2005 so others can play

SELECT 2, 15 UNION ALL

SELECT 3, 30

;

--===== Solve the problem like Magoo did but with the MAX trick to limit the GROUP BY.

-- I also extended the CASE statements to prevent the "NULL AGGREGATE" message.

SELECT t.IndexID

,Status1 = MAX(t.Status1)

,Priority1 = MAX(CASE WHEN t.Status1 = s.StatusId THEN s.PriorityID ELSE -1 END)

,Status2 = MAX(t.Status2)

,Priority2 = MAX(CASE WHEN t.Status2 = s.StatusId THEN s.PriorityID ELSE -1 END)

,Status3 = MAX(t.Status3)

,Priority3 = MAX(CASE WHEN t.Status3 = s.StatusId THEN s.PriorityID ELSE -1 END)

FROM @Tbl t

CROSS JOIN @spTable s

GROUP BY t.IndexID

ORDER BY t.IndexID

;

Of course, if your column names really are numeric in nature, you could do some pretty amazing and simple stuff to "auto-magically" generate all of the code instead of using CPR (Copy, Paste, Replace). 😉 It would also be a snap to change the column names in the output to Time-of-Day to make it easier for humans to read.

Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 3

Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:

selectX.IndexID,

X.Status1,

Priority1 = case when PX.ColID = 1 then S.PriorityID end,

X.Status2,

Priority2 = case when PX.ColID = 2 then S.PriorityID end,

X.Status3,

Priority3 = case when PX.ColID = 3 then S.PriorityID end

from@tbl X cross apply

(

values( 1, X.Status1 ),

( 2, X.Status2 ),

( 3, X.Status3 )

) PX ( ColID, StatusID ) inner join

@spTable S on

S.StatusID = PX.StatusID

group by X.IndexID, X.Status1, X.Status2, X.Status3

mister.magoo SSC-Forever Points: 47068 More actions · Answer 4

Jeff Moden (12/13/2013)
You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛
...
Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".

First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words 😀

There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...

As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!

I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way! 😎

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mister.magoo SSC-Forever Points: 47068 More actions · Answer 5

N_Muller (12/14/2013)
Yes, my table is quite large and it does have primary keys, and I do generate the code automatically. In any case, here's a version that I ended up using which uses cross apply, instead of the case statement:
selectX.IndexID,
X.Status1,
Priority1 = case when PX.ColID = 1 then S.PriorityID end,
X.Status2,
Priority2 = case when PX.ColID = 2 then S.PriorityID end,
X.Status3,
Priority3 = case when PX.ColID = 3 then S.PriorityID end
from@tbl X cross apply
(
values( 1, X.Status1 ),
( 2, X.Status2 ),
( 3, X.Status3 )
) PX ( ColID, StatusID ) inner join
@spTable S on
S.StatusID = PX.StatusID
group by X.IndexID, X.Status1, X.Status2, X.Status3

You said you used cross apply instead of case? You have both :w00t:

Your query, as it is, won't work without some aggregate functions on the PriorityX columns to coordinate with that group by...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 6

mister.magoo (12/14/2013)
Jeff Moden (12/13/2013)
You can also make the GROUP BY a whole lot easier and shorter if you understand that the IndexID for @Tbl is, in fact, unique and that allows you to use MAX for the StatusN columns to return the StatusN values without having to include them all in the GROUP BY. It's an ancient old-fart-that-hates-typing-and-reading-lots-of-code trick. 😛
...
Magoo... hat's off to you, Sir. Awesome job with the CROSS JOINed CROSSTAB and recognizing the original table was the same as a preaggregation. It easily avoids an "unpivot, match, and repivot".
First, let me say thanks Jeff (mostly for not saying "You've done it again Magoo") for the kind words 😀
There is a slight mistake in your code, which is just a copy/paste thing of referencing Status1 three times...
As for the "old-fart-that-hates-typing-and-reading-lots-of-code trick", it's interesting that the query engine is smart enough to realise (for a large enough set of data) that our two queries are logically identical and it will generate the exact same plan for both (at least it did for 1,000,000 rows) I am constantly in awe of the programmers that develop(ed) that code!
I'm not convinced on the typing thing though - my fingers hate typing MAx( before and ) after each column when I can just type or copy/paste the column list into the group by - but that is (as I just said) immaterial as the engine doesn't care either way! 😎

Thanks for catching the CPR mistake. I've fixed that in the code where you found it.

As for the typing thing, I wouldn't actually type any of the SELECT list more than once. I'd type out a little dynamic SQL and let it generate the 96 (in this case) paired items including more meaningful labels. Since I wouldn't have do the same thing for the GROUP BY, I've save myself a little typing. 😀

I'm also in awe of the programmers that worked on the execution plan. In fact, I'm totally in awe of the optimizer that decides how it's going to put all of this stuff together and whether or not to use an index based on statistics, etc, ad infinitum. Despite some of it's faults, it's an amazing piece of software where we don't have to spend half our waking hours writing loops to retrieve data from files and then loops to join and then loops to select and loops to... you get the idea.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 7

It appears that Monday morning without coffee isn't particularly conduciveto understanding requirements, so post removed.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 8

You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.

Cross apply method:

selectX.IndexID,

X.Status1,

Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),

X.Status2,

Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),

X.Status3,

Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)

from@tbl X cross apply

(

values( 1, X.Status1 ),

( 2, X.Status2 ),

( 3, X.Status3 )

) PX ( ColID, StatusID ) inner join

@spTable S on

S.StatusID = PX.StatusID

group by X.IndexID, X.Status1, X.Status2, X.Status3

Sub-query method:

selectX.IndexID,

X.Status1,

Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),

X.Status2,

Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),

X.Status3,

Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )

from@tbl X

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 9

N_Muller (12/16/2013)
You are correct. The query won't work without an aggregate. However, going back to my original question on the usage of sub-queries. So far I have the two options, using the cross apply or sub-query. I'm sure there are others, but I haven't figured those out yet. The sub-query approach takes a lot longer to generate the query plan when compared to the cross apply where compilation takes no time at all. However, once the query plan is built running the query repeatedly is much faster than the cross apply. The query is embedded in a stored procedure which is called with different parameters. Also, my problem isn't as simple as this example, which I built for illustration. I actually have 96 columns and thousands of rows. My question is "is there a way to get the compilation time of the cross apply and the execution time of the sub-query?" Thanks in advance to anyone that can help me.
Cross apply method:
selectX.IndexID,
X.Status1,
Priority1 = sum(case when PX.ColID = 1 then S.PriorityID end),
X.Status2,
Priority2 = sum(case when PX.ColID = 2 then S.PriorityID end),
X.Status3,
Priority3 = sum(case when PX.ColID = 3 then S.PriorityID end)
from@tbl X cross apply
(
values( 1, X.Status1 ),
( 2, X.Status2 ),
( 3, X.Status3 )
) PX ( ColID, StatusID ) inner join
@spTable S on
S.StatusID = PX.StatusID
group by X.IndexID, X.Status1, X.Status2, X.Status3
Sub-query method:
selectX.IndexID,
X.Status1,
Priority1 = ( select top 1 PriorityID from @spTable where StatusID = X.Status1 ),
X.Status2,
Priority2 = ( select top 1 PriorityID from @spTable where StatusID = X.Status2 ),
X.Status3,
Priority3 = ( select top 1 PriorityID from @spTable where StatusID = X.Status3 )
from@tbl X

Actually, you have 4 options and the right one is neither of the two above. Go back and look at Magoo's code for one of the right ways to do it and then look at mine for a shorter alternative. Both should blow the doors off the two examples you have above.

Subqueries are out of the question for this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

N_Muller SSCarpal Tunnel Points: 4037 More actions · Answer 10

I don't know what was going on in my head. Thanks very much.