SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Best way to do this kind of group by


Best way to do this kind of group by

Author
Message
Sam S Kolli
Sam S Kolli
SSC-Enthusiastic
SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)

Group: General Forum Members
Points: 141 Visits: 3486
What would be the best way to write a sql query where the result set contains a group by where one column is grouped by one set of columns and another column is grouped by another set of columns and still have the two different group by appear on the same row.

Lets say there is a table such as follows:


IF OBJECT_ID('tempdb..#a') IS NOT NULL DROP TABLE #a
CREATE TABLE #a (Grp_A CHAR(2), Grp_B CHAR(2), Grp_C CHAR(2), Val_A TINYINT, Val_B TINYINT)



Sample Data:


INSERT INTO #a SELECT 'A1', 'B1', 'C1', 1, 1
INSERT INTO #a SELECT 'A1', 'B1', 'C2', 1, 1
INSERT INTO #a SELECT 'A1', 'B2', 'C3', 1, 1
INSERT INTO #a SELECT 'A2', 'B2', 'C4', 1, 1



The desired result set for the above would be:


SELECT A1.Grp_A, A1.Grp_B, A1.Grp_C, SUM(Val_A) AS GrpBy_ABC, A2.GrpBy_AB
FROM #a A1
INNER JOIN (
SELECT A.Grp_A, A.Grp_B, SUM(Val_B) AS GrpBy_AB FROM #a A GROUP BY A.Grp_A, A.Grp_B
) AS A2
ON (A1.Grp_A = A2.Grp_A AND A1.Grp_B = A2.Grp_B)
GROUP BY A1.Grp_A, A1.Grp_B, A1.Grp_C, A2.GrpBy_AB



While the above query gives me the desired result, it basically involves two scans on the base table which gives me the following concern. For example, if the base table contains a few million rows and if there are around 6 Group by columns, it can be a pretty expensive query cos there would be two huge scans and a join involving 6 columns.

Using grouping sets helps me to do one scan on the base table and get the aggregates.


SELECT GROUPING_ID(A1.Grp_A, A1.Grp_B, A1.Grp_C) AS GroupId,
A1.Grp_A, A1.Grp_B, A1.Grp_C,
SUM(Val_A) AS GrpBy_ABC, SUM(Val_B) AS GrpBy_AB
FROM #a A1
GROUP BY GROUPING SETS(
(A1.Grp_A, A1.Grp_B, A1.Grp_C),
(A1.Grp_A, A1.Grp_B)
)



But the output is not in the desired format since the second GROUPING sET information is not returned on the same row but on a new row. And to get the output into desired format, I will have to do a self join the intermediate output obtained by GROUPING SETS which in essence results in two more 2 huge scans and ends up being worse than the above query using self join on the base table.

So what can be the best way to approach this query? This would be part of a Load query, so while there is no requirement to have the query run in a few seconds, I would still like to do it in the best way possible.

P.S. Do other DBMS or SQL dialects allow for such a query to be written? Basically something like this;

SELECT A.Grp_A, A.Grp_B, A.Grp_C,
SUM(Val_A GROUP BY A.Grp_A, A.Grp_B, A.Grp_C) AS GrpBy_ABC,
SUM(Val_B GROUP BY A.Grp_A, A.Grp_B) AS GrpBy_AB
FROM #a A

In essence, there is only one scan and there is no need for a self join.
Mark Cowne
Mark Cowne
SSCertifiable
SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)SSCertifiable (6.3K reputation)

Group: General Forum Members
Points: 6273 Visits: 25530
Probably not the best way, but seems to work


SELECT DISTINCT
Grp_A, Grp_B, Grp_C,
SUM(Val_A) OVER(PARTITION BY Grp_A, Grp_B, Grp_C) AS GrpBy_ABC,
SUM(Val_B) OVER(PARTITION BY Grp_A, Grp_B) AS GrpBy_AB
FROM #a



____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537




ChrisM@Work
ChrisM@Work
SSC-Forever
SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)

Group: General Forum Members
Points: 40324 Visits: 20000
Mark-101232 (9/9/2013)

Probably not the best way, but seems to work


SELECT DISTINCT
Grp_A, Grp_B, Grp_C,
SUM(Val_A) OVER(PARTITION BY Grp_A, Grp_B, Grp_C) AS GrpBy_ABC,
SUM(Val_B) OVER(PARTITION BY Grp_A, Grp_B) AS GrpBy_AB
FROM #a



So does this:
SELECT 
d.Grp_A, d.Grp_B, d.Grp_C,
d.SUM_Val_A,
SUM_Val_B = SUM(d.PartSUM_Val_B) OVER (PARTITION BY d.Grp_A, d.Grp_B)
FROM (
SELECT
A1.Grp_A, A1.Grp_B, A1.Grp_C,
SUM(Val_A) AS SUM_Val_A,
SUM(Val_B) AS PartSUM_Val_B
FROM #a A1
GROUP BY A1.Grp_A, A1.Grp_B, A1.Grp_C
) d


both queries read the base table once then use table spools for the dirty work.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Sam S Kolli
Sam S Kolli
SSC-Enthusiastic
SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)SSC-Enthusiastic (141 reputation)

Group: General Forum Members
Points: 141 Visits: 3486
Mark and Chris...thank you for your input. I havent yet got a chance to test them; but they do look promising.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search