Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Best way to do this kind of group by Expand / Collapse
Author
Message
Posted Sunday, September 8, 2013 9:44 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Today @ 6:37 AM
Points: 25, Visits: 2,596
What would be the best way to write a sql query where the result set contains a group by where one column is grouped by one set of columns and another column is grouped by another set of columns and still have the two different group by appear on the same row.

Lets say there is a table such as follows:

IF OBJECT_ID('tempdb..#a') IS NOT NULL DROP TABLE #a
CREATE TABLE #a (Grp_A CHAR(2), Grp_B CHAR(2), Grp_C CHAR(2), Val_A TINYINT, Val_B TINYINT)

Sample Data:

INSERT INTO #a SELECT 'A1', 'B1', 'C1', 1, 1
INSERT INTO #a SELECT 'A1', 'B1', 'C2', 1, 1
INSERT INTO #a SELECT 'A1', 'B2', 'C3', 1, 1
INSERT INTO #a SELECT 'A2', 'B2', 'C4', 1, 1

The desired result set for the above would be:

SELECT  A1.Grp_A, A1.Grp_B, A1.Grp_C, SUM(Val_A) AS GrpBy_ABC, A2.GrpBy_AB
FROM #a A1
INNER JOIN (
SELECT A.Grp_A, A.Grp_B, SUM(Val_B) AS GrpBy_AB FROM #a A GROUP BY A.Grp_A, A.Grp_B
) AS A2
ON (A1.Grp_A = A2.Grp_A AND A1.Grp_B = A2.Grp_B)
GROUP BY A1.Grp_A, A1.Grp_B, A1.Grp_C, A2.GrpBy_AB

While the above query gives me the desired result, it basically involves two scans on the base table which gives me the following concern. For example, if the base table contains a few million rows and if there are around 6 Group by columns, it can be a pretty expensive query cos there would be two huge scans and a join involving 6 columns.

Using grouping sets helps me to do one scan on the base table and get the aggregates.

SELECT GROUPING_ID(A1.Grp_A, A1.Grp_B, A1.Grp_C) AS GroupId, 
A1.Grp_A, A1.Grp_B, A1.Grp_C,
SUM(Val_A) AS GrpBy_ABC, SUM(Val_B) AS GrpBy_AB
FROM #a A1
GROUP BY GROUPING SETS(
(A1.Grp_A, A1.Grp_B, A1.Grp_C),
(A1.Grp_A, A1.Grp_B)
)

But the output is not in the desired format since the second GROUPING sET information is not returned on the same row but on a new row. And to get the output into desired format, I will have to do a self join the intermediate output obtained by GROUPING SETS which in essence results in two more 2 huge scans and ends up being worse than the above query using self join on the base table.

So what can be the best way to approach this query? This would be part of a Load query, so while there is no requirement to have the query run in a few seconds, I would still like to do it in the best way possible.

P.S. Do other DBMS or SQL dialects allow for such a query to be written? Basically something like this;

SELECT A.Grp_A, A.Grp_B, A.Grp_C,
SUM(Val_A GROUP BY A.Grp_A, A.Grp_B, A.Grp_C) AS GrpBy_ABC,
SUM(Val_B GROUP BY A.Grp_A, A.Grp_B) AS GrpBy_AB
FROM #a A

In essence, there is only one scan and there is no need for a self join.


Post #1492652
Posted Monday, September 9, 2013 4:37 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, November 11, 2014 3:53 AM
Points: 1,678, Visits: 19,555

Probably not the best way, but seems to work


SELECT DISTINCT
Grp_A, Grp_B, Grp_C,
SUM(Val_A) OVER(PARTITION BY Grp_A, Grp_B, Grp_C) AS GrpBy_ABC,
SUM(Val_B) OVER(PARTITION BY Grp_A, Grp_B) AS GrpBy_AB
FROM #a



____________________________________________________

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Never approach a goat from the front, a horse from the rear, or a fool from any direction.
Post #1492725
Posted Monday, September 9, 2013 6:46 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:33 AM
Points: 6,858, Visits: 14,148
Mark-101232 (9/9/2013)

Probably not the best way, but seems to work


SELECT DISTINCT
Grp_A, Grp_B, Grp_C,
SUM(Val_A) OVER(PARTITION BY Grp_A, Grp_B, Grp_C) AS GrpBy_ABC,
SUM(Val_B) OVER(PARTITION BY Grp_A, Grp_B) AS GrpBy_AB
FROM #a



So does this:
SELECT 
d.Grp_A, d.Grp_B, d.Grp_C,
d.SUM_Val_A,
SUM_Val_B = SUM(d.PartSUM_Val_B) OVER (PARTITION BY d.Grp_A, d.Grp_B)
FROM (
SELECT
A1.Grp_A, A1.Grp_B, A1.Grp_C,
SUM(Val_A) AS SUM_Val_A,
SUM(Val_B) AS PartSUM_Val_B
FROM #a A1
GROUP BY A1.Grp_A, A1.Grp_B, A1.Grp_C
) d

both queries read the base table once then use table spools for the dirty work.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1492754
Posted Monday, September 9, 2013 7:50 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Today @ 6:37 AM
Points: 25, Visits: 2,596
Mark and Chris...thank you for your input. I havent yet got a chance to test them; but they do look promising.
Post #1492967
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse