What would be the best way to write a sql query where the result set contains a group by where one column is grouped by one set of columns and another column is grouped by another set of columns and still have the two different group by appear on the same row.
Lets say there is a table such as follows:
IF OBJECT_ID('tempdb..#a') IS NOT NULL DROP TABLE #a
CREATE TABLE #a (Grp_A CHAR(2), Grp_B CHAR(2), Grp_C CHAR(2), Val_A TINYINT, Val_B TINYINT)
INSERT INTO #a SELECT 'A1', 'B1', 'C1', 1, 1
INSERT INTO #a SELECT 'A1', 'B1', 'C2', 1, 1
INSERT INTO #a SELECT 'A1', 'B2', 'C3', 1, 1
INSERT INTO #a SELECT 'A2', 'B2', 'C4', 1, 1
The desired result set for the above would be:
SELECT A1.Grp_A, A1.Grp_B, A1.Grp_C, SUM(Val_A) AS GrpBy_ABC, A2.GrpBy_AB
FROM #a A1
INNER JOIN (
SELECT A.Grp_A, A.Grp_B, SUM(Val_B) AS GrpBy_AB FROM #a A GROUP BY A.Grp_A, A.Grp_B
) AS A2
ON (A1.Grp_A = A2.Grp_A AND A1.Grp_B = A2.Grp_B)
GROUP BY A1.Grp_A, A1.Grp_B, A1.Grp_C, A2.GrpBy_AB
While the above query gives me the desired result, it basically involves two scans on the base table which gives me the following concern. For example, if the base table contains a few million rows and if there are around 6 Group by columns, it can be a pretty expensive query cos there would be two huge scans and a join involving 6 columns.
Using grouping sets helps me to do one scan on the base table and get the aggregates.
SELECT GROUPING_ID(A1.Grp_A, A1.Grp_B, A1.Grp_C) AS GroupId,
A1.Grp_A, A1.Grp_B, A1.Grp_C,
SUM(Val_A) AS GrpBy_ABC, SUM(Val_B) AS GrpBy_AB
FROM #a A1
GROUP BY GROUPING SETS(
(A1.Grp_A, A1.Grp_B, A1.Grp_C),
But the output is not in the desired format since the second GROUPING sET information is not returned on the same row but on a new row. And to get the output into desired format, I will have to do a self join the intermediate output obtained by GROUPING SETS which in essence results in two more 2 huge scans and ends up being worse than the above query using self join on the base table.
So what can be the best way to approach this query? This would be part of a Load query, so while there is no requirement to have the query run in a few seconds, I would still like to do it in the best way possible.
P.S. Do other DBMS or SQL dialects allow for such a query to be written? Basically something like this;
SELECT A.Grp_A, A.Grp_B, A.Grp_C,
SUM(Val_A GROUP BY A.Grp_A, A.Grp_B, A.Grp_C) AS GrpBy_ABC,
SUM(Val_B GROUP BY A.Grp_A, A.Grp_B) AS GrpBy_AB
FROM #a A
In essence, there is only one scan and there is no need for a self join.