SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


GROUPING SETS - 1


GROUPING SETS - 1

Author
Message
Hugo Kornelis
Hugo Kornelis
SSC-Dedicated
SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)

Group: General Forum Members
Points: 34646 Visits: 13126
I don't get it.

First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

Second, the explanation. It appears to be incomplete. It doesn't start with how I expect a sentence to start, and at the end is an empty dropdown. It appears to be part of the explanation of GROUPING SETS, but there is no reference made at all to the question or answers.

Third, the query given. Even though the explanation mentions the requirement of aggregate functions in the SELECT, the query uses two columns not included in the GROUP BY and not in an aggregate. That causes a lot of NULL values in the result set, and (which is more important) misleading information. Also, the GROUPING SETS specification is only partially related to the questions asked, and the GROUPING_ID has the wrong column specification, causing the grp_id column to be useless.

Fourth, the answers supposed to be correct. They are not. At least not all.
1. "The highest value sold to (one) custid" is $ 723.79 (to customer B), as can be seen from running this query:
SELECT   custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY custid
ORDER BY TotalValue DESC;



2. "The highest quantity sold to any / a single custid" depends on whether this refers to the total of all sales to a customer, or to the highest in a single sale. In the first case, it is 72. In the second case, it is 40.
SELECT   custid, SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY custid;



3. "Empid with highest value sold to a single custid is 4" / "Empid with highest value sold is 3" - I thought these were both true when going through this in my head, but it turns out I was wrong. Employee 4 sells a total of $ 418.10 to customer A and nothing else. Employee sells a total of $612.45, but spread over different customers. However, I had overlooked that one of those customers is good for $499.95, so employee 3 also sold the highest to a single custid.
SELECT   empid, custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY ROLLUP(empid, custid)
ORDER BY TotalValue DESC;



4. 2013 is the year with the highest quantity sold:
SELECT   YEAR(orderdate) AS SaleYear, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY YEAR(orderdate);



So, not four correct answers (as Ron thought when submitting the question), not two (as I thought when answering), but three.


Oh, and for those who want to learn about GROUPING SETS and GROUPING_ID, here are two references:
http://msdn.microsoft.com/en-us/library/ms177673.aspx
http://msdn.microsoft.com/en-us/library/bb510624.aspx

And here's a single query that uses GROUPING_SETS to find all the relevant answers. I've included a corrected GROUPING_ID as well.
SELECT   GROUPING_ID(custid,
empid,
YEAR(orderdate)) AS grp_id,
empid, custid, YEAR(orderdate) AS SaleYear,
SUM(qty) AS TotalQty, SUM(TotalSales) AS SalesValue,
SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY GROUPING SETS (custid, -- Total per customer
empid, -- Total per employee
(custid, empid), -- Per employee / customer combination
YEAR(orderdate)) -- Per year
ORDER BY grp_id;



Note that the grp_id column is a bitmap that holds a 1 in the corresponding bit for the aggregated columns, and the LAST column is 1. So the year is 1, empid 2, and custid is 4.
The value 6 is 4 + 2 (or binary 110), meeaning that empid and custid are aggregated, but year is not - so these rows represent the total per year.
For the totals per employee/customer combination, I want year to be aggregated and empid and customer not - so that would be 1 (binary 001).
For the per-customer totals, I want employee and year aggregated, so I need 2 + 1 = 3 (binary 011).
And the value 5 finally represents the per-employee totals (binary 101, or 4+1, so customer and year aggregated).

Bottom line: using GROUPING SETS is hard. Very hard. It is a useful and very powerful instrument, but it takes a lot of time to learn to wield properly. (I think I have spent at least an hour reading up and trying things before I understood enough to be able to write this reply). Anyone who thinks they can benefit from this often-overlooked function should take the time to get to know this instrument.

Thanks, Ron, for the question. I do not think it is a good question, but it did prompt me to read up on GROUPING SETS and GROUPING_ID, and expand my knowledge - so it is useful after all! Wink


Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Raghavendra Mudugal
Raghavendra Mudugal
SSCertifiable
SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)SSCertifiable (5.6K reputation)

Group: General Forum Members
Points: 5572 Visits: 2958
Hugo Kornelis (5/31/2013)
I don't get it.

First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

Second, the explanation. It appears to be incomplete. It doesn't start with how I expect a sentence to start, and at the end is an empty dropdown. It appears to be part of the explanation of GROUPING SETS, but there is no reference made at all to the question or answers.

Third, the query given. Even though the explanation mentions the requirement of aggregate functions in the SELECT, the query uses two columns not included in the GROUP BY and not in an aggregate. That causes a lot of NULL values in the result set, and (which is more important) misleading information. Also, the GROUPING SETS specification is only partially related to the questions asked, and the GROUPING_ID has the wrong column specification, causing the grp_id column to be useless.

Fourth, the answers supposed to be correct. They are not. At least not all.
1. "The highest value sold to (one) custid" is $ 723.79 (to customer B), as can be seen from running this query:
SELECT   custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY custid
ORDER BY TotalValue DESC;



2. "The highest quantity sold to any / a single custid" depends on whether this refers to the total of all sales to a customer, or to the highest in a single sale. In the first case, it is 72. In the second case, it is 40.
SELECT   custid, SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY custid;



3. "Empid with highest value sold to a single custid is 4" / "Empid with highest value sold is 3" - I thought these were both true when going through this in my head, but it turns out I was wrong. Employee 4 sells a total of $ 418.10 to customer A and nothing else. Employee sells a total of $612.45, but spread over different customers. However, I had overlooked that one of those customers is good for $499.95, so employee 3 also sold the highest to a single custid.
SELECT   empid, custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY ROLLUP(empid, custid)
ORDER BY TotalValue DESC;



4. 2013 is the year with the highest quantity sold:
SELECT   YEAR(orderdate) AS SaleYear, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY YEAR(orderdate);



So, not four correct answers (as Ron thought when submitting the question), not two (as I thought when answering), but three.


Oh, and for those who want to learn about GROUPING SETS and GROUPING_ID, here are two references:
http://msdn.microsoft.com/en-us/library/ms177673.aspx
http://msdn.microsoft.com/en-us/library/bb510624.aspx

And here's a single query that uses GROUPING_SETS to find all the relevant answers. I've included a corrected GROUPING_ID as well.
SELECT   GROUPING_ID(custid,
empid,
YEAR(orderdate)) AS grp_id,
empid, custid, YEAR(orderdate) AS SaleYear,
SUM(qty) AS TotalQty, SUM(TotalSales) AS SalesValue,
SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY GROUPING SETS (custid, -- Total per customer
empid, -- Total per employee
(custid, empid), -- Per employee / customer combination
YEAR(orderdate)) -- Per year
ORDER BY grp_id;



Note that the grp_id column is a bitmap that holds a 1 in the corresponding bit for the aggregated columns, and the LAST column is 1. So the year is 1, empid 2, and custid is 4.
The value 6 is 4 + 2 (or binary 110), meeaning that empid and custid are aggregated, but year is not - so these rows represent the total per year.
For the totals per employee/customer combination, I want year to be aggregated and empid and customer not - so that would be 1 (binary 001).
For the per-customer totals, I want employee and year aggregated, so I need 2 + 1 = 3 (binary 011).
And the value 5 finally represents the per-employee totals (binary 101, or 4+1, so customer and year aggregated).

Bottom line: using GROUPING SETS is hard. Very hard. It is a useful and very powerful instrument, but it takes a lot of time to learn to wield properly. (I think I have spent at least an hour reading up and trying things before I understood enough to be able to write this reply). Anyone who thinks they can benefit from this often-overlooked function should take the time to get to know this instrument.

Thanks, Ron, for the question. I do not think it is a good question, but it did prompt me to read up on GROUPING SETS and GROUPING_ID, and expand my knowledge - so it is useful after all! Wink


+ 1*6.028*10^28

(this post is going in to my PDF document collection) :-)

ww; Raghu
--
The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.
batgirl
batgirl
Hall of Fame
Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)

Group: General Forum Members
Points: 3243 Visits: 1820
Hugo Kornelis (5/31/2013)
I don't get it.


+1

The only "answer" that is even remotely clear in the results of the suggested query is that 2013 had the highest quantity sold. I get the fact that I wasn't supposed to run it to find the answers, but of course the implication is that it should provide them right?

I do appreciate the reason to examine grouping sets. The "-1" leads me to believe that there will be more like this forthcoming. I hope for something a bit more straightforward.
jlennartz
jlennartz
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1152 Visits: 1197
Rich Weissler (5/31/2013)
*scratches head* >The highest quantity sold to any custid is 50
A 72
B 47
C 56
D 30

??



+1

Only 3 correct answers and I guessed wrong on the 4th. As 72 is the highest quantity sold to any custid not 50.
Tom Thomson
Tom Thomson
SSC Guru
SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)

Group: General Forum Members
Points: 51086 Visits: 13160
Hugo Kornelis (5/31/2013)
I don't get it.

I did get it, in the sense that I think I .understood the tortuous misuse of the English language in the formulation of the question. Whenever highest quantity or highest value is referred to, even if it's called highest for a single customer or highest for an employee or highest for a year the question is not referring to the quantities that the English leads one to expect, but to a different set of values: only the values in a single order are to be considered, not any aggregates. I spotted that when looking at the options for one of the four categories and then answered on the basis that all the values referred to single order amounts - that got it right, but I see from your comments that some values would also have been right with the more normal interpretation of the words. I think it a pretty horrible question, but people can learn from it - how not to choose a grouping id for example (in this case watching the pattern of NULLs is a better way of identifying groups than using the badly chosen grouping id), and people who haven't come across the grouping sets concept before can at least get a view of what that is (although I would have hated to learn it from anything like this).

First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

I'll add to that that the query is utterly pointless from the point of view of answering those questions, it would be easier to answer them from the orders table(since they all refer to single orders, not to aggregates other than MAX, which is nowhere used in the query although it's the only one relevant to choosing between the options provided). It is of course possible to answer those questions by looking at the output from that query, that is a good deal harder, in my view, that just looking at the base table.

I agree with the rest of your comments and don't think I could add anything useful, so I won't waste everyone's time quoting them and agreeing verbosely.

Tom

Hugo Kornelis
Hugo Kornelis
SSC-Dedicated
SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)

Group: General Forum Members
Points: 34646 Visits: 13126
L' Eomot Inversé (6/1/2013)
(in this case watching the pattern of NULLs is a better way of identifying groups than using the badly chosen grouping id)

True, but only in this case. If the base data has NULL values, you do need to use GROUPING_ID to differentiate a NULL from the base data from a NULL that represents aggregation for that particular column.


Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Revenant
Revenant
SSC-Insane
SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)SSC-Insane (22K reputation)

Group: General Forum Members
Points: 22623 Visits: 5137
Hugo Kornelis (5/31/2013)
I don't get it.

. . .

Neither do I. But I lucked out. ;-)
timothy bates
timothy bates
Old Hand
Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)Old Hand (324 reputation)

Group: General Forum Members
Points: 324 Visits: 128
I too lucked out. I didn't like the question as posed. Having individual orders with odd quantities and pricing, was placed it out of the scope of a QotD in my opinion. I like the idea of presenting grouping sets, however, having to resort to mulitple uses of calculator (e.g. 12 * 6.57) to find totals became more of a time demand than I was ready to invest.

Perhaps, an interesting question might have been:
How many distinct groups are created? (4).

For the first two questions it seems they should have included "by any one employee."

The below will generate answers different to those credited but seem to more accurately reflect the question posed.

If instead we used the following
GROUP BY GROUPING SETS (
(custid) -- Addresses first two questions (sold and quantity to one customer)
,(empid) -- Addresses third question (emp id with highest sold)
,(YEAR(orderdate)) -- Addresses last question (year with highest quantity sold)
)
SQLRNNR
SQLRNNR
SSC Guru
SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)SSC Guru (145K reputation)

Group: General Forum Members
Points: 145741 Visits: 18652
Thanks Hugo for the following. I will have to dive into grouping sets a bit further.

Hugo Kornelis (5/31/2013)
I don't get it.

First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

Second, the explanation. It appears to be incomplete. It doesn't start with how I expect a sentence to start, and at the end is an empty dropdown. It appears to be part of the explanation of GROUPING SETS, but there is no reference made at all to the question or answers.

Third, the query given. Even though the explanation mentions the requirement of aggregate functions in the SELECT, the query uses two columns not included in the GROUP BY and not in an aggregate. That causes a lot of NULL values in the result set, and (which is more important) misleading information. Also, the GROUPING SETS specification is only partially related to the questions asked, and the GROUPING_ID has the wrong column specification, causing the grp_id column to be useless.

Fourth, the answers supposed to be correct. They are not. At least not all.
1. "The highest value sold to (one) custid" is $ 723.79 (to customer B), as can be seen from running this query:
SELECT   custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY custid
ORDER BY TotalValue DESC;



2. "The highest quantity sold to any / a single custid" depends on whether this refers to the total of all sales to a customer, or to the highest in a single sale. In the first case, it is 72. In the second case, it is 40.
SELECT   custid, SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY custid;



3. "Empid with highest value sold to a single custid is 4" / "Empid with highest value sold is 3" - I thought these were both true when going through this in my head, but it turns out I was wrong. Employee 4 sells a total of $ 418.10 to customer A and nothing else. Employee sells a total of $612.45, but spread over different customers. However, I had overlooked that one of those customers is good for $499.95, so employee 3 also sold the highest to a single custid.
SELECT   empid, custid, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY ROLLUP(empid, custid)
ORDER BY TotalValue DESC;



4. 2013 is the year with the highest quantity sold:
SELECT   YEAR(orderdate) AS SaleYear, SUM(TotalSales) AS TotalValue
FROM dbo.Orders
GROUP BY YEAR(orderdate);



So, not four correct answers (as Ron thought when submitting the question), not two (as I thought when answering), but three.


Oh, and for those who want to learn about GROUPING SETS and GROUPING_ID, here are two references:
http://msdn.microsoft.com/en-us/library/ms177673.aspx
http://msdn.microsoft.com/en-us/library/bb510624.aspx

And here's a single query that uses GROUPING_SETS to find all the relevant answers. I've included a corrected GROUPING_ID as well.
SELECT   GROUPING_ID(custid,
empid,
YEAR(orderdate)) AS grp_id,
empid, custid, YEAR(orderdate) AS SaleYear,
SUM(qty) AS TotalQty, SUM(TotalSales) AS SalesValue,
SUM(qty) AS Sumqty, MAX(qty) AS Maxqty
FROM dbo.Orders
GROUP BY GROUPING SETS (custid, -- Total per customer
empid, -- Total per employee
(custid, empid), -- Per employee / customer combination
YEAR(orderdate)) -- Per year
ORDER BY grp_id;



Note that the grp_id column is a bitmap that holds a 1 in the corresponding bit for the aggregated columns, and the LAST column is 1. So the year is 1, empid 2, and custid is 4.
The value 6 is 4 + 2 (or binary 110), meeaning that empid and custid are aggregated, but year is not - so these rows represent the total per year.
For the totals per employee/customer combination, I want year to be aggregated and empid and customer not - so that would be 1 (binary 001).
For the per-customer totals, I want employee and year aggregated, so I need 2 + 1 = 3 (binary 011).
And the value 5 finally represents the per-employee totals (binary 101, or 4+1, so customer and year aggregated).

Bottom line: using GROUPING SETS is hard. Very hard. It is a useful and very powerful instrument, but it takes a lot of time to learn to wield properly. (I think I have spent at least an hour reading up and trying things before I understood enough to be able to write this reply). Anyone who thinks they can benefit from this often-overlooked function should take the time to get to know this instrument.

Thanks, Ron, for the question. I do not think it is a good question, but it did prompt me to read up on GROUPING SETS and GROUPING_ID, and expand my knowledge - so it is useful after all! Wink


Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events

Paul White
Paul White
SSC Guru
SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)SSC Guru (80K reputation)

Group: General Forum Members
Points: 80552 Visits: 11400
Yikes.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search