GROUPING SETS - 1

  • Ron McCullough

    SSC Guru

    Points: 63877

    Comments posted to this topic are about the item GROUPING SETS - 1

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Raghavendra Mudugal

    SSChampion

    Points: 10658

    Thank you for the post, bucket. Good one.

    (:crying: not sure how I was suppose to calculate this whole in my head... so I executed the code and picked the answers and and you know what...:doze::sick: I selected the wrong answers :crying:)

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Rune Bivrin

    SSCertifiable

    Points: 7806

    I did what any BI consultant would do: I stuffed the numbers into Excel and did a Power Pivot.


    Just because you're right doesn't mean everybody else is wrong.

  • This was removed by the editor as SPAM

  • call.copse

    SSCoach

    Points: 17179

    I confess I'd have struggled to actually work these out in any way so apologise for 'cheating' (I ran it on my desktop instance) although I think learning about grouping sets was worthwhile.

    I'd like anyone's advice about when this might be a useful technique to use? I'd admit that in this instance the result set seemed confusing to me and I'd never put something like that in front of (my) clients. If I needed all those bits of data to display as a single presentation, say for a dashboard, I'd probably find writing separate queries less confusing to write and maintain.

  • Rich Weissler

    Hall of Fame

    Points: 3235

    *scratches head* >The highest quantity sold to any custid is 50

    A 72

    B 47

    C 56

    D 30

    ??

  • palotaiarpad

    SSCertifiable

    Points: 5317

    Sorry, but i hate codes like this. It took several minutes to clear up and make it well formated.

    But finally i got it right!

    The 50 is ok in my opinion, as you have to take empid also into account.

  • Ed Wagner

    SSC Guru

    Points: 286982

    I liked the question. I admit I haven't worked with grouping sets and it made me examine them. So, I learned something today, which makes it a good day. Thanks for a good question to finish out the week.

  • vinu512

    SSCoach

    Points: 15729

    I don't agree....I thought the options weren't precise enough....I couldn't work it out in my head....so I executed it to find out that my answers don't match the condition in the question which says "Select 4 options".

    So, I went with plain logic instead and select four and got it right. But, I still think that the options weren't precise enough.

    Anyways, thanks for the effort Bucket.

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • Lokesh Vij

    SSChampion

    Points: 10836

    Firstly, I would appreciate Ron for coming up with a nice Qotd with an intent to teach grouping sets. Secondly, I do feel, solving this question was rather tricky and I have to take those values into the excel sheet to work out the grouping sets. I know most of you would disagree with me if I say the question was clear enough, but you would appreciate that such questions are worth when it comes to "learning the basics". All glories to Ron for this!!

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL question[/url]
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com[/url]

    Follow me @Twitter

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    I don't get it.

    First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

    Second, the explanation. It appears to be incomplete. It doesn't start with how I expect a sentence to start, and at the end is an empty dropdown. It appears to be part of the explanation of GROUPING SETS, but there is no reference made at all to the question or answers.

    Third, the query given. Even though the explanation mentions the requirement of aggregate functions in the SELECT, the query uses two columns not included in the GROUP BY and not in an aggregate. That causes a lot of NULL values in the result set, and (which is more important) misleading information. Also, the GROUPING SETS specification is only partially related to the questions asked, and the GROUPING_ID has the wrong column specification, causing the grp_id column to be useless.

    Fourth, the answers supposed to be correct. They are not. At least not all.

    1. "The highest value sold to (one) custid" is $ 723.79 (to customer B), as can be seen from running this query:

    SELECT custid, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY custid

    ORDER BY TotalValue DESC;

    2. "The highest quantity sold to any / a single custid" depends on whether this refers to the total of all sales to a customer, or to the highest in a single sale. In the first case, it is 72. In the second case, it is 40.

    SELECT custid, SUM(qty) AS Sumqty, MAX(qty) AS Maxqty

    FROM dbo.Orders

    GROUP BY custid;

    3. "Empid with highest value sold to a single custid is 4" / "Empid with highest value sold is 3" - I thought these were both true when going through this in my head, but it turns out I was wrong. Employee 4 sells a total of $ 418.10 to customer A and nothing else. Employee sells a total of $612.45, but spread over different customers. However, I had overlooked that one of those customers is good for $499.95, so employee 3 also sold the highest to a single custid.

    SELECT empid, custid, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY ROLLUP(empid, custid)

    ORDER BY TotalValue DESC;

    4. 2013 is the year with the highest quantity sold:

    SELECT YEAR(orderdate) AS SaleYear, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY YEAR(orderdate);

    So, not four correct answers (as Ron thought when submitting the question), not two (as I thought when answering), but three.

    Oh, and for those who want to learn about GROUPING SETS and GROUPING_ID, here are two references:

    http://msdn.microsoft.com/en-us/library/ms177673.aspx

    http://msdn.microsoft.com/en-us/library/bb510624.aspx

    And here's a single query that uses GROUPING_SETS to find all the relevant answers. I've included a corrected GROUPING_ID as well.

    SELECT GROUPING_ID(custid,

    empid,

    YEAR(orderdate)) AS grp_id,

    empid, custid, YEAR(orderdate) AS SaleYear,

    SUM(qty) AS TotalQty, SUM(TotalSales) AS SalesValue,

    SUM(qty) AS Sumqty, MAX(qty) AS Maxqty

    FROM dbo.Orders

    GROUP BY GROUPING SETS (custid, -- Total per customer

    empid, -- Total per employee

    (custid, empid), -- Per employee / customer combination

    YEAR(orderdate)) -- Per year

    ORDER BY grp_id;

    Note that the grp_id column is a bitmap that holds a 1 in the corresponding bit for the aggregated columns, and the LAST column is 1. So the year is 1, empid 2, and custid is 4.

    The value 6 is 4 + 2 (or binary 110), meeaning that empid and custid are aggregated, but year is not - so these rows represent the total per year.

    For the totals per employee/customer combination, I want year to be aggregated and empid and customer not - so that would be 1 (binary 001).

    For the per-customer totals, I want employee and year aggregated, so I need 2 + 1 = 3 (binary 011).

    And the value 5 finally represents the per-employee totals (binary 101, or 4+1, so customer and year aggregated).

    Bottom line: using GROUPING SETS is hard. Very hard. It is a useful and very powerful instrument, but it takes a lot of time to learn to wield properly. (I think I have spent at least an hour reading up and trying things before I understood enough to be able to write this reply). Anyone who thinks they can benefit from this often-overlooked function should take the time to get to know this instrument.

    Thanks, Ron, for the question. I do not think it is a good question, but it did prompt me to read up on GROUPING SETS and GROUPING_ID, and expand my knowledge - so it is useful after all! 😉


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Raghavendra Mudugal

    SSChampion

    Points: 10658

    Hugo Kornelis (5/31/2013)


    I don't get it.

    First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

    Second, the explanation. It appears to be incomplete. It doesn't start with how I expect a sentence to start, and at the end is an empty dropdown. It appears to be part of the explanation of GROUPING SETS, but there is no reference made at all to the question or answers.

    Third, the query given. Even though the explanation mentions the requirement of aggregate functions in the SELECT, the query uses two columns not included in the GROUP BY and not in an aggregate. That causes a lot of NULL values in the result set, and (which is more important) misleading information. Also, the GROUPING SETS specification is only partially related to the questions asked, and the GROUPING_ID has the wrong column specification, causing the grp_id column to be useless.

    Fourth, the answers supposed to be correct. They are not. At least not all.

    1. "The highest value sold to (one) custid" is $ 723.79 (to customer B), as can be seen from running this query:

    SELECT custid, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY custid

    ORDER BY TotalValue DESC;

    2. "The highest quantity sold to any / a single custid" depends on whether this refers to the total of all sales to a customer, or to the highest in a single sale. In the first case, it is 72. In the second case, it is 40.

    SELECT custid, SUM(qty) AS Sumqty, MAX(qty) AS Maxqty

    FROM dbo.Orders

    GROUP BY custid;

    3. "Empid with highest value sold to a single custid is 4" / "Empid with highest value sold is 3" - I thought these were both true when going through this in my head, but it turns out I was wrong. Employee 4 sells a total of $ 418.10 to customer A and nothing else. Employee sells a total of $612.45, but spread over different customers. However, I had overlooked that one of those customers is good for $499.95, so employee 3 also sold the highest to a single custid.

    SELECT empid, custid, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY ROLLUP(empid, custid)

    ORDER BY TotalValue DESC;

    4. 2013 is the year with the highest quantity sold:

    SELECT YEAR(orderdate) AS SaleYear, SUM(TotalSales) AS TotalValue

    FROM dbo.Orders

    GROUP BY YEAR(orderdate);

    So, not four correct answers (as Ron thought when submitting the question), not two (as I thought when answering), but three.

    Oh, and for those who want to learn about GROUPING SETS and GROUPING_ID, here are two references:

    http://msdn.microsoft.com/en-us/library/ms177673.aspx

    http://msdn.microsoft.com/en-us/library/bb510624.aspx

    And here's a single query that uses GROUPING_SETS to find all the relevant answers. I've included a corrected GROUPING_ID as well.

    SELECT GROUPING_ID(custid,

    empid,

    YEAR(orderdate)) AS grp_id,

    empid, custid, YEAR(orderdate) AS SaleYear,

    SUM(qty) AS TotalQty, SUM(TotalSales) AS SalesValue,

    SUM(qty) AS Sumqty, MAX(qty) AS Maxqty

    FROM dbo.Orders

    GROUP BY GROUPING SETS (custid, -- Total per customer

    empid, -- Total per employee

    (custid, empid), -- Per employee / customer combination

    YEAR(orderdate)) -- Per year

    ORDER BY grp_id;

    Note that the grp_id column is a bitmap that holds a 1 in the corresponding bit for the aggregated columns, and the LAST column is 1. So the year is 1, empid 2, and custid is 4.

    The value 6 is 4 + 2 (or binary 110), meeaning that empid and custid are aggregated, but year is not - so these rows represent the total per year.

    For the totals per employee/customer combination, I want year to be aggregated and empid and customer not - so that would be 1 (binary 001).

    For the per-customer totals, I want employee and year aggregated, so I need 2 + 1 = 3 (binary 011).

    And the value 5 finally represents the per-employee totals (binary 101, or 4+1, so customer and year aggregated).

    Bottom line: using GROUPING SETS is hard. Very hard. It is a useful and very powerful instrument, but it takes a lot of time to learn to wield properly. (I think I have spent at least an hour reading up and trying things before I understood enough to be able to write this reply). Anyone who thinks they can benefit from this often-overlooked function should take the time to get to know this instrument.

    Thanks, Ron, for the question. I do not think it is a good question, but it did prompt me to read up on GROUPING SETS and GROUPING_ID, and expand my knowledge - so it is useful after all! 😉

    + 1*6.028*10^28

    (this post is going in to my PDF document collection) 🙂

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • batgirl

    SSCarpal Tunnel

    Points: 4979

    Hugo Kornelis (5/31/2013)


    I don't get it.

    +1

    The only "answer" that is even remotely clear in the results of the suggested query is that 2013 had the highest quantity sold. I get the fact that I wasn't supposed to run it to find the answers, but of course the implication is that it should provide them right?

    I do appreciate the reason to examine grouping sets. The "-1" leads me to believe that there will be more like this forthcoming. I hope for something a bit more straightforward.

  • jlennartz

    SSCommitted

    Points: 1574

    Rich Weissler (5/31/2013)


    *scratches head* >The highest quantity sold to any custid is 50

    A 72

    B 47

    C 56

    D 30

    ??

    +1

    Only 3 correct answers and I guessed wrong on the 4th. As 72 is the highest quantity sold to any custid not 50.

  • TomThomson

    SSC Guru

    Points: 104773

    Hugo Kornelis (5/31/2013)


    I don't get it.

    I did get it, in the sense that I think I .understood the tortuous misuse of the English language in the formulation of the question. Whenever highest quantity or highest value is referred to, even if it's called highest for a single customer or highest for an employee or highest for a year the question is not referring to the quantities that the English leads one to expect, but to a different set of values: only the values in a single order are to be considered, not any aggregates. I spotted that when looking at the options for one of the four categories and then answered on the basis that all the values referred to single order amounts - that got it right, but I see from your comments that some values would also have been right with the more normal interpretation of the words. I think it a pretty horrible question, but people can learn from it - how not to choose a grouping id for example (in this case watching the pattern of NULLs is a better way of identifying groups than using the badly chosen grouping id), and people who haven't come across the grouping sets concept before can at least get a view of what that is (although I would have hated to learn it from anything like this).

    First, the idea of the question. We are given SQL to create and populate a table, *and* SQL that (supposedly?) shows the correct answers. Are we supposed to run this? Work it out in our heads? On a SQL Forum, I'd much sooner expect the task to be to WRITE the query to find the correct answers!

    I'll add to that that the query is utterly pointless from the point of view of answering those questions, it would be easier to answer them from the orders table(since they all refer to single orders, not to aggregates other than MAX, which is nowhere used in the query although it's the only one relevant to choosing between the options provided). It is of course possible to answer those questions by looking at the output from that query, that is a good deal harder, in my view, that just looking at the base table.

    I agree with the rest of your comments and don't think I could add anything useful, so I won't waste everyone's time quoting them and agreeing verbosely.

    Tom

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply