SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Generating SubTotals using GROUPING


Generating SubTotals using GROUPING

Author
Message
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)

Group: General Forum Members
Points: 203404 Visits: 41949
Hugo,

The gain is that there is nothing you can naturally order by that will give you the correct original order. ;-) And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Hugo Kornelis
Hugo Kornelis
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18201 Visits: 12426
Jeff Moden (7/19/2010)
Hugo,

The gain is that there is nothing you can naturally order by that will give you the correct original order. ;-)

Huh?
USE AdventureWorks2008;
SELECT TerritoryID, GROUPING(TerritoryID),
SalesPersonID, GROUPING(SalesPersonID),
SUM(TotalDue) AS SumOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY ROLLUP(TerritoryID, SalesPersonID)
ORDER BY GROUPING(TerritoryID), TerritoryID,
GROUPING(SalesPersonID), SalesPersonID;



And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

Whether or not I can is completely irrelevant. I advise people to include the ORDER BY, because it guarantees correct results. You advise people to omit the ORDER BY clause; in my opinion that places the burden on you to prove that you will never need it. Not even after installing a service pack or upgrading to the next version of SQL Server. Can you guarantee that?
Heck, can you even guarantee that this will always work on the current version of SQL Server? I'm on holiday, with only a simple laptop computer, so my testing capacity is limited. But imagine a SQL Server instance on a computer with 16 cores, with Sales.SalesOrderHeader partitioned by TerritoryID and spread over 16 seperate spindles, and with the amount of data in that table bumped to several billion rows. I would expect (and, in fact, even HOPE) that each core gets to process the data for a single partition. Without the ORDER BY, the data will be returned as soon as it's ready. Why would SQL Server wait for core #1 to finish if core #5 already has some data ready to be returned?

Again - if you advise people to rely on undocumented behaviour, the burden is on you to prove that it will work in all cases. And that it will continue to work in the future. If you can't, and you still keep giving the same advise, you are repeating the mistake many people made in the days of SQL Server 6.5 when they told people to omit the ORDER BY after a GROUP BY.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
R.P.Rozema
R.P.Rozema
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2690 Visits: 1716
Guys, this topic was about the article on the usage of grouping() but you've moved away to wether or not to add a sort clause now. I feel a little guilty for that because I first made a note on the sorting. You've both made your points and since neither can give proof there's no need to keep poluting the topic any more. I personally totally see Hugo's point, so I'll keep on adding the order by clause. Even though I totally respect -if not live by- Jeff's advices normally. Since the sorting is undocumented behavior (unless someone else has a link?) I think it's up to each and everyone to decide for themselves whether or not their solution should rely on it and I don't see any reason not to add the order by clause. If you want to continue this debate, can you please start a separate topic for it?



Posting Data Etiquette - Jeff Moden
Posting Performance Based Questions - Gail Shaw
Hidden RBAR - Jeff Moden
Cross Tabs and Pivots - Jeff Moden
Catch-all queries - Gail Shaw


If you don't have time to do it right, when will you have time to do it over?
john.arnott
john.arnott
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3466 Visits: 3059
Gosh, I hope I don't get my shoes wet by stepping near this, but.....

The original script in the article absolutely needs an ORDER BY to properly interleave the two halves of the UNION ALL statement (the detail rows and the aggregated rows).

Also, BOL is quite clear in saying that GROUP BY results are not guaranteed to be in any particular order and explicitly prescribes an ORDER BY clause if the order is important. With no direction from MS either way on queries using WITH ROLLUP or ROLLUP(), I'd have to agree that this advice would best be followed in those cases as well.

-----
edit:
Sorry, R.P. I saw your post only after hitting send on mine.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)

Group: General Forum Members
Points: 203404 Visits: 41949
Hugo Kornelis (7/19/2010)
Jeff Moden (7/19/2010)
Hugo,

The gain is that there is nothing you can naturally order by that will give you the correct original order. ;-)

Huh?
USE AdventureWorks2008;
SELECT TerritoryID, GROUPING(TerritoryID),
SalesPersonID, GROUPING(SalesPersonID),
SUM(TotalDue) AS SumOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY ROLLUP(TerritoryID, SalesPersonID)
ORDER BY GROUPING(TerritoryID), TerritoryID,
GROUPING(SalesPersonID), SalesPersonID;



And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

Whether or not I can is completely irrelevant. I advise people to include the ORDER BY, because it guarantees correct results. You advise people to omit the ORDER BY clause; in my opinion that places the burden on you to prove that you will never need it. Not even after installing a service pack or upgrading to the next version of SQL Server. Can you guarantee that?
Heck, can you even guarantee that this will always work on the current version of SQL Server? I'm on holiday, with only a simple laptop computer, so my testing capacity is limited. But imagine a SQL Server instance on a computer with 16 cores, with Sales.SalesOrderHeader partitioned by TerritoryID and spread over 16 seperate spindles, and with the amount of data in that table bumped to several billion rows. I would expect (and, in fact, even HOPE) that each core gets to process the data for a single partition. Without the ORDER BY, the data will be returned as soon as it's ready. Why would SQL Server wait for core #1 to finish if core #5 already has some data ready to be returned?

Again - if you advise people to rely on undocumented behaviour, the burden is on you to prove that it will work in all cases. And that it will continue to work in the future. If you can't, and you still keep giving the same advise, you are repeating the mistake many people made in the days of SQL Server 6.5 when they told people to omit the ORDER BY after a GROUP BY.


I humbly stand corrected. That sort works. Well done, Hugo. Still, the order of run return is, in fact, documented in Books Online...

ROLLUP
Specifies that in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. Groups are summarized in a hierarchical order, from the lowest level in the group to the highest. The group hierarchy is determined by the order in which the grouping columns are specified. Changing the order of the grouping columns can affect the number of rows produced in the result set.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)SSC Guru (203K reputation)

Group: General Forum Members
Points: 203404 Visits: 41949
R.P.Rozema (7/19/2010)
Guys, this topic was about the article on the usage of grouping() but you've moved away to wether or not to add a sort clause now. I feel a little guilty for that because I first made a note on the sorting. You've both made your points and since neither can give proof there's no need to keep poluting the topic any more. I personally totally see Hugo's point, so I'll keep on adding the order by clause. Even though I totally respect -if not live by- Jeff's advices normally. Since the sorting is undocumented behavior (unless someone else has a link?) I think it's up to each and everyone to decide for themselves whether or not their solution should rely on it and I don't see any reason not to add the order by clause. If you want to continue this debate, can you please start a separate topic for it?


The sorting is not undocumented... and that was my point. ;-) And, I am truly humbled by your comment. Thank you.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Hugo Kornelis
Hugo Kornelis
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18201 Visits: 12426
My final comment in this topic, and then I'll respect R.P.'s request.

Jeff Moden (7/19/2010)
Still, the order of run return is, in fact, documented in Books Online...

ROLLUP
Specifies that in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. Groups are summarized in a hierarchical order, from the lowest level in the group to the highest. The group hierarchy is determined by the order in which the grouping columns are specified. Changing the order of the grouping columns can affect the number of rows produced in the result set.


This quote describes which possible summary rows are and are not introduced in the result set, not the order in which results are returned.

(I'm also tempted to point out the use of the term "result set" rather than "recordset" in this quote, but the terminology in BOL as a whole is so often wrong that I'd probably better not go there)

Jeff, or anyone else - if you want to continue this debate, then please start a new topic and send me a PM with the link, or include the link in this topic. Without a link, I'll probably never find the new topic.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
witasj
witasj
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 6
The result set I get is not the same as the result set posted in the article. Can you repost the code to populate the tables? I had a DBA run it here at my company and he got the same results that I got.

First 10 rows:
'Product1', 'Scheme1', 1524.15, 124.32, '2010-07-20 16:39:32'
'Product1', 'Scheme2', 1524.15, 124.32, '2010-07-20 16:39:32'
'Product1', 'Scheme3', 1524.15, 124.32, '2010-07-20 16:39:32'
'Product1', 'Scheme4', 1524.15, 124.32, '2010-07-20 16:39:32'
'Product1', 'Scheme5', 1524.15, 124.32, '2010-07-20 16:39:32'
'Product1', 'Scheme1', 9144.90, 745.92, '2010-07-20 16:39:32'
'Product1', 'Scheme2', 4572.45, 372.96, '2010-07-20 16:39:32'
'Product1', 'Scheme3', 3048.30, 248.64, '2010-07-20 16:39:32'
'Product1', 'Scheme4', 3048.30, 248.64, '2010-07-20 16:39:32'
'Product1', 'Scheme5', 3048.30, 248.64, '2010-07-20 16:39:32'
john.arnott
john.arnott
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3466 Visits: 3059
witasj,

I see the same contents in @Grouptest as you do.

It looks as though the loaded table shown in the article may not be from the final version of the code to build it, but the results with aggregated subtotals do seem to match up to what's shown.
john.arnott
john.arnott
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3466 Visits: 3059
Here one question comes in the mind that why only one row is indicated by 1? The Scheme_Total and Product_Total is also aggregated but it is not indicated by 1. I have not yet received any satisfactory answer for this. I will post the answer here once I get it.


I tried to answer this question from the article last week, and realized today that I was wrong in my explanation. I've corrected that post above.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search