Log in  ::  Register  ::  Not logged in

 Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5 Rate Topic Display Mode Topic Options
Author
 Message
 Posted Thursday, February 14, 2008 5:06 PM
 SSC-Dedicated Group: General Forum Members Last Login: Yesterday @ 4:21 PM Points: 34,537, Visits: 28,701
 Christopher Ford (2/13/2008)Earlier Jeff posted a question on using SUM(SAL) OVER()Since OVER() doesn't support ORDER BY when used with an aggregation. How about this one?`SELECT J1.AccountID, J1.Date, J1.Amount AS AmountThisDate, SUM(J2.Amount) AS RunningBalance, CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalanceFROM dbo.JBMTest AS J1 JOIN dbo.JBMTest AS J2 ON J2.AccountID = J1.AccountID AND (J2.Date >= '01/01/2000' --Anchor Date from example AND J2.Date <= J1.Date)GROUP BY J1.AccountID, J1.Date, J1.AmountORDER BY J1.AccountID, J1.Date;`Not quite as elegant as SUM(SAL) OVER()But I'm willing to bet that the query plans generated for both would be similiar to this query.Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.And this one works on SQL 2000. =)Like I said in the Triangular Join Article, not all Triangular Joins are bad if they're against small sets. This one only generates about 11 million internal rows which isn't bad for a million row table. And, in the presence of a good and proper index, it only takes 00:01:24 to execute... that makes it only about 12 times slower than the UPDATE method. Sounds like I'm being sarcastic, and I'm not... that's actually pretty good for something that uses triangular joins. --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." "Change is inevitable. Change for the better is not." -- 04 August 2013(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013Helpful Links:How to post code problemsHow to post performance problems
Post #456062
 Posted Thursday, February 14, 2008 6:57 PM
 SSC-Dedicated Group: General Forum Members Last Login: Yesterday @ 4:21 PM Points: 34,537, Visits: 28,701
 Yep... tested the heck out of the CTE solution with ROW_NUMBER() for the running balance... without the clustered index on the appropriate columns, if falls apart in that the running balance starts over within the same AccountID... here's the code I used to find groups of AccountID's that have more than one "start" in a running balance. Pick any AccountID in the first query result and plug it into the second query to see what I'm talking about... SELECT AccountID,COUNT(*) AS FalseReset   FROM dbo.JBMTest   WHERE Amount = GrpBal  GROUP BY AccountID HAVING COUNT(*) > 1   ORDER BY AccountIDSELECT * FROM JBMTEST WHERE AccountID = ???? ORDER BY ACCOUNTID,DATEThe reason it fails is because it picks up the "other" index (PK) which turns out to be a Merry-Go-Round index. This also proves what Matt was saying... Row_Number() doesn't guarantee the correct order of processing... only the the Row_Number() will be in the correct order when compared to it's own ORDER BY clause.Too bad... that was a really nice idea... --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." "Change is inevitable. Change for the better is not." -- 04 August 2013(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013Helpful Links:How to post code problemsHow to post performance problems
Post #456094
 Posted Thursday, February 14, 2008 7:12 PM
 SSC-Dedicated Group: General Forum Members Last Login: Yesterday @ 4:21 PM Points: 34,537, Visits: 28,701
 This has all been totally awesome! Give yourselves a huge pat on the back... 122 posts on a fairly controversial subject and no flame wars. My hat is off to all of you! :)Second of all, 122 posts of mostly great ideas even if some of them don't quite work as expected... at least now we know what works, what doesn't work, why it does or doesn't work, and how long most of them take. You can't BUY that kind of quality testing and we've all benefited. So, give yourselves another huge pat on the back... this has been an absolutely wonderful showing by some real professionals! --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." "Change is inevitable. Change for the better is not." -- 04 August 2013(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013Helpful Links:How to post code problemsHow to post performance problems
Post #456095
 Posted Friday, February 15, 2008 1:00 AM
 Valued Member Group: General Forum Members Last Login: Friday, November 15, 2013 10:05 PM Points: 74, Visits: 442
 Jeff Moden (2/14/2008)Yep... tested the heck out of the CTE solution with ROW_NUMBER() for the running balance... without the clustered index on the appropriate columns, if falls apart in that the running balance starts over within the same AccountID... here's the code I used to find groups of AccountID's that have more than one "start" in a running balance. Pick any AccountID in the first query result and plug it into the second query to see what I'm talking about... SELECT AccountID,COUNT(*) AS FalseReset   FROM dbo.JBMTest   WHERE Amount = GrpBal  GROUP BY AccountID HAVING COUNT(*) > 1   ORDER BY AccountIDSELECT * FROM JBMTEST WHERE AccountID = ???? ORDER BY ACCOUNTID,DATEThe reason it fails is because it picks up the "other" index (PK) which turns out to be a Merry-Go-Round index. This also proves what Matt was saying... Row_Number() doesn't guarantee the correct order of processing... only the the Row_Number() will be in the correct order when compared to it's own ORDER BY clause.Too bad... that was a really nice idea...Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...Other than just creating a clustered index...Isn't there? I mean...Jeff, let's face it...several people here have had all previous beliefs shattered on what they believed the order to be. And also, several people who have been preaching this idea for a while are finally singing hallelujah praises that someone has proven this to be true...I finally understand why SQL Server's default scripting behaviour favors inserting all your data into a new temp table, dropping your old table and renaming the tmp table your old tables name.So, Can anyone quickly break down the very short list of how we can do a guaranteed order'd running total update to a table?There's a treasure load of method's that could be used in many other situations that have been posted here...I'll definitely be testing them in other area's, especially the one that does the calculation in XML, that was sweet. Christopher Ford
Post #456179
 Posted Friday, February 15, 2008 1:38 AM
 SSCertifiable Group: General Forum Members Last Login: Yesterday @ 11:54 PM Points: 6,982, Visits: 8,359
 Well it certainly started with a great article, handling a fairly common issue And it resulted in a solution that worked a lightning speed and performedthe actual update in a single statement run. And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !But the challenge has been set ! Provide an order-based solution that beats lightning JohanDon't drive faster than your guardian angel can fly ...but keeping both feet on the ground won't get you anywhere - How to post Performance Problems- How to post data/code to get the best help- How to prevent a sore throat after hours of presenting ppt ?"press F1 for solution", "press shift+F1 for urgent solution" Need a bit of Powershell? How about thisWho am I ? Sometimes this is me but most of the time this is me
Post #456187
 Posted Friday, February 15, 2008 1:56 AM
 Valued Member Group: General Forum Members Last Login: Friday, November 15, 2013 10:05 PM Points: 74, Visits: 442
 ALZDBA (2/15/2008)Well it certainly started with a great article, handling a fairly common issue And it resulted in a solution that worked a lightning speed and performedthe actual update in a single statement run. And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !But the challenge has been set ! Provide an order-based solution that beats lightning I know some one can figure it out. Not me this time. Christopher Ford
Post #456195
 Posted Friday, February 15, 2008 5:50 AM
 Hall of Fame Group: General Forum Members Last Login: Yesterday @ 8:46 PM Points: 3,925, Visits: 5,536
 >>Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...Sorry, but there doesn't have to be. Just because we really wish something to be true, and logically KNOW something to be true doesn't make it so. Review this thread and you will see a post I made long ago that mentions Itzik Ben-Gan's crusade (joined by many others) requesting Microsoft to fully implement the OVER clause that WOULD provide a true non-triangular-join set-based solution to this problem. Best,Kevin G. BolesSQL Server ConsultantSQL MVP 2007-2012TheSQLGuru at GMail
Post #456264
 Posted Friday, February 15, 2008 7:45 AM
 SSC-Dedicated Group: General Forum Members Last Login: Yesterday @ 4:21 PM Points: 34,537, Visits: 28,701
 ALZDBA (2/15/2008)Well it certainly started with a great article, handling a fairly common issue And it resulted in a solution that worked a lightning speed and performedthe actual update in a single statement run. And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !But the challenge has been set ! Provide an order-based solution that beats lightning Great summary, Johan! And, thanks for the kudo! --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." "Change is inevitable. Change for the better is not." -- 04 August 2013(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013Helpful Links:How to post code problemsHow to post performance problems
Post #456301
 Posted Friday, February 15, 2008 8:57 AM
 Valued Member Group: General Forum Members Last Login: Friday, November 15, 2013 10:05 PM Points: 74, Visits: 442
 TheSQLGuru (2/15/2008)>>Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...Sorry, but there doesn't have to be. Just because we really wish something to be true, and logically KNOW something to be true doesn't make it so. Review this thread and you will see a post I made long ago that mentions Itzik Ben-Gan's crusade (joined by many others) requesting Microsoft to fully implement the OVER clause that WOULD provide a true non-triangular-join set-based solution to this problem.I surrender....for now. :D Christopher Ford
Post #456345
 Posted Friday, February 15, 2008 2:36 PM
 SSC Rookie Group: General Forum Members Last Login: Monday, October 28, 2013 8:28 AM Points: 39, Visits: 297
 OK, some fun on a slow Friday afternoon: this can indeed be implemented so that it is very, very quick without breaking any of the rules/guarantees provided by SQL.The key is to recognize that CTE's defined as recursive queries are in fact a set-oriented tail recursion operator for SQL. This is an extremely powerful concept, and thus CTEs can be employed to do all sorts of wonderful things in a fast set-oriented manner.Here's my code:The source table create and data load; this takes 2:04 (mins:secs) on my machine (pretty much lifted from the original article):`CREATE TABLE dbo.JBMTest ( RowNum INT IDENTITY (1,1) NOT NULL, AccountID INT not NULL, Amount MONEY not NULL, Date DATETIME not NULL --RunBal MONEY NULL, --GrpBal MONEY NULL, --RunCnt INT NULL, --GrpCnt INT NULL ) CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date --not clustered for "Merry-go-Round" test ON dbo.JBMTest (AccountID, Date)-- 2mins, 4secs on my machine:--===== Build the table 100 rows at a time to "mix things up"DECLARE @Counter INT SET @Counter = 0 WHILE @Counter < 1000000 BEGIN --===== Add 1000 rows to the test table INSERT INTO dbo.JBMTest (AccountID, Amount, Date) SELECT TOP 100 AccountID = ABS(CHECKSUM(NEWID()))%50000+1, Amount = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY), Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) FROM Master.dbo.SysColumns t1 CROSS JOIN Master.dbo.SysColumns t2 --===== Increment the counter SET @Counter = @Counter + 100 END`Now the tricky CTE; we first build a table variable that sucks the whole source table out and applies a clustered index, then do the CTE query that adds the running totals:`declare @sequential as table( AccountID INT not NULL , seq int not null , Date DATETIME not NULL , Amount MONEY not NULL , unique clustered (AccountID, seq));insert into @sequential select AccountID , row_number() over(partition by AccountID order by Date) as seq , Date , Amount from JBMTest;with summed(AccountID, seq, Date, Ammount, Running) as( select sq.AccountID , sq.seq , sq.Date , sq.Amount , sq.Amount as Running from @sequential as sq where sq.seq = 1 UNION ALL select sq.AccountID , sq.seq , sq.Date , sq.Amount , sq.Amount + pri.Running as Running from @sequential as sq join summed as pri on sq.AccountID = pri.AccountID and sq.seq = pri.seq + 1 where sq.seq > 1)select * from summed;`The above query -- INCLUDING generating the table variable -- takes 29 seconds on my machine, and is eligible for full parallelism, etc. (Note also that I did not specify any table/index hints.) If we add a final sort (order by accountID, seq) (most likely in practice, if used for reporting), the time balloons to all of 43 seconds.....Reviewing the query plans shows that each row is visited exactly once; the recursive CTE is trully a tail-recursion operator in SQL!Out of curiousity, if we were to grab the current balance using a sum...over(partition) construct:`select distinct(AccountID), sum(Amount) over(partition by AccountID) as cAmountfrom JBMTest`that takes 2 seconds on my machine.-frank The End.
Post #456536

 Permissions

 Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.