Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1112131415»»»

Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5 Expand / Collapse
Author
Message
Posted Thursday, February 14, 2008 5:06 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
Christopher Ford (2/13/2008)
Earlier Jeff posted a question on using SUM(SAL) OVER()

Since OVER() doesn't support ORDER BY when used with an aggregation.

How about this one?

SELECT J1.AccountID,
J1.Date,
J1.Amount AS AmountThisDate,
SUM(J2.Amount) AS RunningBalance,
CAST(AVG(1.*J2.Amount) AS DECIMAL(12, 2)) AS AvgRunningBalance
FROM dbo.JBMTest AS J1
JOIN dbo.JBMTest AS J2
ON J2.AccountID = J1.AccountID
AND (J2.Date >= '01/01/2000' --Anchor Date from example
AND J2.Date <= J1.Date)
GROUP BY J1.AccountID, J1.Date, J1.Amount
ORDER BY J1.AccountID, J1.Date;

Not quite as elegant as SUM(SAL) OVER()

But I'm willing to bet that the query plans generated for both would be similiar to this query.

Credit doesn't go to me for this method. Learned this, like many other things from other people. This method gets credited to Itzik, since I learned it from one of his many articles, but it's useful in so many circumstances.

And this one works on SQL 2000. =)


Like I said in the Triangular Join Article, not all Triangular Joins are bad if they're against small sets. This one only generates about 11 million internal rows which isn't bad for a million row table. And, in the presence of a good and proper index, it only takes 00:01:24 to execute... that makes it only about 12 times slower than the UPDATE method. Sounds like I'm being sarcastic, and I'm not... that's actually pretty good for something that uses triangular joins.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #456062
Posted Thursday, February 14, 2008 6:57 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
Yep... tested the heck out of the CTE solution with ROW_NUMBER() for the running balance... without the clustered index on the appropriate columns, if falls apart in that the running balance starts over within the same AccountID... here's the code I used to find groups of AccountID's that have more than one "start" in a running balance. Pick any AccountID in the first query result and plug it into the second query to see what I'm talking about...


SELECT AccountID,COUNT(*) AS FalseReset
   FROM dbo.JBMTest 
  WHERE Amount = GrpBal
  GROUP BY AccountID
 HAVING COUNT(*) > 1 
  ORDER BY AccountID

SELECT * FROM JBMTEST WHERE AccountID = ???? ORDER BY ACCOUNTID,DATE



The reason it fails is because it picks up the "other" index (PK) which turns out to be a Merry-Go-Round index. This also proves what Matt was saying... Row_Number() doesn't guarantee the correct order of processing... only the the Row_Number() will be in the correct order when compared to it's own ORDER BY clause.

Too bad... that was a really nice idea...


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #456094
Posted Thursday, February 14, 2008 7:12 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
This has all been totally awesome! Give yourselves a huge pat on the back... 122 posts on a fairly controversial subject and no flame wars. My hat is off to all of you! :)

Second of all, 122 posts of mostly great ideas even if some of them don't quite work as expected... at least now we know what works, what doesn't work, why it does or doesn't work, and how long most of them take. You can't BUY that kind of quality testing and we've all benefited. So, give yourselves another huge pat on the back... this has been an absolutely wonderful showing by some real professionals!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #456095
Posted Friday, February 15, 2008 1:00 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, October 23, 2014 9:52 AM
Points: 74, Visits: 450
Jeff Moden (2/14/2008)
Yep... tested the heck out of the CTE solution with ROW_NUMBER() for the running balance... without the clustered index on the appropriate columns, if falls apart in that the running balance starts over within the same AccountID... here's the code I used to find groups of AccountID's that have more than one "start" in a running balance. Pick any AccountID in the first query result and plug it into the second query to see what I'm talking about...


SELECT AccountID,COUNT(*) AS FalseReset
   FROM dbo.JBMTest 
  WHERE Amount = GrpBal
  GROUP BY AccountID
 HAVING COUNT(*) > 1 
  ORDER BY AccountID

SELECT * FROM JBMTEST WHERE AccountID = ???? ORDER BY ACCOUNTID,DATE



The reason it fails is because it picks up the "other" index (PK) which turns out to be a Merry-Go-Round index. This also proves what Matt was saying... Row_Number() doesn't guarantee the correct order of processing... only the the Row_Number() will be in the correct order when compared to it's own ORDER BY clause.

Too bad... that was a really nice idea...


Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...

Other than just creating a clustered index...

Isn't there? I mean...Jeff, let's face it...several people here have had all previous beliefs shattered on what they believed the order to be. And also, several people who have been preaching this idea for a while are finally singing hallelujah praises that someone has proven this to be true...

I finally understand why SQL Server's default scripting behaviour favors inserting all your data into a new temp table, dropping your old table and renaming the tmp table your old tables name.

So, Can anyone quickly break down the very short list of how we can do a guaranteed order'd running total update to a table?

There's a treasure load of method's that could be used in many other situations that have been posted here...I'll definitely be testing them in other area's, especially the one that does the calculation in XML, that was sweet.


Christopher Ford
Post #456179
Posted Friday, February 15, 2008 1:38 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 8:09 AM
Points: 6,735, Visits: 8,495
Well it certainly started with a great article, handling a fairly common issue

And it resulted in a solution that worked a lightning speed and performed
the actual update in a single statement run.

And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !

But the challenge has been set ! Provide an order-based solution that beats lightning


Johan


Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere

- How to post Performance Problems
- How to post data/code to get the best help


- How to prevent a sore throat after hours of presenting ppt ?


"press F1 for solution", "press shift+F1 for urgent solution"


Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me
Post #456187
Posted Friday, February 15, 2008 1:56 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, October 23, 2014 9:52 AM
Points: 74, Visits: 450
ALZDBA (2/15/2008)
Well it certainly started with a great article, handling a fairly common issue

And it resulted in a solution that worked a lightning speed and performed
the actual update in a single statement run.

And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !

But the challenge has been set ! Provide an order-based solution that beats lightning


I know some one can figure it out. Not me this time.


Christopher Ford
Post #456195
Posted Friday, February 15, 2008 5:50 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 8:00 AM
Points: 4,410, Visits: 6,281
>>Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...

Sorry, but there doesn't have to be. Just because we really wish something to be true, and logically KNOW something to be true doesn't make it so.

Review this thread and you will see a post I made long ago that mentions Itzik Ben-Gan's crusade (joined by many others) requesting Microsoft to fully implement the OVER clause that WOULD provide a true non-triangular-join set-based solution to this problem.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #456264
Posted Friday, February 15, 2008 7:45 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
ALZDBA (2/15/2008)
Well it certainly started with a great article, handling a fairly common issue

And it resulted in a solution that worked a lightning speed and performed
the actual update in a single statement run.

And despite many efforts, re-confirmed the fact that there is no order in a set-based approach !

But the challenge has been set ! Provide an order-based solution that beats lightning


Great summary, Johan! And, thanks for the kudo!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #456301
Posted Friday, February 15, 2008 8:57 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, October 23, 2014 9:52 AM
Points: 74, Visits: 450
TheSQLGuru (2/15/2008)
>>Well...There's got to be a way to use the CTE and ROW_NUMBER() and guarantee an ordered update...

Sorry, but there doesn't have to be. Just because we really wish something to be true, and logically KNOW something to be true doesn't make it so.

Review this thread and you will see a post I made long ago that mentions Itzik Ben-Gan's crusade (joined by many others) requesting Microsoft to fully implement the OVER clause that WOULD provide a true non-triangular-join set-based solution to this problem.


I surrender....for now. :D


Christopher Ford
Post #456345
Posted Friday, February 15, 2008 2:36 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, December 6, 2013 8:32 AM
Points: 39, Visits: 298
OK, some fun on a slow Friday afternoon: this can indeed be implemented so that it is very, very quick without breaking any of the rules/guarantees provided by SQL.

The key is to recognize that CTE's defined as recursive queries are in fact a set-oriented tail recursion operator for SQL. This is an extremely powerful concept, and thus CTEs can be employed to do all sorts of wonderful things in a fast set-oriented manner.

Here's my code:

The source table create and data load; this takes 2:04 (mins:secs) on my machine (pretty much lifted from the original article):
CREATE TABLE dbo.JBMTest         
( RowNum INT IDENTITY (1,1) NOT NULL,
AccountID INT not NULL,
Amount MONEY not NULL,
Date DATETIME not NULL
--RunBal MONEY NULL,
--GrpBal MONEY NULL,
--RunCnt INT NULL,
--GrpCnt INT NULL
)

CREATE NONCLUSTERED INDEX IX_JBMTest_AccountID_Date --not clustered for "Merry-go-Round" test
ON dbo.JBMTest (AccountID, Date)

-- 2mins, 4secs on my machine:
--===== Build the table 100 rows at a time to "mix things up"
DECLARE @Counter INT
SET @Counter = 0
WHILE @Counter < 1000000
BEGIN --===== Add 1000 rows to the test table
INSERT INTO dbo.JBMTest
(AccountID, Amount, Date)
SELECT TOP 100
AccountID = ABS(CHECKSUM(NEWID()))%50000+1
, Amount = CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY)
, Date = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

--===== Increment the counter
SET @Counter = @Counter + 100
END


Now the tricky CTE; we first build a table variable that sucks the whole source table out and applies a clustered index, then do the CTE query that adds the running totals:
declare @sequential as table
(
AccountID INT not NULL
, seq int not null
, Date DATETIME not NULL
, Amount MONEY not NULL
, unique clustered (AccountID, seq)
);
insert into @sequential
select
AccountID
, row_number() over(partition by AccountID order by Date) as seq
, Date
, Amount
from JBMTest;

with
summed(AccountID, seq, Date, Ammount, Running) as
(
select
sq.AccountID
, sq.seq
, sq.Date
, sq.Amount
, sq.Amount as Running
from @sequential as sq
where sq.seq = 1
UNION ALL
select
sq.AccountID
, sq.seq
, sq.Date
, sq.Amount
, sq.Amount + pri.Running as Running
from @sequential as sq
join summed as pri on
sq.AccountID = pri.AccountID
and sq.seq = pri.seq + 1
where sq.seq > 1
)
select * from summed;

The above query -- INCLUDING generating the table variable -- takes 29 seconds on my machine, and is eligible for full parallelism, etc. (Note also that I did not specify any table/index hints.) If we add a final sort (order by accountID, seq) (most likely in practice, if used for reporting), the time balloons to all of 43 seconds.....

Reviewing the query plans shows that each row is visited exactly once; the recursive CTE is trully a tail-recursion operator in SQL!


Out of curiousity, if we were to grab the current balance using a sum...over(partition) construct:
select distinct(AccountID), sum(Amount) over(partition by AccountID) as cAmount
from JBMTest

that takes 2 seconds on my machine.

-frank





The End.
Post #456536
« Prev Topic | Next Topic »

Add to briefcase «««1112131415»»»

Permissions Expand / Collapse