Help writing a SQL

Question

Help writing a SQL

Viewing 11 posts - 61 through 70 (of 70 total)

You must be logged in to reply to this topic. Login to reply

Garadin One Orange Chip Points: 29613 More actions · Answer 1

While updating the view is an interesting addition, it does not necessarily prove anything in regards to the 3 part update. Your examples use only a 2 part update. I tested this view creation/2 part update method (ordered by acct, ndt) against the unindexed 2 million row result set. Here are the results.

2 Part Update method on View Created from Unindexed data

(2000000 row(s) affected)

(1 row(s) affected)

Update Done. Time Elapsed: 264 seconds

(64425 row(s) affected)

(1 row(s) affected)

Select to Table Done. Total Time Elapsed: 266 seconds

(1 row(s) affected)

Final Row Count, Two Part Update Method:64425

While the correct results are returned, the speed of the update is gone. I also tested the same code on the original result set which had a clustered index of ndt, acct (opposite of what is needed). The results are what I expected.

2 Part Update method on View Created from data with a different clustered index

(2000000 row(s) affected)

(1 row(s) affected)

Update Done. Time Elapsed: 43 seconds

(1780556 row(s) affected)

(1 row(s) affected)

Select to Table Done. Total Time Elapsed: 48 seconds

(1 row(s) affected)

Final Row Count, Two Part Update Method:1780556

Note that this is the result set that the original tests ran against(albeit with a different clustered index), and should have produced 64202 records. The reason it produced so many more is because it fell back to using a clustered index update, not the order by specified in the view. I've attached the execution plan for the last test, but I modified teh base scripts several times and don't have all the backing on this one. (Don't have time at the moment to re-run all of these and produce execution plans/scripts/test data setups, but may be able to later if you'd like to see it.)

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 2

Dave Ballantyne (9/29/2009)
Garadin (9/28/2009)
Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.
In my little test scripts i did post the data in a 'random' order and then created the clustered index so it was in a different order. 🙂
Anyway Jeff , im sure you've been there done it and got the T-Shirt 😀 and look forward to your completed document , but , i really am struggling to get incorrect results. Ive scaled up by test to be on adventureworks and no matter what order or filtering i use then my Cte-Quirky update returns the same as my 'control' cursor. Must get on with some 'real' work now 😉
Drop table #Balance
go
drop table #CurBalance
go
drop index Sales.SalesOrderHeader.idxOrderDate
go
create index idxOrderDate on Sales.SalesOrderHeader(Orderdate) include(SalesOrderId,SalesPersonId,subtotal,taxamt,freight)
go
create index idxSalesPerson on Sales.SalesOrderHeader(SalesPersonId) include(SalesOrderId,OrderDate,subtotal,taxamt,freight)
go
Create Table #Balance(
SalesOrderId integer,
RollingBalance money
)
Create Table #CurBalance(
SalesOrderId integer,
RollingBalance money
)
insert into #Balance(SalesOrderId,RollingBalance)
select SalesOrderID,NULL
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
where OrderDate between '01jan03' and '01may03'
-- and SOH.SalesPersonID in(276 ,277)
-- CountryRegionCode = 'CA'
order by OrderDate, SOH.SalesPersonID
--order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
--Order by st.CountryRegionCode,OrderDate
Declare @OrderYear integer,
@OrderMonth integer,
@SalesPersonId integer,
@TerritoryId integer,
@CountryRegionCode char(2),
@RollingBalance money
Select @OrderYear = 0
Select @OrderMonth = 0
Select @RollingBalance = 0
Select @SalesPersonId =0
;with cteValue(SalesOrderId,OrderDate,SalesPersonId, TotalDue ,RollingBalance,TerritoryID , CountryRegionCode)
as(
select top 99999999999 SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue,RollingBalance, SP.TerritoryID,CountryRegionCode
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
join #Balance
on #Balance.SalesOrderId = SOH.SalesOrderID
where OrderDate between '01jan03' and '01may03'
-- and SOH.SalesPersonID in(276 ,277)
-- CountryRegionCode ='CA'
--order by OrderDate, SOH.SalesPersonID
order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
-- Order by st.CountryRegionCode,OrderDate
)
update cteValue
set @RollingBalance = case when @SalesPersonId <> cteValue.SalesPersonId or
@OrderMonth <> DATEPART(mm,OrderDate) or
@OrderYear <> DATEPART(yy,OrderDate) or
@TerritoryId <> TerritoryId
then TotalDue
else @RollingBalance +TotalDue end,
RollingBalance = @RollingBalance,
@OrderYear = DATEPART(yy,OrderDate),
@OrderMonth = DATEPART(mm,OrderDate),
@SalesPersonId = cteValue.SalesPersonId,
@TerritoryId = cteValue.TerritoryID,
@CountryRegionCode = cteValue.CountryRegionCode
go
Declare @OrderYear integer,
@OrderMonth integer,
@SalesPersonId integer,
@TerritoryId integer,
@CountryRegionCode char(2),
@RollingBalance money,
@PrevOrderYear integer,
@PrevOrderMonth integer,
@PrevSalesPersonId integer,
@PrevTerritoryId integer,
@PrevCountryRegionCode char(2),
@SalesOrderId integer,
@OrderDate datetime,
@TotalDue money
Select @OrderYear = 0,
@OrderMonth = 0,
@RollingBalance = 0,
@SalesPersonId =0,
@PrevOrderYear = 0,
@PrevOrderMonth = 0,
@PrevSalesPersonId =0,
@PrevTerritoryId =0,
@PrevCountryRegionCode='XX'
declare balancecur cursor for
select SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue, SP.TerritoryID,CountryRegionCode
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
where OrderDate between '01jan03' and '01may03'
--and SOH.SalesPersonID in(276 ,277)
--CountryRegionCode ='CA'
--order by OrderDate, SOH.SalesPersonID
order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
--Order by st.CountryRegionCode,OrderDate
open balancecur
while(0=0) begin
fetch next from balancecur into @SalesOrderId,@OrderDate,@SalesPersonId,@TotalDue,@TerritoryId,@CountryRegionCode
if(@@FETCH_STATUS<>0) break
Select @OrderMonth = DATEPART(mm,@OrderDate),
@OrderYear = DATEPART(yy,@OrderDate)
if( @OrderMonth <> @PrevOrderMonth or
@OrderYear <> @PrevOrderYear or
@TerritoryId <> @PrevTerritoryId or
@CountryRegionCode <> @PrevCountryRegionCode or
@SalesPersonId <> @PrevSalesPersonId
) begin
Select @RollingBalance = 0,
@PrevOrderMonth = @OrderMonth,
@PrevOrderYear = @OrderYear,
@PrevSalesPersonId = @SalesPersonId,
@PrevCountryRegionCode = @CountryRegionCode ,
@PrevTerritoryId = @TerritoryId
end
Select @RollingBalance = @RollingBalance + @TotalDue
insert into #CurBalance(SalesOrderId,RollingBalance)
values(@SalesOrderId,@RollingBalance)
end
close balancecur
deallocate balancecur
go
select COUNT(*) from #CurBalance
go
select COUNT(*) from #Balance
go
select COUNT(*) from #CurBalance join #Balance
on #CurBalance.SalesOrderId = #Balance.SalesOrderId
and #CurBalance.RollingBalance= #Balance.RollingBalance

I'm not sure why you think I have the time to read that much undocumented code to try to figure out what results you're actually expecting.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 3

Dave,

Ok... despite what I said about the length of undocumented code, I actually did take the time to check it out because of the importance of the concept. And, I ran the code and as you pointed out, the running balances work correctly. That's part of the problem that I'm talking about... for small numbers of rows, it frequently appears to work correctly. The problem occurs if you get a "merry-go-round" index which is an index that starts somewhere in the middle and wraps around to the beginning after hitting the end.

However, I'm always willing to learn new things. See if you can make an "ordered update" using a CTE work on the following. Please see the note in the code that says "Add any indexes you want here...". If you can make it work without error in the running balance, then I guess I'll need to revisit some things concerning a running balance. Thanks.

/*************************************************************************************

Create the test table with a non-clustered Primary Key and a separate clustered index

This code has been tested in SQL Server 2000 and 2005.

*************************************************************************************/

--===== Do this testing in a nice, "safe" place that everyone has

USE TempDB

GO

--===== If the test table already exists, drop it in case we need to rerun.

-- The 3 part naming is overkill, but prevents accidents on real tables.

IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

DROP TABLE TempDB.dbo.TransactionDetail

GO

--===== Create the test table (TransactionDetail) with a NON clustered PK

CREATE TABLE dbo.TransactionDetail

(

TransactionDetailID INT IDENTITY(1,1),

Date DATETIME,

AccountID INT,

Amount MONEY,

AccountRunningTotal MONEY --Running total across each account

)

GO

-- Add any indexes you want here... before the data population below

GO

/*************************************************************************************

Populate the table using a rather slow method but one that's sure to cause lots of

Page splits and that will fragment the table with over 99% fragmentation.

*************************************************************************************/

--===== Preset the environment for appearance and speed

SET NOCOUNT ON

--===== Populate the table in "segments" to force page splits.

-- Normally this would NOT have a While loop in it.

WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000

BEGIN

INSERT INTO dbo.TransactionDetail

(Date, AccountID, Amount)

SELECT TOP 10000

--10 years worth of dates with times from 1/1/2000 to 12/31/2009

CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

--Just one AccountID

275 AS AccountID,

--Dollar amounts from -99.99 to + 99.99

CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY) AS Amount

FROM Master.dbo.SysColumns sc1

CROSS JOIN Master.dbo.SysColumns sc2

END

GO

--===== Verify the row count

SELECT COUNT(*) FROM dbo.TransactionDetail

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110206 More actions · Answer 4

CREATE CLUSTERED INDEX TransactionDetail_AccountDate ON dbo.TransactionDetail(AccountID, [Date])

GO

DECLARE @AccountID INT, @AccountRunningTotal MONEY

UPDATE T

SET @AccountRunningTotal = AccountRunningTotal

= Amount + CASE

WHEN @AccountID = AccountID

THEN @AccountRunningTotal

ELSE 0

END,

@AccountID = AccountID

FROM dbo.TransactionDetail T

WITH (INDEX (TransactionDetail_AccountDate)) -- Forcing following the order in the index

SELECT * FROM dbo.TransactionDetail

ORDER BY AccountID, [Date]

Worked for me...

_____________
Code for TallyGenerator

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 5

Apologies for the big lump of code before , kind of obvious that everyone else hasn't gone through the same process i have to reach that point.

Just to recap.

The two things that bother me about the quirky updates are:

1) The clustered index is required in the order the update is to happen.

2) The additional column that is also required.

These may be impractical for any variety of reasons.

What i have attempted to do is use a cte / view to attempt to solve these issues.

In the case of issue 1 then a cte is used to join to a temp table to hold the running total and both issues use the ordering within a cte to resolve issue 2.

Its a fail big time (only tested with 2005 9.00.4207.00 (X64) ) but it may spark a fresh idea with someone else

The below code contains 4 Tests , Test 1 attempts to solve issues 1 and 2, and the others attempt to solve problem 1 in isolation.

Test 1 fails , 2 and 3 pass but 4 fails because the clustered index update that seth was seeing in his attempt of using a view.

/*************************************************************************************

Create the test table with a non-clustered Primary Key and a separate clustered index

This code has been tested in SQL Server 2000 and 2005.

*************************************************************************************/

--===== Do this testing in a nice, "safe" place that everyone has

USE TempDB

GO

--===== If the test table already exists, drop it in case we need to rerun.

-- The 3 part naming is overkill, but prevents accidents on real tables.

IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

DROP TABLE TempDB.dbo.TransactionDetail

GO

--===== Create the test table (TransactionDetail) with a NON clustered PK

CREATE TABLE dbo.TransactionDetail

(

TransactionDetailID INT IDENTITY(1,1),

Date DATETIME,

AccountID INT,

Amount MONEY,

AccountRunningTotal MONEY --Running total across each account

)

GO

-- Add any indexes you want here... before the data population below

GO

/*************************************************************************************

Populate the table using a rather slow method but one that's sure to cause lots of

Page splits and that will fragment the table with over 99% fragmentation.

*************************************************************************************/

--===== Preset the environment for appearance and speed

SET NOCOUNT ON

--===== Populate the table in "segments" to force page splits.

-- Normally this would NOT have a While loop in it.

WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000

BEGIN

INSERT INTO dbo.TransactionDetail

(Date, AccountID, Amount)

SELECT TOP 10000

--10 years worth of dates with times from 1/1/2000 to 12/31/2009

CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

--Just one AccountID

275 AS AccountID,

--Dollar amounts from -99.99 to + 99.99

CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY) AS Amount

FROM Master.dbo.SysColumns sc1

CROSS JOIN Master.dbo.SysColumns sc2

END

GO

--===== Verify the row count

SELECT COUNT(*) FROM dbo.TransactionDetail

go

-- GOALS -- A) Find a quirky update method not reliant upon having a clustered index in the required order

-- B) Find a quirky update method not reliant upon having a extra column on the base table - MultiUser issues

go

-- Build the control running totals using a cursor for comparison

Drop table #CurTransactionRunningTot

go

Create Table #CurTransactionRunningTot(

TransactionDetailId integer not null,

RunningTotal money not null

)

Declare @JulianDate integer,

@PrevJulianDate integer,

@Amount money,

@RunningTotal Money,

@TransactionDetailId integer

Select @PrevJulianDate = -1

Declare CursorControl Cursor for

Select TransactionDetailId,convert(integer,DATEDIFF(DAY,0,Date)),Amount

from TransactionDetail

order by Date,TransactionDetailID

open CursorControl

while(0=0) begin

Fetch next from CursorControl into @TransactionDetailId,@JulianDate,@Amount

if(@@Fetch_Status <>0) break

If(@JulianDate <> @PrevJulianDate) begin

Select @PrevJulianDate = @JulianDate

Select @RunningTotal =0

end

select @RunningTotal = @RunningTotal+@Amount

Insert into #CurTransactionRunningTot(TransactionDetailId,RunningTotal)

values(@TransactionDetailId,@RunningTotal)

end

close CursorControl

Deallocate CursorControl

go

-- Create a temp table to hold running values - Negates the requirement for the

-- AccountRunningTotal on TransactionDetail

drop table #TransactionRunningTot

go

Create Table #TransactionRunningTot(

TransactionDetailId integer,

RunningTotal money

)

go

insert into #TransactionRunningTot(TransactionDetailId,RunningTotal)

Select TransactionDetailId,null

from TransactionDetail

go

/* Note that top 999999999..... is used as top 100 percent within a cte ignores any ordering */

/*

with cteName(Name)

as

(

select top 100 percent name from sysobjects order by name

)

Select * from ctename -- Fails to order correctly

*/

/*

with cteName(Name)

as

(

select top 9999999 name from sysobjects order by name

)

Select * from ctename -- orders correctly

*/

/* TEST 1 - No Indexes - Running total on Date using a temptable to hold results*/

/* For informational purposes only , does fail to update in correct order - im guessing due to the hashmatch after the sorting - Check query plan*/

Declare @JulianDate integer

Declare @RunningTotal Money

Select @JulianDate = -1,

@RunningTotal = 0;

with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

as

(

Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,

#TransactionRunningTot.RunningTotal

from TransactionDetail join #TransactionRunningTot on

TransactionDetail.TransactionDetailId = #TransactionRunningTot.TransactionDetailId

order by Date,TransactionDetail.TransactionDetailId

)

update cteValues

set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

RunningTotal = @RunningTotal,

@JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

go

/* TEST 1 - Results */

/* RowCounts first */

Select count(*) from #CurTransactionRunningTot

Select count(*) from #TransactionRunningTot

go

Select count(*)

from #CurTransactionRunningTot

join #TransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = #TransactionRunningTot.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal = #TransactionRunningTot.RunningTotal

go

/* TEST 2 - No Indexes - Running total on Date - Updating base table */

update TransactionDetail set AccountRunningTotal=NULL

go

Declare @JulianDate integer

Declare @RunningTotal Money

Select @JulianDate = -1,

@RunningTotal = 0;

with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

as

(

Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

from TransactionDetail

order by Date,TransactionDetailID

)

update cteValues

set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

RunningTotal = @RunningTotal,

@JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

go

/* Select Rows that dont match */

/* PASS */

Select *

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

order by date

/* Select count of Rows that do match */

Select count(*)

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

go

/* TEST 3 - Update with nc-index on date */

create index idxDate on TransactionDetail(Date)

go

update TransactionDetail set AccountRunningTotal=NULL

go

Declare @JulianDate integer

Declare @RunningTotal Money

Select @JulianDate = -1,

@RunningTotal = 0;

with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

as

(

Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

from TransactionDetail

order by Date,TransactionDetailID

)

update cteValues

set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

RunningTotal = @RunningTotal,

@JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

go

/* Select Rows that dont match */

Select *

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

order by date

go

/* Select count of Rows that do match */

Select count(*)

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

go

Drop index TransactionDetail.idxDate

go

/* TEST 3 - Update with clustered-index on date */

create clustered index idxDate on TransactionDetail(Date)

go

update TransactionDetail set AccountRunningTotal=NULL

go

Declare @JulianDate integer

Declare @RunningTotal Money

Select @JulianDate = -1,

@RunningTotal = 0;

with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

as

(

Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

from TransactionDetail

order by Date,TransactionDetailID

)

update cteValues

set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

RunningTotal = @RunningTotal,

@JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

go

/* Select Rows that dont match */

Select *

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

order by date

go

/* Select Rows that do match */

Select count(*)

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

go

Drop index TransactionDetail.idxDate

go

/* TEST 4 - Update with clustered-index on Amount */

create clustered index idxAmount on TransactionDetail(Amount)

go

update TransactionDetail set AccountRunningTotal=NULL

go

Declare @JulianDate integer

Declare @RunningTotal Money

Select @JulianDate = -1,

@RunningTotal = 0;

with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

as

(

Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

from TransactionDetail

order by Date,TransactionDetailID

)

update cteValues

set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

RunningTotal = @RunningTotal,

@JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

go

/* Select Rows that dont match */

Select *

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

order by date

go

/* Select Rows that do match */

Select count(*)

from TransactionDetail

join #CurTransactionRunningTot

on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

go

Clear Sky SQL
My Blog[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 6

Very cool... Thanks. I'll take a look at the code.

The two things that bother me about the quirky updates are:
1) The clustered index is required in the order the update is to happen.
2) The additional column that is also required.

Yes, they bother me as well and I wish MS would fix the SUM() function to work the same as ROW_NUMBER() so we don't need to do this and we could do it in a view... but then it wouldn't be so much fun. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Garadin One Orange Chip Points: 29613 More actions · Answer 7

Jeff Moden (9/30/2009)
Very cool... Thanks. I'll take a look at the code.
The two things that bother me about the quirky updates are:
1) The clustered index is required in the order the update is to happen.
2) The additional column that is also required.
Yes, they bother me as well and I wish MS would fix the SUM() function to work the same as ROW_NUMBER() so we don't need to do this and we could do it in a view... but then it wouldn't be so much fun. 😀

Even if they did, not all the uses for this would go away. A lot of the things involving dates / complex conditions would still employ it.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 8

Man, do I agree with that... even if they solve the running total problem, previous row problems will likely never go away.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RBarryYoung SSC Guru Points: 143329 More actions · Answer 9

Right, the real failing isn't the language syntax, so much as it is what the optimizer does with it. We all know how to write a Running Total using even only ANSI Standard SQL that's been around forever:

SELECT

t1.AccountID,

t1.TransactionDate ,

( SELECT SUM(Amount)

FROM TransactionDetail t2

WHERE t2.AccountID = t2.AccountID

AND t2.TransactionDate <= t1.TransactionDate

) AS AccountRunningTotal

FROM dbo.TransactionDetail t1

ORDER BY t1.AccountID, t1.TransactionDate

The real problem is that the optimizer chooses to implement this with a Triangular Join query plan ( O(n^2) ), apparently because it fails to recognize that this could be accomplished much more efficiently with a Stream Aggregate query plan. All that [font="Courier New"]SUM(..) OVER(ORDER BY ..)[/font] would do for us is to make it easy for the optimizer to figure this out. But if it were smarter, then it wouldn't need these extra syntactical aids, it would figure it out anyway.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Dave Ballantyne SSC-Dedicated Points: 33566 More actions · Answer 10

Just for completeness, quirky merge goes out of order to...

create table #TransTest(

InsOrder integer identity,

TransactionDetailId integer null,

Test integer null,

RunningTotal money null

)

go

create index #idxtest on #TransTest(TransactionDetailId)

go

delete from #TransTest

go

declare @test-2 integer,

@RunningTotal money,

@Date date

select @test-2 = 0,

@RunningTotal = 0,

@Date = '19990101'

insert into #TransTest(TransactionDetailId)

SELECT top 99999999999 TransactionDetailId

from TransactionDetail

where cast(date as Date)='2009-12-31' order by date,TransactionDetailId

MERGE #TransTest target

USING (SELECT top 99999999999 TransactionDetailId,date,Amount,rDate=cast(date as Date)

from TransactionDetail

where cast(date as Date)='2009-12-31' order by date,TransactionDetailId)

AS source (TransactionDetailId,date,amount,rdate)

ON (target.TransactionDetailId= source.TransactionDetailId)

when matched then update

set @test-2 = coalesce(@Test +1,0),

@RunningTotal = case when @Date <> cast(source.rdate as date) then Amount else @RunningTotal+source.Amount end,

@Date =source.rdate,

target.Test = coalesce(@Test,0),

target.RunningTotal = coalesce(@RunningTotal,0);

go

select * from #TransTest join TransactionDetail on #TransTest.TransactionDetailId= TransactionDetail.TransactionDetailID

order by Date,#TransTest.TransactionDetailID

Clear Sky SQL
My Blog[/url]

Jeff Moden SSC Guru Points: 1003870 More actions · Answer 11

Test 1 fails , 2 and 3 pass but 4 fails because the clustered index update that seth was seeing in his attempt of using a view.

Heh... I ran your code, Dave. Thanks for the comments in the code. Some of the verification code has an ambiguous column error in it because of the non aliased column in ORDER BY but the easy fix was to just not include the ORDER BY so no problem there.

Admittedly, I've not looked at Seth's view attempt but I just can't imagine why anyone would add a clustered index to the Amount column. It would slow down both inserts and selects because most folks won't be sorting on it nor selecting on it alone. It does, however, demonstrate that it destroys the ORDER BY of the UPDATE CTE and that's important also because I just can't imagine having a million row heap.

So far as test 2 and 3 go, I hadn't intended to do a "grouped" running total by date because I wanted to show what happens over the entire million rows... unlike test 2 and 3, it will sometimes fail. I'm writing the code for that test right now... it won't always fail but once is enough.

Also, there are two schools of thought on whether or not to include a running total column in the original table. I agree that it would make it a pain to run concurrent running total code, but that normally wouldn't happen... normally it would be a single nightly job. The other thing that could happen is that the design of the "insert new data" process could be entirely RBAR in fashion where the running total for each row is calculated when each new row is inserted. That would be ok for a low number of transactions but would be a real killer when trying to add batches from ACH's (automatic clearing houses) as banks do every day.

As a side bar, I've found that it's actually faster to copy the required data for a given account to a temp table, add the clustered index, and display the data than either a cursor or and ordered CTE. In fact, you can also add verification code on top of all that and it's still faster than either mostly owing to the fact that you do have the proper index to do so.

Anyway... I'll be back with the code for tests 2 and 3 soon.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)