Help writing a SQL

  • While updating the view is an interesting addition, it does not necessarily prove anything in regards to the 3 part update. Your examples use only a 2 part update. I tested this view creation/2 part update method (ordered by acct, ndt) against the unindexed 2 million row result set. Here are the results.

    2 Part Update method on View Created from Unindexed data

    (2000000 row(s) affected)

    (1 row(s) affected)

    Update Done. Time Elapsed: 264 seconds

    (64425 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 266 seconds

    (1 row(s) affected)

    Final Row Count, Two Part Update Method:64425

    While the correct results are returned, the speed of the update is gone. I also tested the same code on the original result set which had a clustered index of ndt, acct (opposite of what is needed). The results are what I expected.

    2 Part Update method on View Created from data with a different clustered index

    (2000000 row(s) affected)

    (1 row(s) affected)

    Update Done. Time Elapsed: 43 seconds

    (1780556 row(s) affected)

    (1 row(s) affected)

    Select to Table Done. Total Time Elapsed: 48 seconds

    (1 row(s) affected)

    Final Row Count, Two Part Update Method:1780556

    Note that this is the result set that the original tests ran against(albeit with a different clustered index), and should have produced 64202 records. The reason it produced so many more is because it fell back to using a clustered index update, not the order by specified in the view. I've attached the execution plan for the last test, but I modified teh base scripts several times and don't have all the backing on this one. (Don't have time at the moment to re-run all of these and produce execution plans/scripts/test data setups, but may be able to later if you'd like to see it.)

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Dave Ballantyne (9/29/2009)


    Garadin (9/28/2009)

    Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.

    In my little test scripts i did post the data in a 'random' order and then created the clustered index so it was in a different order. 🙂

    Anyway Jeff , im sure you've been there done it and got the T-Shirt 😀 and look forward to your completed document , but , i really am struggling to get incorrect results. Ive scaled up by test to be on adventureworks and no matter what order or filtering i use then my Cte-Quirky update returns the same as my 'control' cursor. Must get on with some 'real' work now 😉

    Drop table #Balance

    go

    drop table #CurBalance

    go

    drop index Sales.SalesOrderHeader.idxOrderDate

    go

    create index idxOrderDate on Sales.SalesOrderHeader(Orderdate) include(SalesOrderId,SalesPersonId,subtotal,taxamt,freight)

    go

    create index idxSalesPerson on Sales.SalesOrderHeader(SalesPersonId) include(SalesOrderId,OrderDate,subtotal,taxamt,freight)

    go

    Create Table #Balance(

    SalesOrderId integer,

    RollingBalance money

    )

    Create Table #CurBalance(

    SalesOrderId integer,

    RollingBalance money

    )

    insert into #Balance(SalesOrderId,RollingBalance)

    select SalesOrderID,NULL

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    where OrderDate between '01jan03' and '01may03'

    -- and SOH.SalesPersonID in(276 ,277)

    -- CountryRegionCode = 'CA'

    order by OrderDate, SOH.SalesPersonID

    --order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    --Order by st.CountryRegionCode,OrderDate

    Declare @OrderYear integer,

    @OrderMonth integer,

    @SalesPersonId integer,

    @TerritoryId integer,

    @CountryRegionCode char(2),

    @RollingBalance money

    Select @OrderYear = 0

    Select @OrderMonth = 0

    Select @RollingBalance = 0

    Select @SalesPersonId =0

    ;with cteValue(SalesOrderId,OrderDate,SalesPersonId, TotalDue ,RollingBalance,TerritoryID , CountryRegionCode)

    as(

    select top 99999999999 SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue,RollingBalance, SP.TerritoryID,CountryRegionCode

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    join #Balance

    on #Balance.SalesOrderId = SOH.SalesOrderID

    where OrderDate between '01jan03' and '01may03'

    -- and SOH.SalesPersonID in(276 ,277)

    -- CountryRegionCode ='CA'

    --order by OrderDate, SOH.SalesPersonID

    order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    -- Order by st.CountryRegionCode,OrderDate

    )

    update cteValue

    set @RollingBalance = case when @SalesPersonId <> cteValue.SalesPersonId or

    @OrderMonth <> DATEPART(mm,OrderDate) or

    @OrderYear <> DATEPART(yy,OrderDate) or

    @TerritoryId <> TerritoryId

    then TotalDue

    else @RollingBalance +TotalDue end,

    RollingBalance = @RollingBalance,

    @OrderYear = DATEPART(yy,OrderDate),

    @OrderMonth = DATEPART(mm,OrderDate),

    @SalesPersonId = cteValue.SalesPersonId,

    @TerritoryId = cteValue.TerritoryID,

    @CountryRegionCode = cteValue.CountryRegionCode

    go

    Declare @OrderYear integer,

    @OrderMonth integer,

    @SalesPersonId integer,

    @TerritoryId integer,

    @CountryRegionCode char(2),

    @RollingBalance money,

    @PrevOrderYear integer,

    @PrevOrderMonth integer,

    @PrevSalesPersonId integer,

    @PrevTerritoryId integer,

    @PrevCountryRegionCode char(2),

    @SalesOrderId integer,

    @OrderDate datetime,

    @TotalDue money

    Select @OrderYear = 0,

    @OrderMonth = 0,

    @RollingBalance = 0,

    @SalesPersonId =0,

    @PrevOrderYear = 0,

    @PrevOrderMonth = 0,

    @PrevSalesPersonId =0,

    @PrevTerritoryId =0,

    @PrevCountryRegionCode='XX'

    declare balancecur cursor for

    select SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue, SP.TerritoryID,CountryRegionCode

    from Sales.SalesOrderHeader SOH

    join Sales.SalesPerson SP

    on SP.SalesPersonID = SOH.SalesPersonID

    join Sales.SalesTerritory ST

    on St.TerritoryID = SP.TerritoryID

    where OrderDate between '01jan03' and '01may03'

    --and SOH.SalesPersonID in(276 ,277)

    --CountryRegionCode ='CA'

    --order by OrderDate, SOH.SalesPersonID

    order by SOH.SalesPersonID ,OrderDate

    --Order by st.TerritoryID,OrderDate

    --Order by st.CountryRegionCode,OrderDate

    open balancecur

    while(0=0) begin

    fetch next from balancecur into @SalesOrderId,@OrderDate,@SalesPersonId,@TotalDue,@TerritoryId,@CountryRegionCode

    if(@@FETCH_STATUS<>0) break

    Select @OrderMonth = DATEPART(mm,@OrderDate),

    @OrderYear = DATEPART(yy,@OrderDate)

    if( @OrderMonth <> @PrevOrderMonth or

    @OrderYear <> @PrevOrderYear or

    @TerritoryId <> @PrevTerritoryId or

    @CountryRegionCode <> @PrevCountryRegionCode or

    @SalesPersonId <> @PrevSalesPersonId

    ) begin

    Select @RollingBalance = 0,

    @PrevOrderMonth = @OrderMonth,

    @PrevOrderYear = @OrderYear,

    @PrevSalesPersonId = @SalesPersonId,

    @PrevCountryRegionCode = @CountryRegionCode ,

    @PrevTerritoryId = @TerritoryId

    end

    Select @RollingBalance = @RollingBalance + @TotalDue

    insert into #CurBalance(SalesOrderId,RollingBalance)

    values(@SalesOrderId,@RollingBalance)

    end

    close balancecur

    deallocate balancecur

    go

    select COUNT(*) from #CurBalance

    go

    select COUNT(*) from #Balance

    go

    select COUNT(*) from #CurBalance join #Balance

    on #CurBalance.SalesOrderId = #Balance.SalesOrderId

    and #CurBalance.RollingBalance= #Balance.RollingBalance

    I'm not sure why you think I have the time to read that much undocumented code to try to figure out what results you're actually expecting.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dave,

    Ok... despite what I said about the length of undocumented code, I actually did take the time to check it out because of the importance of the concept. And, I ran the code and as you pointed out, the running balances work correctly. That's part of the problem that I'm talking about... for small numbers of rows, it frequently appears to work correctly. The problem occurs if you get a "merry-go-round" index which is an index that starts somewhere in the middle and wraps around to the beginning after hitting the end.

    However, I'm always willing to learn new things. See if you can make an "ordered update" using a CTE work on the following. Please see the note in the code that says "Add any indexes you want here...". If you can make it work without error in the running balance, then I guess I'll need to revisit some things concerning a running balance. Thanks.

    /*************************************************************************************

    Create the test table with a non-clustered Primary Key and a separate clustered index

    This code has been tested in SQL Server 2000 and 2005.

    *************************************************************************************/

    --===== Do this testing in a nice, "safe" place that everyone has

    USE TempDB

    GO

    --===== If the test table already exists, drop it in case we need to rerun.

    -- The 3 part naming is overkill, but prevents accidents on real tables.

    IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

    DROP TABLE TempDB.dbo.TransactionDetail

    GO

    --===== Create the test table (TransactionDetail) with a NON clustered PK

    CREATE TABLE dbo.TransactionDetail

    (

    TransactionDetailID INT IDENTITY(1,1),

    Date DATETIME,

    AccountID INT,

    Amount MONEY,

    AccountRunningTotal MONEY --Running total across each account

    )

    GO

    -- Add any indexes you want here... before the data population below

    GO

    /*************************************************************************************

    Populate the table using a rather slow method but one that's sure to cause lots of

    Page splits and that will fragment the table with over 99% fragmentation.

    *************************************************************************************/

    --===== Preset the environment for appearance and speed

    SET NOCOUNT ON

     

    --===== Populate the table in "segments" to force page splits.

    -- Normally this would NOT have a While loop in it.

    WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000

    BEGIN

    INSERT INTO dbo.TransactionDetail

    (Date, AccountID, Amount)

    SELECT TOP 10000

    --10 years worth of dates with times from 1/1/2000 to 12/31/2009

    CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

    --Just one AccountID

    275 AS AccountID,

    --Dollar amounts from -99.99 to + 99.99

    CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY) AS Amount

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2

    END

    GO

    --===== Verify the row count

    SELECT COUNT(*) FROM dbo.TransactionDetail

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CREATE CLUSTERED INDEX TransactionDetail_AccountDate ON dbo.TransactionDetail(AccountID, [Date])

    GO

    DECLARE @AccountID INT, @AccountRunningTotal MONEY

    UPDATE T

    SET @AccountRunningTotal = AccountRunningTotal

    = Amount + CASE

    WHEN @AccountID = AccountID

    THEN @AccountRunningTotal

    ELSE 0

    END,

    @AccountID = AccountID

    FROM dbo.TransactionDetail T

    WITH (INDEX (TransactionDetail_AccountDate)) -- Forcing following the order in the index

    SELECT * FROM dbo.TransactionDetail

    ORDER BY AccountID, [Date]

    Worked for me...

    _____________
    Code for TallyGenerator

  • Apologies for the big lump of code before , kind of obvious that everyone else hasn't gone through the same process i have to reach that point.

    Just to recap.

    The two things that bother me about the quirky updates are:

    1) The clustered index is required in the order the update is to happen.

    2) The additional column that is also required.

    These may be impractical for any variety of reasons.

    What i have attempted to do is use a cte / view to attempt to solve these issues.

    In the case of issue 1 then a cte is used to join to a temp table to hold the running total and both issues use the ordering within a cte to resolve issue 2.

    Its a fail big time (only tested with 2005 9.00.4207.00 (X64) ) but it may spark a fresh idea with someone else

    The below code contains 4 Tests , Test 1 attempts to solve issues 1 and 2, and the others attempt to solve problem 1 in isolation.

    Test 1 fails , 2 and 3 pass but 4 fails because the clustered index update that seth was seeing in his attempt of using a view.

    /*************************************************************************************

    Create the test table with a non-clustered Primary Key and a separate clustered index

    This code has been tested in SQL Server 2000 and 2005.

    *************************************************************************************/

    --===== Do this testing in a nice, "safe" place that everyone has

    USE TempDB

    GO

    --===== If the test table already exists, drop it in case we need to rerun.

    -- The 3 part naming is overkill, but prevents accidents on real tables.

    IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

    DROP TABLE TempDB.dbo.TransactionDetail

    GO

    --===== Create the test table (TransactionDetail) with a NON clustered PK

    CREATE TABLE dbo.TransactionDetail

    (

    TransactionDetailID INT IDENTITY(1,1),

    Date DATETIME,

    AccountID INT,

    Amount MONEY,

    AccountRunningTotal MONEY --Running total across each account

    )

    GO

    -- Add any indexes you want here... before the data population below

    GO

    /*************************************************************************************

    Populate the table using a rather slow method but one that's sure to cause lots of

    Page splits and that will fragment the table with over 99% fragmentation.

    *************************************************************************************/

    --===== Preset the environment for appearance and speed

    SET NOCOUNT ON

    --===== Populate the table in "segments" to force page splits.

    -- Normally this would NOT have a While loop in it.

    WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000

    BEGIN

    INSERT INTO dbo.TransactionDetail

    (Date, AccountID, Amount)

    SELECT TOP 10000

    --10 years worth of dates with times from 1/1/2000 to 12/31/2009

    CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

    --Just one AccountID

    275 AS AccountID,

    --Dollar amounts from -99.99 to + 99.99

    CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY) AS Amount

    FROM Master.dbo.SysColumns sc1

    CROSS JOIN Master.dbo.SysColumns sc2

    END

    GO

    --===== Verify the row count

    SELECT COUNT(*) FROM dbo.TransactionDetail

    go

    -- GOALS -- A) Find a quirky update method not reliant upon having a clustered index in the required order

    -- B) Find a quirky update method not reliant upon having a extra column on the base table - MultiUser issues

    go

    -- Build the control running totals using a cursor for comparison

    Drop table #CurTransactionRunningTot

    go

    Create Table #CurTransactionRunningTot(

    TransactionDetailId integer not null,

    RunningTotal money not null

    )

    Declare @JulianDate integer,

    @PrevJulianDate integer,

    @Amount money,

    @RunningTotal Money,

    @TransactionDetailId integer

    Select @PrevJulianDate = -1

    Declare CursorControl Cursor for

    Select TransactionDetailId,convert(integer,DATEDIFF(DAY,0,Date)),Amount

    from TransactionDetail

    order by Date,TransactionDetailID

    open CursorControl

    while(0=0) begin

    Fetch next from CursorControl into @TransactionDetailId,@JulianDate,@Amount

    if(@@Fetch_Status <>0) break

    If(@JulianDate <> @PrevJulianDate) begin

    Select @PrevJulianDate = @JulianDate

    Select @RunningTotal =0

    end

    select @RunningTotal = @RunningTotal+@Amount

    Insert into #CurTransactionRunningTot(TransactionDetailId,RunningTotal)

    values(@TransactionDetailId,@RunningTotal)

    end

    close CursorControl

    Deallocate CursorControl

    go

    -- Create a temp table to hold running values - Negates the requirement for the

    -- AccountRunningTotal on TransactionDetail

    drop table #TransactionRunningTot

    go

    Create Table #TransactionRunningTot(

    TransactionDetailId integer,

    RunningTotal money

    )

    go

    insert into #TransactionRunningTot(TransactionDetailId,RunningTotal)

    Select TransactionDetailId,null

    from TransactionDetail

    go

    /* Note that top 999999999..... is used as top 100 percent within a cte ignores any ordering */

    /*

    with cteName(Name)

    as

    (

    select top 100 percent name from sysobjects order by name

    )

    Select * from ctename -- Fails to order correctly

    */

    /*

    with cteName(Name)

    as

    (

    select top 9999999 name from sysobjects order by name

    )

    Select * from ctename -- orders correctly

    */

    /* TEST 1 - No Indexes - Running total on Date using a temptable to hold results*/

    /* For informational purposes only , does fail to update in correct order - im guessing due to the hashmatch after the sorting - Check query plan*/

    Declare @JulianDate integer

    Declare @RunningTotal Money

    Select @JulianDate = -1,

    @RunningTotal = 0;

    with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

    as

    (

    Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,

    #TransactionRunningTot.RunningTotal

    from TransactionDetail join #TransactionRunningTot on

    TransactionDetail.TransactionDetailId = #TransactionRunningTot.TransactionDetailId

    order by Date,TransactionDetail.TransactionDetailId

    )

    update cteValues

    set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

    RunningTotal = @RunningTotal,

    @JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

    go

    /* TEST 1 - Results */

    /* RowCounts first */

    Select count(*) from #CurTransactionRunningTot

    Select count(*) from #TransactionRunningTot

    go

    Select count(*)

    from #CurTransactionRunningTot

    join #TransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = #TransactionRunningTot.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal = #TransactionRunningTot.RunningTotal

    go

    /* TEST 2 - No Indexes - Running total on Date - Updating base table */

    update TransactionDetail set AccountRunningTotal=NULL

    go

    Declare @JulianDate integer

    Declare @RunningTotal Money

    Select @JulianDate = -1,

    @RunningTotal = 0;

    with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

    as

    (

    Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

    from TransactionDetail

    order by Date,TransactionDetailID

    )

    update cteValues

    set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

    RunningTotal = @RunningTotal,

    @JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

    go

    /* Select Rows that dont match */

    /* PASS */

    Select *

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

    order by date

    /* Select count of Rows that do match */

    Select count(*)

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

    go

    /* TEST 3 - Update with nc-index on date */

    create index idxDate on TransactionDetail(Date)

    go

    update TransactionDetail set AccountRunningTotal=NULL

    go

    Declare @JulianDate integer

    Declare @RunningTotal Money

    Select @JulianDate = -1,

    @RunningTotal = 0;

    with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

    as

    (

    Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

    from TransactionDetail

    order by Date,TransactionDetailID

    )

    update cteValues

    set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

    RunningTotal = @RunningTotal,

    @JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

    go

    /* Select Rows that dont match */

    Select *

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

    order by date

    go

    /* Select count of Rows that do match */

    Select count(*)

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

    go

    Drop index TransactionDetail.idxDate

    go

    /* TEST 3 - Update with clustered-index on date */

    create clustered index idxDate on TransactionDetail(Date)

    go

    update TransactionDetail set AccountRunningTotal=NULL

    go

    Declare @JulianDate integer

    Declare @RunningTotal Money

    Select @JulianDate = -1,

    @RunningTotal = 0;

    with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

    as

    (

    Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

    from TransactionDetail

    order by Date,TransactionDetailID

    )

    update cteValues

    set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

    RunningTotal = @RunningTotal,

    @JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

    go

    /* Select Rows that dont match */

    Select *

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

    order by date

    go

    /* Select Rows that do match */

    Select count(*)

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

    go

    Drop index TransactionDetail.idxDate

    go

    /* TEST 4 - Update with clustered-index on Amount */

    create clustered index idxAmount on TransactionDetail(Amount)

    go

    update TransactionDetail set AccountRunningTotal=NULL

    go

    Declare @JulianDate integer

    Declare @RunningTotal Money

    Select @JulianDate = -1,

    @RunningTotal = 0;

    with cteValues(TransactionDetailId,Date,AccountID,Amount,RunningTotal)

    as

    (

    Select top 999999999999999999 TransactionDetail.TransactionDetailId,Date,AccountID,Amount,AccountRunningTotal

    from TransactionDetail

    order by Date,TransactionDetailID

    )

    update cteValues

    set @RunningTotal =Case when @JulianDate <> convert(integer,DATEDIFF(DAY,0,Date)) then Amount else @RunningTotal+Amount end,

    RunningTotal = @RunningTotal,

    @JulianDate = convert(integer,DATEDIFF(DAY,0,Date))

    go

    /* Select Rows that dont match */

    Select *

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal <> TransactionDetail.AccountRunningTotal

    order by date

    go

    /* Select Rows that do match */

    Select count(*)

    from TransactionDetail

    join #CurTransactionRunningTot

    on #CurTransactionRunningTot.TransactionDetailId = TransactionDetail.TransactionDetailID

    and #CurTransactionRunningTot.RunningTotal = TransactionDetail.AccountRunningTotal

    go



    Clear Sky SQL
    My Blog[/url]

  • Very cool... Thanks. I'll take a look at the code.

    The two things that bother me about the quirky updates are:

    1) The clustered index is required in the order the update is to happen.

    2) The additional column that is also required.

    Yes, they bother me as well and I wish MS would fix the SUM() function to work the same as ROW_NUMBER() so we don't need to do this and we could do it in a view... but then it wouldn't be so much fun. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/30/2009)


    Very cool... Thanks. I'll take a look at the code.

    The two things that bother me about the quirky updates are:

    1) The clustered index is required in the order the update is to happen.

    2) The additional column that is also required.

    Yes, they bother me as well and I wish MS would fix the SUM() function to work the same as ROW_NUMBER() so we don't need to do this and we could do it in a view... but then it wouldn't be so much fun. 😀

    Even if they did, not all the uses for this would go away. A lot of the things involving dates / complex conditions would still employ it.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Man, do I agree with that... even if they solve the running total problem, previous row problems will likely never go away.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Right, the real failing isn't the language syntax, so much as it is what the optimizer does with it. We all know how to write a Running Total using even only ANSI Standard SQL that's been around forever:

    SELECT

    t1.AccountID,

    t1.TransactionDate ,

    ( SELECT SUM(Amount)

    FROM TransactionDetail t2

    WHERE t2.AccountID = t2.AccountID

    AND t2.TransactionDate <= t1.TransactionDate

    ) AS AccountRunningTotal

    FROM dbo.TransactionDetail t1

    ORDER BY t1.AccountID, t1.TransactionDate

    The real problem is that the optimizer chooses to implement this with a Triangular Join query plan ( O(n^2) ), apparently because it fails to recognize that this could be accomplished much more efficiently with a Stream Aggregate query plan. All that [font="Courier New"]SUM(..) OVER(ORDER BY ..)[/font] would do for us is to make it easy for the optimizer to figure this out. But if it were smarter, then it wouldn't need these extra syntactical aids, it would figure it out anyway.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Just for completeness, quirky merge goes out of order to...

    create table #TransTest(

    InsOrder integer identity,

    TransactionDetailId integer null,

    Test integer null,

    RunningTotal money null

    )

    go

    create index #idxtest on #TransTest(TransactionDetailId)

    go

    delete from #TransTest

    go

    declare @test-2 integer,

    @RunningTotal money,

    @Date date

    select @test-2 = 0,

    @RunningTotal = 0,

    @Date = '19990101'

    insert into #TransTest(TransactionDetailId)

    SELECT top 99999999999 TransactionDetailId

    from TransactionDetail

    where cast(date as Date)='2009-12-31' order by date,TransactionDetailId

    MERGE #TransTest target

    USING (SELECT top 99999999999 TransactionDetailId,date,Amount,rDate=cast(date as Date)

    from TransactionDetail

    where cast(date as Date)='2009-12-31' order by date,TransactionDetailId)

    AS source (TransactionDetailId,date,amount,rdate)

    ON (target.TransactionDetailId= source.TransactionDetailId)

    when matched then update

    set @test-2 = coalesce(@Test +1,0),

    @RunningTotal = case when @Date <> cast(source.rdate as date) then Amount else @RunningTotal+source.Amount end,

    @Date =source.rdate,

    target.Test = coalesce(@Test,0),

    target.RunningTotal = coalesce(@RunningTotal,0);

    go

    select * from #TransTest join TransactionDetail on #TransTest.TransactionDetailId= TransactionDetail.TransactionDetailID

    order by Date,#TransTest.TransactionDetailID



    Clear Sky SQL
    My Blog[/url]

  • Test 1 fails , 2 and 3 pass but 4 fails because the clustered index update that seth was seeing in his attempt of using a view.

    Heh... I ran your code, Dave. Thanks for the comments in the code. Some of the verification code has an ambiguous column error in it because of the non aliased column in ORDER BY but the easy fix was to just not include the ORDER BY so no problem there.

    Admittedly, I've not looked at Seth's view attempt but I just can't imagine why anyone would add a clustered index to the Amount column. It would slow down both inserts and selects because most folks won't be sorting on it nor selecting on it alone. It does, however, demonstrate that it destroys the ORDER BY of the UPDATE CTE and that's important also because I just can't imagine having a million row heap.

    So far as test 2 and 3 go, I hadn't intended to do a "grouped" running total by date because I wanted to show what happens over the entire million rows... unlike test 2 and 3, it will sometimes fail. I'm writing the code for that test right now... it won't always fail but once is enough.

    Also, there are two schools of thought on whether or not to include a running total column in the original table. I agree that it would make it a pain to run concurrent running total code, but that normally wouldn't happen... normally it would be a single nightly job. The other thing that could happen is that the design of the "insert new data" process could be entirely RBAR in fashion where the running total for each row is calculated when each new row is inserted. That would be ok for a low number of transactions but would be a real killer when trying to add batches from ACH's (automatic clearing houses) as banks do every day.

    As a side bar, I've found that it's actually faster to copy the required data for a given account to a temp table, add the clustered index, and display the data than either a cursor or and ordered CTE. In fact, you can also add verification code on top of all that and it's still faster than either mostly owing to the fact that you do have the proper index to do so.

    Anyway... I'll be back with the code for tests 2 and 3 soon.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 61 through 70 (of 70 total)

You must be logged in to reply to this topic. Login to reply