Interesting problem for the "Running Total/Quirky Update" method of removing cursors

  • Robb Melancon

    Ten Centuries

    Points: 1129

    This is an extension to another topic I had posted but I decided to split it off as the problem is different than the original problem of replacing the cursor/while loop with a faster method.

    So I am stuck on a problem I am having using the "quirky update" method for replacing a cursor/while loop. The root of the issue is that the update compares row n to row n+1 and updates the contents of row n+1. What I need is a comparison of row n to row n+1 but then update the contents of row n.

    Here is the code that displays the problem.

    create table #DMVRunningPerformance (

    Symbol_vc varchar(255),

    DayID_in int,

    MV float,

    DailyPerf float null,

    RunningPerf float null)

    CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance

    (

    [Symbol_vc] ASC,

    [DayID_in] DESC

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    insert into #DMVRunningPerformance

    values ('IBM',75006,201048.8987,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75005,200841.5658,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75004,200321.7043,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75003,201120.0467,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75002,201779.8805,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75001,201651.3917,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75000,201651.3917,NULL,NULL)

    declare @NewGroup_bt bit=1,

    @DailyPerformance float,

    @RunningPerformance float,

    @NDMarketValue float

    update #DMVRunningPerformance

    set

    @DailyPerformance=CASE WHEN @NewGroup_bt=1 THEN 0 ELSE

    (@NDMarketValue-MV)/MV END,

    @RunningPerformance=CASE WHEN @NewGroup_bt=1 THEN 0 ELSE

    ((1 + @RunningPerformance) * (1 + @DailyPerformance)) - 1 END,

    DailyPerf=@DailyPerformance,

    RunningPerf=@RunningPerformance,

    @NDMarketValue=MV,

    @NewGroup_bt=0

    FROM #DMVRunningPerformance OPTION (MAXDOP 1)

    select * from #DMVRunningPerformance

    drop table #DMVRunningPerformance

    Note that I can't change the order as it would completely change the performance numbers. I am trying to avoid having to roll through the set a second time to shift the numbers up. So to clarify, the MV of row n is compared to MV of row n+1 to get daily performance. This performance number (and the running performance) need to be stored in row n, not row n+ 1.

    Thanks in advance,

    Robb

  • Gopi Muluka

    SSCarpal Tunnel

    Points: 4594

    Try this

    create table #DMVRunningPerformance (

    Symbol_vc varchar(25),

    DayID_in int,

    MV float,

    DailyPerf float null,

    RunningPerf float null)

    CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance

    (

    [Symbol_vc] ASC,

    [DayID_in] DESC

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    insert into #DMVRunningPerformance

    values ('IBM',75006,201048.8987,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75005,200841.5658,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75004,200321.7043,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75003,201120.0467,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75002,201779.8805,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75001,201651.3917,NULL,NULL)

    insert into #DMVRunningPerformance

    values ('IBM',75000,201651.3917,NULL,NULL)

    declare @NewGroup_bt INT,

    @DailyPerformance float,

    @RunningPerformance float,

    @NDMarketValue float

    set @NewGroup_bt =0;

    set @RunningPerformance=0;

    update B

    set

    @NDMarketValue=B.MV,

    @DailyPerformance = (@NDMarketValue-C.MV)/C.MV ,

    B.DailyPerf=@DailyPerformance,

    @RunningPerformance= ((1 + @DailyPerformance)*(1 + @RunningPerformance) ) - 1 ,

    @NewGroup_bt=@NewGroup_bt+1,

    B.RunningPerf=@RunningPerformance

    FROM #DMVRunningPerformance b

    INNER JOIN #DMVRunningPerformance C

    ON b.DAYID_IN=C.DAYID_IN+1

    select * from #DMVRunningPerformance

    drop table #DMVRunningPerformance

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Robb, can you please confirm that you are running on SQL Server 7 or 2k?

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Robb Melancon

    Ten Centuries

    Points: 1129

    Running SQL2008

    Also Ten Centuries, your solution violates the rules of using the "Quirky Update" by using a Join. I'll add a link to the article on "Quirky Update".

    http://www.sqlservercentral.com/articles/T-SQL/68467/[\url]

    I don't know that this method is ultimately what I will use as I am skeptical about using it in a production environment, but it is the quickest performing alternative to cursors that I have found.

  • Garadin

    One Orange Chip

    Points: 29613

    I'm still a little unclear on your requirements. Can you provide the results you are trying to get?

    This is a guess and might be the whole thing you're trying to avoid, but what about changing the sort order to ASC on day_id and then using this:

    update #DMVRunningPerformance

    set

    @DailyPerformance=DailyPerf=CASE WHEN @NewGroup_bt=1 THEN 0 ELSE

    (MV - @NDMarketValue)/ @NDMarketValue END,

    @RunningPerformance=RunningPerf=CASE WHEN @NewGroup_bt=1 THEN 0 ELSE

    ((1 + @RunningPerformance) * (1 + @DailyPerformance)) - 1 END,

    @NDMarketValue=MV,

    @NewGroup_bt=0

    FROM #DMVRunningPerformance OPTION (MAXDOP 1)

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Robb Melancon (5/21/2010)


    Running SQL2008

    Also Ten Centuries, your solution violates the rules of using the "Quirky Update" by using a Join. I'll add a link to the article on "Quirky Update".

    http://www.sqlservercentral.com/articles/T-SQL/68467/[\url]

    I don't know that this method is ultimately what I will use as I am skeptical about using it in a production environment, but it is the quickest performing alternative to cursors that I have found.

    Hi Robb

    A recursive CTE will do the job. Like Seth, I'm unsure of which direction (relative to DayID_in) you wish to run the calculation for RunningPerf. Here's an example which should get you started:

    ;WITH Calculator AS (

    -- Anchor row, tr.Seq = 1

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((nextrow.MV-thisrow.MV)/thisrow.MV AS FLOAT),

    RunningPerf = CAST(NULL AS FLOAT)

    FROM #OrderedData thisrow

    INNER JOIN #OrderedData nextrow ON nextrow.Seq = thisrow.Seq + 1

    WHERE thisrow.Seq = 1

    UNION ALL

    -- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((nextrow.MV-thisrow.MV)/thisrow.MV AS FLOAT),

    RunningPerf = lastrow.RunningPerf

    FROM Calculator lastrow

    INNER JOIN #OrderedData thisrow ON thisrow.Seq = lastrow.Seq + 1

    INNER JOIN #OrderedData nextrow ON nextrow.Seq = thisrow.Seq + 1

    )

    SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf

    FROM Calculator

    Note that if you have a unique clustered index on the seq column, as in the example, the SELECT will run in a similar timeframe to a running totals update.

    If you want to use the results of the SELECT to update another table, then you would be wise to run the results into another table rather than attempting an UPDATE FROM ... with the rCTE as a table source - performance can be poor.

    The SELECT returns 6 of the 7 rows because of the INNER JOIN 'nextrow'. Logically you can't get a value for the last row in any case because it has no next row! However, you can make the row appear in your output by fiddling the join criteria.

    Come back if you need help with this. The important point to remember about recursive CTE's is that only the result of the last iteration is 'exposed' to the next.

    Cheers

    ChrisM

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Robb Melancon

    Ten Centuries

    Points: 1129

    Thanks for the input guys. The rows need to be calculated in dayid desc order. Here are the expected results.

    AssetID_in Symbol DayID_in MV DailyPerf RunningPerf

    100 IBM 75006 NULL 201048.8987 0.001032321 0.001032321

    100 IBM 75005 NULL 200841.5658 0.002595133 0.003630133

    100 IBM 75004 NULL 200321.7043 -0.003969482 -0.000353759

    100 IBM 75003 NULL 201120.0467 -0.003270067 -0.003622669

    100 IBM 75002 NULL 201779.8805 0.000637183 -0.002987795

    100 IBM 75001 NULL 201651.3917 0 -0.002987795

    100 IBM 75000 NULL 201651.3917 0 -0.002987795

    It's a bit odd because you need to start on the second row (Todays MV - Yesterdays MV divided by Yesterdays MV) to get the first rows performance but the running is done first row to second row. So 75006 Daily is (201047.8987-200841.5658)/200841.5658 and running is 1 + Current Running (which is 0 on day 75006) * 1 + Daily minus 1. I realize the Running Performance seems backwards meaning that the value for the entire period is actually stored on the first day of the period, but this is done for a reason which is more than you probably want to know about calculating market performance for periods.... anyway hope this calrifies a little more.

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Robb, looks straightforward - but can you please edit and if necessary correct correct the line beginning -

    So 75006 Daily is...

    Cheers

    ChrisM

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Robb Melancon

    Ten Centuries

    Points: 1129

    Ok, so I got it. My question now is how does this perform relative to doing the same thing with fast forward read only cursors? I can answer that by implementing this in some more tests but just wondering if you guys have a guess...

    Here is the completed code:

    create table #DMVRunningPerformance (

    Seq int,

    LongPosition_bt bit null,

    AssetID_in int null,

    Symbol_vc varchar(255),

    DayID_in int,

    Status_vc varchar(255) null,

    MV float,

    DailyPerf float null,

    RunningPerf float null)

    CREATE CLUSTERED INDEX [indexdown] ON #DMVRunningPerformance

    (

    [Seq] ASC,

    [LongPosition_bt] ASC,

    [AssetID_in] ASC,

    [DayID_in] DESC

    )

    WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    insert into #DMVRunningPerformance

    values (1,1,100,'IBM',75006,NULL,201048.8987,NULL,NULL)

    insert into #DMVRunningPerformance

    values (2,1,100,'IBM',75005,NULL,200841.5658,NULL,NULL)

    insert into #DMVRunningPerformance

    values (3,1,100,'IBM',75004,NULL,200321.7043,NULL,NULL)

    insert into #DMVRunningPerformance

    values (4,1,100,'IBM',75003,NULL,201120.0467,NULL,NULL)

    insert into #DMVRunningPerformance

    values (5,1,100,'IBM',75002,NULL,201779.8805,NULL,NULL)

    insert into #DMVRunningPerformance

    values (6,1,100,'IBM',75001,NULL,201651.3917,NULL,NULL)

    insert into #DMVRunningPerformance

    values (7,1,100,'IBM',75000,NULL,201651.3917,NULL,NULL)

    declare @AssetID_in int=-1,

    @NDAssetID_in int=-1,

    @LongPosition_bt bit=0,

    @Status_vc varchar(255)=NULL,

    @NewGroup_bt bit=1,

    @DailyPerformance float,

    @RunningPerformance float,

    @NDMarketValue float

    ;WITH Calculator AS (

    -- Anchor row, tr.Seq = 1

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf=CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT)

    FROM #DMVRunningPerformance thisrow

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    WHERE thisrow.Seq = 1

    UNION ALL

    -- first recursion will be Seq = 2 (thisrow.Seq = 2, lastrow.Seq = 1)

    SELECT thisrow.Seq, thisrow.Symbol_vc, thisrow.DayID_in, thisrow.MV,

    DailyPerf = CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT),

    RunningPerf = ((1 + lastrow.RunningPerf) * (1 + CAST((thisrow.MV-nextrow.MV)/nextrow.MV AS FLOAT))) - 1

    FROM Calculator lastrow

    INNER JOIN #DMVRunningPerformance thisrow ON thisrow.Seq = lastrow.Seq + 1

    INNER JOIN #DMVRunningPerformance nextrow ON nextrow.Seq = thisrow.Seq + 1

    )

    SELECT Seq, Symbol_vc, DayID_in, MV, DailyPerf, RunningPerf

    FROM Calculator

    drop table #dmvrunningperformance

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Hi Robb

    There were some performance comparisons posted here earlier in the year. IIRC the running totals update did a million rows in about 6 seconds, the rCTE method took a little less than twice that.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Robb Melancon

    Ten Centuries

    Points: 1129

    So I'm running into the following:

    The statement terminated. The maximum recursion 100 has been exhausted before statement completion

    And looks like the max is 32,000, did you run this on the million row table?

  • Garadin

    One Orange Chip

    Points: 29613

    Although the max recursion is only a bit over 32K, any change in criteria will restart that recursion. You could easily run it on millions of rows... or it could crap out at 33k, it all depends on the data distribution.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Robb Melancon (5/24/2010)


    So I'm running into the following:

    The statement terminated. The maximum recursion 100 has been exhausted before statement completion

    And looks like the max is 32,000, did you run this on the million row table?

    OPTION (MAXRECURSION 0)

    I'm still looking for the thread, Robb...and yes, a million rows in about 9 seconds using a recursive CTE. BTW credit to Paul White for discovering the performance improvement returned by ensuring the clustered index is unique.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Hi Robb, here's some stuff for you to play with. Note that the join to the recursive part in the rCTE is critical.

    DROP TABLE #Numbers

    SELECT TOP 1000000 --000

    n = ROW_NUMBER() OVER (ORDER BY a.name),

    CalcValue = CAST(NULL AS BIGINT)

    INTO #Numbers

    FROM master.dbo.syscolumns a, master.dbo.syscolumns b

    CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC)

    -- (1,000,000 row(s) affected)

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- 'Quirky' update

    DECLARE @Lastval INT = 0, @CalcValue BIGINT

    UPDATE #Numbers SET

    @CalcValue = CalcValue = (@Lastval + n),

    @Lastval = n

    -- (1,000,000 row(s) affected) / CPU time = 2968 ms, elapsed time = 3079 ms.

    -- Table #Numbers... Scan count 1, logical reads 3113, physical reads 6, read-ahead reads 3146, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- Recursive CTE

    ;WITH Calculator (n, CalcValue) AS (

    SELECT n.n, CalcValue = CAST(n.n AS BIGINT)

    FROM #Numbers n

    WHERE n.n = 1

    UNION ALL

    SELECT n.n, CalcValue = n.n + c.n

    FROM #Numbers n

    INNER JOIN Calculator c ON n.n = c.n + 1 -- nice

    --INNER JOIN Calculator c ON c.n + 1 = n.n -- nice

    --INNER JOIN Calculator c ON c.n = n.n - 1 -- slow

    --INNER JOIN Calculator c ON n.n - c.n = 1 -- slow

    --INNER JOIN Calculator c ON c.n - n.n = -1 -- slow

    )

    SELECT n, CalcValue

    FROM Calculator

    OPTION (MAXRECURSION 0)

    -- (1,000,000 row(s) affected) / CPU time = 33297 ms, elapsed time = 36161 ms.

    -- Table 'Worktable'. Scan count 2, logical reads 6000001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET STATISTICS IO Off

    SET STATISTICS TIME Off

    There's some more here.

    Cheers

    ChrisM

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

Viewing 14 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply