Running total query

  • I got part of the following idea off a post I saw on Joe Selko's blog... sorry, can't remember the link.

    I thought that I'd use a CTE to make it somewhat more robust in terms of before and after row handling... can anyone see any issues with it?

    Note that the table it runs on is:

    create table DataTable (ref int constraint PK_DataTable primary key, MajorGroup int, GroupOrder int, DataValue int)

    go

    create unique index IX_GroupOrder on DataTable (MajorGroup, GroupOrder)

    insert into DataTable (ref, MajorGroup, GroupOrder, DataValue)

    with NumGenerator (nRef, nGroup, nOrder, nValue) as

    (

    select 1, 1, 1, INTEGER(RAND()*100)

    union all

    select nRef+1, (nRef+1)%100, (nRef+1)%1000, INTEGER(RAND()*100)

    from DataTable

    where nRef < 10000

    ) select nRef, nGroup, nOrder, nValue from NumGenerator

    The query in question is:

    ;with theData (rowNum, GroupA, GroupRowNum, theValue) as

    (

    select row_number() over (order by MajorGroup, GroupOrder),

    DataValue,

    row_number() over (partition by MajorGroup

    order by MajorGroup, GroupOrder),

    GroupOrder

    from DataTable

    )

    select X.GroupA, X.GroupRowNum, X.theValue +

    coalesce(

    (select sum(theValue)

    from theData Y

    where Y.GroupA=X.GroupA and Y.GroupRowNum > X.GroupRowNum), 0

    )

    from theData X

    Can anyone see any issues with the query CTE that would cause performance issues?

    Random Technical Stuff[/url]

  • Oh, and before anyone asks why I didn't run an execution plan to see what it looks like... I did!

    I got an odd error from SQL Server (SSMS?), which I've logged a connect bug about here.

    Random Technical Stuff[/url]

  • The problem with this type of approach is that the subquery in the SELECT will be processed last and therefore will be executed once for each row in your CTE, so despite giving the appearance of being set-based, it will likely be similar in performance to a row-by-row solution.

  • Fair point... also I'm thinking it won't scale because each time it does the sum it has to sum every row before it, which of course increases every time.

    I was hoping there was a way that I could use row_number() to get the value from the previous row and add to the total, all without using an UPDATE statement. Any ideas?

    Random Technical Stuff[/url]

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply