Linking to next row using a join on row_number()+1=row_number()

  • Having previously read Jeff Moden's article "Hidden RBAR: Triangular Joins" (http://www.sqlservercentral.com/articles/T-SQL/61539/), and then the comments following the article at http://www.sqlteam.com/article/joining-to-the-next-sequential-row, I decided to try the method of joining on row_number()+1 = row_number() on a real problem of mine which currently relies on a triangular join.

    The aim is to take a series of statuses and dates for documents and produce pairs of status+date with next_status+date, so that further processing can look at turnaround times, etc.

    However, my implementation of the method described turns out to be slower than a simple triangular join (select ..., min(b.x) from t a join t b on a.x < b.x group by ...).

    So my question is: is there a faster way to do this than the triangular join?

    As a second question, can anyone suggest why the performance of the row_number test solution goes down the tubes (on my system) at between 712 and 714 records? (713 sometimes takes ~300ms CPU, other times ~76000ms!)

    P.S. The server has 4 CPUs, hence for 1000 records the elapsed time is only about 25% of the CPU time.
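    One way to see what changes at that boundary (just a sketch, not part of the tests below) is to capture the actual execution plan on either side of the cliff and compare the two:

    -- Sketch: compare actual plans either side of the cliff.
    -- SET STATISTICS XML ON returns the actual plan as XML alongside the
    -- results; run the version 2 query below once with 712 records
    -- generated and once with 714, and diff the two plans.
    set statistics xml on
    -- ... run the version 2 query here ...
    set statistics xml off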

    /*
    100 records generated

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 16 ms, elapsed time = 29 ms.

    Table '#t'. Scan count 30, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 46 ms, elapsed time = 35 ms.

    -------------------------------------------
    500 records generated

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 39, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 78 ms, elapsed time = 203 ms.

    Table '#t'. Scan count 30, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 205 ms, elapsed time = 157 ms.

    ---------------------------------------------
    1000 records generated

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 297 ms, elapsed time = 716 ms.

    Table '#t'. Scan count 28, logical reads 312, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 8, logical reads 775568, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 152624 ms, elapsed time = 40658 ms.

    --------------------------------------------------
    */

    set statistics io off
    set statistics time off
    set nocount on

    -----------------------------------
    -- create some data
    -- ----------------
    -- This actually models part of a document workflow.
    -- The flow has been simplified to just 3 states.
    -- Documents start in state 'F' and may move to
    -- either 'C' (complete) or 'P' (queried) or may be
    -- amended and stay in state 'F'. Once in state 'P',
    -- the query may be answered, so the document goes
    -- to state 'F', or it may be amended and stay in
    -- state 'P'. The probabilities of moving between
    -- states are arbitrarily set for the model. In
    -- order to simulate repeated queries making it
    -- more likely for the document to be completed,
    -- the probability is increased on each step.
    -- Time intervals are randomly assigned between
    -- 5 minutes and 1 month. In the model, the generated
    -- IDs are sequential. In the real application, they
    -- may not be.
    --
    drop table #t

    create table #t (id int, st char(1), date datetime)

    declare @n int
    declare @st char(1)
    declare @date datetime
    declare @cp float
    declare @p float

    set @n = 1000 -- <<<<<<<<<<<<<<<<< How many ID chains to generate
    print convert(varchar(5),@n)+' records generated'

    while @n > 0
    begin
        set @st = 'F'
        set @date = dateadd(mi,rand()*1440.0*28,convert(datetime,'2008/2/1'))
        insert #t (id,st,date) values(@n,@st,@date)
        set @cp = 0.3
        while @st <> 'C'
        begin
            if @st = 'F'
            begin
                set @p = rand()
                if @p < @cp
                    set @st = 'C'
                else if @p < 0.99
                    set @st = 'P'
            end
            else
            begin
                set @p = rand(1) -- note: rand(1) reseeds on every call and returns the same value each time; rand() was probably intended
                if @p < 0.9
                    set @st = 'F'
            end
            set @date = dateadd(mi,rand()*1440*3+5,@date)
            insert #t (id,st,date) values(@n,@st,@date)
            set @cp = @cp * 1.1
        end
        set @n = @n - 1
    end

    select * from #t

    set nocount off
    set statistics io on
    set statistics time on

    --------------------------------------------
    -- version 1 - triangular join
    ------------------------------
    -- This is the method currently used and
    -- produces the desired result
    --
    select
        id,
        sta,
        min(da) as 'da',
        stb,
        db
    from (
        select
            a.id,
            a.sta,
            a.da,
            b.st as 'stb',
            b.date as db
        from (
            select
                id,
                sta,
                da,
                min(db) as 'db'
            from (
                select
                    a.id,
                    a.st as 'sta',
                    a.date as 'da',
                    b.st as 'stb',
                    min(b.date) as 'db'
                from
                    #t a
                join
                    #t b
                    on a.id = b.id
                    and a.date < b.date
                    and a.st <> b.st
                group by
                    a.id, a.st, a.date, b.st
            ) x
            group by
                id, sta, da
        ) a
        join
            #t b
            on a.id = b.id
            and a.db = b.date
    ) x
    group by
        id, sta, stb, db
    order by
        id, min(da)

    --------------------------------------------
    -- version 2 - using self join on row_number + 1 = row_number
    ------------
    -- this was derived from comments following the article at
    -- http://www.sqlteam.com/article/joining-to-the-next-sequential-row
    --
    ;with t1 as (
        select
            row_number() over(order by id,date) as 'r_id',
            id,
            st,
            date
        from
            #t
    )
    ,t4 as (
        select
            row_number() over(order by id,date) as 'r_id',
            id,
            st,
            date
        from (
            select top 1
                id, st, date
            from
                t1
            order by
                id, date
            union all
            select
                id, st, date
            from (
                select
                    b.id,
                    b.st,
                    b.date
                from
                    t1 a
                join
                    t1 b
                    on a.r_id+1=b.r_id
                where
                    (a.id=b.id and a.st<>b.st)
                    or (a.id<>b.id)
            ) z
        ) x
    )
    select
        a.id,
        a.st as 'sta',
        a.date as 'da',
        b.st as 'stb',
        b.date as 'db'
    from
        t4 a
    join
        t4 b
        on a.r_id+1=b.r_id
        and a.id=b.id
    order by
        id, da

    set statistics time off
    set statistics io off

    Derek

  • I could probably help if you provide the table structure (including indexes), some sample data, and the query you're using.

    Without that, I would hazard a guess that the difference in performance at a specific number of rows is probably due to (a) the statistics being used, (b) the index(es) being used, and/or (c) the join being used.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (4/11/2008)


    I could probably help if you provide the table structure (including indexes), some sample data, and the query you're using.

    Did you read the script? I thought I put everything in there.

    The code section starts with the test results for 100, 500 and 1000 records.

    Then there's a large comment describing what the model is simulating.

    The first executable code creates a temporary table.

    The next step (2 nested loops) fills it with sample data.

    Then there's a query (labelled 'Version 1') which uses a simplified version of the existing query.

    Then finally the last part ('Version 2') tries the new query.

    I've not bothered to mention the 'set statistics...' and 'set nocount...' statements; they occur at the beginning, after the sample creation, and at the end.

    What is missing?

    I haven't tried any indexes yet, as I was so surprised by the difference between the queries, so there aren't any at present. Obviously, they might improve the performance of either version.

    Derek

  • I didn't realize the whole thing was being done on a temp table. Sorry about that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Can you please post the parameters and settings of your tempdb? Could it be that you are simply hitting an "auto grow" of tempdb during the statement execution?
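    For reference, something like this (a sketch against sys.database_files) lists the current file sizes and growth settings:

    -- Sketch: current tempdb file sizes and autogrow settings.
    -- size and growth are stored in 8 KB pages unless growth is percent-based.
    SELECT name,
           physical_name,
           size * 8 / 1024 AS size_mb,
           CASE WHEN is_percent_growth = 1
                THEN CONVERT(varchar(10), growth) + '%'
                ELSE CONVERT(varchar(10), growth * 8 / 1024) + ' MB'
           END AS growth_setting
    FROM tempdb.sys.database_files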

  • I used your temp table and test data: 9000 rows, 1000 IDs in #t.

    Here's the query I wrote, based on the description you gave in the original post:

    ;with CTE (Row, ID, St, Date) as
    (select dense_rank() over (partition by id order by date),
            id, st, date
     from #t)
    select t1.ID, t1.St, t1.Date, t2.St, t2.Date,
           datediff(minute, t1.date, t2.date)
    from cte t1
    left outer join cte t2
        on t1.id = t2.id
        and t1.row = t2.row - 1
    order by t1.id, t1.date

    SQL Server parse and compile time:
       CPU time = 0 ms, elapsed time = 1 ms.

    (9000 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 78 ms, elapsed time = 292 ms.

    With 100,000 IDs, I had to modify the query, since my desktop computer can't handle displaying the results of a 900,000-row query without running out of memory:

    ;with CTE (Row, ID, St, Date) as
    (select dense_rank() over (partition by id order by date),
            id, st, date
     from #t)
    select count(*)
    from cte t1
    left outer join cte t2
        on t1.id = t2.id
        and t1.row = t2.row - 1

    And got:

    SQL Server parse and compile time:
       CPU time = 16 ms, elapsed time = 24 ms.

    (1 row(s) affected)
    Table '#t'. Scan count 6, logical reads 5144, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 7267 ms, elapsed time = 5087 ms.

    Please test this and see if it gets you the kind of output you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Glen Sidelnikov (4/11/2008)


    Can you please post the parameters and settings of your tempdb? Could it be that you are simply hitting an "auto grow" of tempdb during the statement execution?

    I'm at home so can't easily check this (it's Saturday), but I don't see how this could be the case.

    If tempdb were to 'auto grow', wouldn't it then stay extended? The effect I get is that the 712-714 step is reproducible if I do a series of tests:

    712 takes ~300ms
    713 takes ~300ms
    714 takes ~25000ms
    713 takes ~25000ms
    712 takes ~300ms
    713 takes ~300ms
    ...and so on.

    There seems to be some memory effect.

    My suspicion is that it changes choice of execution plan to one that's much worse, and then reuses it on the 'down to 713' test because it's cached. I intend to check this on Monday.
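    A quick way to test that theory (a sketch; both options force the plan to be rebuilt):

    -- Sketch: rule plan caching in or out
    dbcc freeproccache          -- flush all cached plans (server-wide; use with care)
    -- or, less invasively, append to the test query:
    --     option (recompile)   -- force a fresh plan for that statement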

    Derek

    GSquared (4/11/2008)


    I used your temp table and test data: 9000 rows, 1000 IDs in #t.

    Here's the query I wrote, based on the description you gave in the original post:

    ;with CTE (Row, ID, St, Date) as
    (select dense_rank() over (partition by id order by date),
            id, st, date
     from #t)
    select t1.ID, t1.St, t1.Date, t2.St, t2.Date,
           datediff(minute, t1.date, t2.date)
    from cte t1
    left outer join cte t2
        on t1.id = t2.id
        and t1.row = t2.row - 1
    order by t1.id, t1.date

    Sorry, my fault. I've been working with this problem so long that I missed one of the key requirements in the description, although it is in the code of both query versions.

    The only datetime differences that are wanted are those between one state and the next different state.

    So, given an id with the following states...

    insert #t values(37,'F','2008-02-05 09:37')
    insert #t values(37,'F','2008-02-05 09:54')
    insert #t values(37,'P','2008-02-05 10:01')
    insert #t values(37,'P','2008-02-05 10:17')
    insert #t values(37,'P','2008-02-05 10:22')
    insert #t values(37,'F','2008-02-05 10:32')
    insert #t values(37,'F','2008-02-05 10:36')
    insert #t values(37,'C','2008-02-05 10:42')

    ...the result should be

    ID  sta  da                stb  db
    --  ---  ----------------  ---  ----------------
    37  F    2008-02-05 09:37  P    2008-02-05 10:01
    37  P    2008-02-05 10:01  F    2008-02-05 10:32
    37  F    2008-02-05 10:32  C    2008-02-05 10:42

    Your query performs the next step of the process (taking the time differences), but it hasn't eliminated the 'update' entries which don't change state, and eliminating those is what complicates this part of the solution.

    In reality, the differences aren't simple datediffs (which is why I left them out), but actually need to take account of working days (allowing for public holidays in the users' country) and working hours, adjusted for the local timezones. All this logic already works, so it has been omitted.

    I didn't realize the whole thing was being done on a temp table. Sorry about that.

    The real table isn't a temp table, obviously, but it also contains a lot of extra fields that aren't relevant to getting this query to work, such as company code, user name (needed to get the country and timezone), vendor, amount, etc.

    In addition, there are 7 states not 3 (F1, F2, F3, P1, P2, P3, CP) but while substring(st,1,1) will add overhead, it's not relevant to the logic, so I simply stored a char(1) value.

    The full version of the query shown as version 1 is actually at the core of a larger query which does all the time differencing plus adding other required information.

    The real table also has about 15000 records per month and contains at least 15 months' data (currently about 40 months), so the full query takes several minutes to run.

    For the purpose of getting the logic of the query right, I stripped the data down to a subset which illustrates it. This is about as small as I could make it.

    Obviously, I've been here long enough that I don't expect people to write my queries for me; what I'm looking for is some pointers as to the fastest way this can be done.

    I'm quite happy to make the temp table into a permanent one, add indexes, or whatever, if it will help speed up the query. I should then be able to scale it up to the real table and check it works there. Unfortunately, I have a suspicion that the triangular/diagonal join may actually be fastest.
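    For instance, a clustered index supporting the (id, date) ordering that both versions rely on seems the obvious first candidate (untested on my side; the index name is made up):

    -- Sketch (untested): support the (id, date) ordering both queries rely on
    create clustered index ix_t_id_date on #t (id, date)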

    Derek

  • Hi GSquared,

    Thanks for the hint.

    I tried implementing your use of dense_rank instead of row_number and found an omission in my code: in my version 2, CTE t4 should have group by b.id, b.st, b.date just before the close of subquery z. Without it, multiple rows are produced which get eliminated later but add excessive overhead.
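    That is, subquery z becomes:

    -- corrected subquery z in CTE t4
    select
        b.id,
        b.st,
        b.date
    from
        t1 a
    join
        t1 b
        on a.r_id+1=b.r_id
    where
        (a.id=b.id and a.st<>b.st)
        or (a.id<>b.id)
    group by
        b.id,
        b.st,
        b.date
    ) z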

    With that added, and including a version using dense_rank, I get, for 100,000 records:

    (403358 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 4278, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 7687 ms, elapsed time = 71055 ms.

    (403358 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 6, logical reads 5714, physical reads 0, read-ahead reads 3, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 6266 ms, elapsed time = 64598 ms.

    (322812 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 6, logical reads 5714, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 6094 ms, elapsed time = 54511 ms.

    (This test is on my laptop rather than the server so it takes longer!)

    The conclusion is that dense_rank is slightly faster than row_number, and both are about 20% faster than the triangular/diagonal join.

    On Monday, I'll see if this improves the 'real' code. 🙂

    Derek

  • Try this one...

    ;WITH Yak (id, st, date, rowid, col)
    AS (
        SELECT  id,
                st,
                date,
                ROW_NUMBER() OVER (ORDER BY id, date),
                0
        FROM    #t
    )
    SELECT  id,
            MAX(CASE WHEN col = 0 THEN st ELSE '' END) AS sta,
            MAX(CASE WHEN col = 0 THEN date ELSE '' END) AS da,
            MAX(CASE WHEN col = 1 THEN st ELSE '' END) AS stb,
            MAX(CASE WHEN col = 1 THEN date ELSE '' END) AS db
    FROM    (
                select id, st, date, rowid, col from Yak
                union all
                select id, st, date, rowid - 1, col + 1 from Yak
            ) AS d
    GROUP BY id,
            rowid
    HAVING  count(*) = 2
    ORDER BY rowid

    Results for 1000 sample records

    derek 1 - triangular join
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 78, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 141 ms, elapsed time = 1518 ms.

    derek 2 - cte's
    Table '#t'. Scan count 16, logical reads 208, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 4, logical reads 702520, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 55094 ms, elapsed time = 121561 ms.

    peso
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 2, logical reads 52, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 47 ms, elapsed time = 565 ms.


    N 56°04'39.16"
    E 12°55'05.25"

  • Sometimes, "old school" works better...

    SET NOCOUNT ON
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    SELECT IDENTITY(INT,1,1) AS RowNum,
           ID, ST, Date
      INTO #MyHead
      FROM #t
     ORDER BY ID, Date

    SELECT t1.ID,
           t1.ST   AS STa,
           t1.Date AS DateA,
           t2.ST   AS STb,
           t2.Date AS DateB
      FROM #MyHead t1
      LEFT OUTER JOIN #MyHead t2
        ON t1.RowNum+1 = t2.RowNum
     WHERE t2.ST IS NOT NULL
     ORDER BY t1.ID

    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF
    SET NOCOUNT OFF

    DROP TABLE #MyHead

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Add a

    AND t1.id = t2.id

    and you produce the same result too!
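    That is, the join in the "old school" query becomes:

    -- Jeff's join with the ID match added
    ON t1.RowNum + 1 = t2.RowNum
    AND t1.ID = t2.ID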


    N 56°04'39.16"
    E 12°55'05.25"

  • Neither Jeff's nor Peso's query eliminates pairs where the status stays the same. Additionally, as Peso pointed out, Jeff's didn't remove the 'pairs' where the ids don't match. After fixing the ID match in Jeff's query, I get, for 10000 samples...

    derek 1 (diagonal join)
    (39972 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 3, logical reads 426, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 625 ms, elapsed time = 7687 ms.

    derek 2 (row_number)
    (39972 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 6, logical reads 576, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 609 ms, elapsed time = 6790 ms.

    derek 3 (dense_rank)
    (39972 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 6, logical reads 576, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 640 ms, elapsed time = 8898 ms.

    peso
    (41864 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 2, logical reads 286, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 391 ms, elapsed time = 5726 ms.

    Jeff (copy)
    Table '#t'. Scan count 1, logical reads 143, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 47 ms, elapsed time = 442 ms.
    (51864 row(s) affected)

    Jeff (query)
    (41864 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#MyHead'. Scan count 2, logical reads 336, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 172 ms, elapsed time = 4826 ms.

    The number of rows is higher for the last 2 for the reason given.

    At the moment it looks like a join on row_number() over(order by id,date) is fastest.

    Derek

  • Peso (2)

    create table #t (id int not null, st char(1) not null, date datetime not null, seq int null)
    --
    --.... main fill omitted
    --
    DECLARE @Seq INT,
            @Yak INT,
            @st  CHAR(1)

    SET @Seq = -1

    UPDATE  #t
    SET     @Seq = @Seq + 1,
            @Yak = Seq = CASE
                             WHEN st = @st THEN NULL
                             ELSE @Seq
                         END,
            @Seq = CASE
                       WHEN @Yak IS NULL THEN @Seq - 1
                       ELSE @Seq
                   END,
            @st = st

    SELECT  id,
            MAX(CASE WHEN Col = 0 THEN st ELSE '' END) AS sta,
            MAX(CASE WHEN Col = 0 THEN date ELSE '' END) AS da,
            MAX(CASE WHEN Col = 1 THEN st ELSE '' END) AS stb,
            MAX(CASE WHEN Col = 1 THEN date ELSE '' END) AS db
    FROM    (
                SELECT id, st, date, Seq, 0 AS Col
                FROM   #t
                WHERE  Seq IS NOT NULL
                UNION ALL
                SELECT id, st, date, Seq - 1, 1
                FROM   #t
                WHERE  Seq IS NOT NULL
            ) AS d
    GROUP BY id, Seq
    HAVING  COUNT(*) = 2
    ORDER BY id,
            MAX(CASE WHEN Col = 0 THEN date ELSE '' END)

    For 10000 samples...

    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 1 ms.
    Table '#t'. Scan count 1, logical reads 171, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 110 ms, elapsed time = 823 ms.

    (52533 row(s) affected)

    (40584 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#t'. Scan count 2, logical reads 340, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
       CPU time = 313 ms, elapsed time = 6886 ms.

    Including the update, this is 50% faster; excluding the update, it's 100% faster. The only thing I'm dubious about is whether the update will always process the records in the right order; UPDATE...ORDER BY isn't legal SQL, unfortunately. 🙂
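    The usual suggestion for this pattern (which I haven't verified actually guarantees the order) is a clustered index in the required order, plus hints to keep the update to a serial scan:

    -- Sketch (unverified): hints usually suggested for the running-total
    -- style UPDATE; the index name is made up
    create clustered index ix_t_id_date on #t (id, date)

    -- then run the UPDATE as above, written as:
    --     UPDATE #t WITH (TABLOCKX)
    --     SET ...
    --     OPTION (MAXDOP 1)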

    Derek

  • Derek Dongray (4/14/2008)


    Neither Jeff's nor Peso's query eliminates pairs where the status stays the same.

    Sorry... with all the other stuff, I missed that requirement altogether.

    Additionally, as Peso pointed out, Jeff's didn't remove the 'pairs' where the ids don't match. After fixing the ID match in Jeff's query, I get, for 10000 samples...

    Correct... I flat forgot to add that to the criteria.

    So, you all set now or ???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
