Curly question I can't work out without cursors!

  • I'm hoping Jeff Modem can help me on this one 🙂

    What I've got is a table that tells me when a clock is stopped or started on every entry. What I want to do is get the first start entry, then ignore the next start entries until I get a stop entry, then ignore the next stop entries until I get a start entry - ad nauseum.

    I have no idea how you would do this without using a cursor 🙁

    To make it easier, I have an easy temporary table that a solution could be derived from:

    drop table #TempTable

    create table #TempTable (ref int, StartStop varchar (5))

    insert into #TempTable values (1, 'Start')

    insert into #TempTable values (2, 'Start')

    insert into #TempTable values (3, 'Stop')

    insert into #TempTable values (4, 'Stop')

    insert into #TempTable values (5, 'Start')

    insert into #TempTable values (6, 'Stop')

    insert into #TempTable values (7, 'Start')

    insert into #TempTable values (8, 'Start')

    So my query should return:

    1, Start

    3, Stop

    5, Start

    6, Stop

    7, Start

    How would I achieve this?

    Random Technical Stuff[/url]

  • Hi,

    try this

    select * from

    (

    select a.ref,

    (case when (select b.StartStop from #TempTable b where b.ref = a.ref -1)= a.StartStop

    then 'ERROR' else StartStop end)StartStop

    from #TempTable a

    ) as X

    where StartStop 'ERROR'

  • Thank you so much!

    I think I see the logic...

    It looks like my table has NULLs as well... where a NULL is the same as 'Start'. I'm terribly sorry if this is too much bother, but I've been racking my brains for the past 15 minutes to work out how to implement the same thing...

    drop table #TempTable

    create table #TempTable (ref int, StartStop varchar (5))

    insert into #TempTable values (1, NULL)

    insert into #TempTable values (2, NULL)

    insert into #TempTable values (3, 'Start')

    insert into #TempTable values (4, 'Start')

    insert into #TempTable values (5, 'Stop')

    insert into #TempTable values (6, 'Stop')

    insert into #TempTable values (7, 'Start')

    insert into #TempTable values (8, 'Stop')

    insert into #TempTable values (9, 'Start')

    insert into #TempTable values (10, 'Start')

    Do you have any suggestions? Or pointers on how to solve this?

    Random Technical Stuff[/url]

  • select * from

    (

    select a.ref,

    (case when (select isnull(b.StartStop,'Start') from #TempTable b where b.ref = a.ref -1)= isnull(a.StartStop,'Start')

    then 'ERROR' else isnull(a.StartStop,'Start') end)StartStop

    from #TempTable a

    ) as X

    where StartStop 'ERROR'

  • arun.sas - you are an absolute legend - I don't think I can thank you enough!

    Random Technical Stuff[/url]

  • However we are waiting for jeff mark on this (any performance issue?)

  • Funnily enough - compared to the last thing I was looking at it's a dream of a query. 🙂

    I think that the isnulls will probably cause a bit of RBAR, but then that's a problem with the database and not really a fault of SQL. I'm seriously considering asking them to run an update to convert all the NULLs to Start.

    Obviously the table that I'm looking at it slightly different - but not by much 🙂 and there's a clustered index on the ref value, so it works pretty well I think...

    But I would love to see what Jeff does with this.

    I'm curious where you learned your magic though... I've been reading everything I can get my hands on to try to replace cursors. I'm actually trying to do a running update, but I have to subtract the date of the time the clock first started from the time the clock stopped...

    In other words, the table actually is like this:

    drop table #TempTable

    create table #TempTable (ref int, StartStop varchar (5), theDate datetime)

    insert into #TempTable values (1, 'Start', '2009-01-01')

    insert into #TempTable values (3, 'Stop', '2009-01-05')

    insert into #TempTable values (4, 'Stop', '2009-01-05')

    insert into #TempTable values (5, 'Start', '2009-02-04')

    insert into #TempTable values (6, 'Stop', '2009-03-04')

    insert into #TempTable values (7, 'Start', '2009-03-12')

    insert into #TempTable values (8, 'Stop', '2009-03-14')

    I'm still working on that 🙂

    I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.

    Random Technical Stuff[/url]

  • ta.bu.shi.da.yu (8/3/2009)


    I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.

    I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.

    WITH StartStop AS (

    SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate

    FROM #TempTable

    )

    SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days

    FROM StartStop AS a

    LEFT OUTER JOIN StartStop AS b

    ON a.Ref <= b.Ref

    AND a.StartStop b.StartStop

    WHERE a.StartStop = 'Start'

    GROUP BY a.Ref, a.theDate

    I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/3/2009)


    ta.bu.shi.da.yu (8/3/2009)


    I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.

    I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.

    WITH StartStop AS (

    SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate

    FROM #TempTable

    )

    SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days

    FROM StartStop AS a

    LEFT OUTER JOIN StartStop AS b

    ON a.Ref <= b.Ref

    AND a.StartStop b.StartStop

    WHERE a.StartStop = 'Start'

    GROUP BY a.Ref, a.theDate

    I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.

    Drew

    Oh man... all you guys are awesome 🙂

    Random Technical Stuff[/url]

  • WITH StartStop AS (

    SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate

    FROM #TempTable

    )

    SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days

    FROM StartStop AS a

    LEFT OUTER JOIN StartStop AS b

    ON a.Ref <= b.Ref

    AND a.StartStop b.StartStop

    WHERE a.StartStop = 'Start'

    GROUP BY a.Ref, a.theDate

    This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.

    Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).

    ;WITH cteStage1 AS (

    SELECT

    ref,

    CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,

    ROW_NUMBER() OVER (ORDER BY ref)

    - ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

    FROM #TempTable

    ),

    cteStage2 AS (

    SELECT S.ref, S.status, T.theDate,

    ROW_NUMBER() OVER (ORDER BY S.ref) AS rn

    FROM #TempTable T

    INNER JOIN (

    SELECT MIN(ref) AS ref, status

    FROM cteStage1

    GROUP BY status, cn

    ) S ON (S.ref = T.ref)

    )

    SELECT

    T0.ref AS StartRef,

    T0.theDate AS StartDate,

    T1.ref AS StopRef,

    T1.theDate AS StopDate,

    DATEDIFF(day, T0.theDate, T1.theDate) AS Days

    FROM cteStage2 T0

    INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)

    WHERE (T0.status = 0 AND T1.status = 1)

    ORDER BY T0.ref

  • andrewd.smith (8/3/2009)


    WITH StartStop AS (

    SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate

    FROM #TempTable

    )

    SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days

    FROM StartStop AS a

    LEFT OUTER JOIN StartStop AS b

    ON a.Ref <= b.Ref

    AND a.StartStop b.StartStop

    WHERE a.StartStop = 'Start'

    GROUP BY a.Ref, a.theDate

    This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.

    Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).

    ;WITH cteStage1 AS (

    SELECT

    ref,

    CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,

    ROW_NUMBER() OVER (ORDER BY ref)

    - ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

    FROM #TempTable

    ),

    cteStage2 AS (

    SELECT S.ref, S.status, T.theDate,

    ROW_NUMBER() OVER (ORDER BY S.ref) AS rn

    FROM #TempTable T

    INNER JOIN (

    SELECT MIN(ref) AS ref, status

    FROM cteStage1

    GROUP BY status, cn

    ) S ON (S.ref = T.ref)

    )

    SELECT

    T0.ref AS StartRef,

    T0.theDate AS StartDate,

    T1.ref AS StopRef,

    T1.theDate AS StopDate,

    DATEDIFF(day, T0.theDate, T1.theDate) AS Days

    FROM cteStage2 T0

    INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)

    WHERE (T0.status = 0 AND T1.status = 1)

    ORDER BY T0.ref

    Oh boy... my head feels like it's about to explode...

    I'm confused by the first bit of the CTE... what does the following do?

    ROW_NUMBER() OVER (ORDER BY ref)

    - ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

    Random Technical Stuff[/url]

  • I'm confused by the first bit of the CTE... what does the following do?

    ROW_NUMBER() OVER (ORDER BY ref)

    - ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

    The combination of the numeric difference between the 2 ROW_NUMBER functions [cn] and the expression over which the 2nd ROW_NUMBER function is partitioned [status] uniquely identifies each consecutive sequence of rows with the same value of [status] when ordered by the ref column.

    Once we have calculated the values of [cn] and [status] in the CTE, we can group by these columns and easly find the first (i.e. minimum ref column value) row in each consecutive sequence.

    This ROW_NUMBER difference method used to solve the problem of combining sequential rows originated from the following article, or at least that's where I first saw it used.

    http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

  • Sorry... I missed something... had to take the code down from this post... I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And this was an accidental double post from above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nah... I didn't miss anything... Andrew not only did it the way I would have, he even cited the same URL I was going to cite where I first saw the trick of using the difference between two row numbers.

    Arun's code worked fine but only if the Ref #'s were sequential.

    I have done performance testing in the past on the method Andrew used... even the quirky update won't beat it. It comes close but it was edged out.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply