Curly question I can't work out without cursors!

Question

Post reply

Curly question I can't work out without cursors!

ta.bu.shi.da.yu

Hall of Fame

Points: 3985
More actions
August 3, 2009 at 2:16 am

#385101

I'm hoping Jeff Modem can help me on this one 🙂
What I've got is a table that tells me when a clock is stopped or started on every entry. What I want to do is get the first start entry, then ignore the next start entries until I get a stop entry, then ignore the next stop entries until I get a start entry - ad nauseum.
I have no idea how you would do this without using a cursor 🙁
To make it easier, I have an easy temporary table that a solution could be derived from:
drop table #TempTable
create table #TempTable (ref int, StartStop varchar (5))
insert into #TempTable values (1, 'Start')
insert into #TempTable values (2, 'Start')
insert into #TempTable values (3, 'Stop')
insert into #TempTable values (4, 'Stop')
insert into #TempTable values (5, 'Start')
insert into #TempTable values (6, 'Stop')
insert into #TempTable values (7, 'Start')
insert into #TempTable values (8, 'Start')
So my query should return:
1, Start
3, Stop
5, Start
6, Stop
7, Start
How would I achieve this?
Random Technical Stuff[/url]

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

arun.sas SSChampion Points: 11831 More actions · Answer 1

Hi,

try this

select * from

(

select a.ref,

(case when (select b.StartStop from #TempTable b where b.ref = a.ref -1)= a.StartStop

then 'ERROR' else StartStop end)StartStop

from #TempTable a

) as X

where StartStop 'ERROR'

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 2

Thank you so much!

I think I see the logic...

It looks like my table has NULLs as well... where a NULL is the same as 'Start'. I'm terribly sorry if this is too much bother, but I've been racking my brains for the past 15 minutes to work out how to implement the same thing...

drop table #TempTable

create table #TempTable (ref int, StartStop varchar (5))

insert into #TempTable values (1, NULL)

insert into #TempTable values (2, NULL)

insert into #TempTable values (3, 'Start')

insert into #TempTable values (4, 'Start')

insert into #TempTable values (5, 'Stop')

insert into #TempTable values (6, 'Stop')

insert into #TempTable values (7, 'Start')

insert into #TempTable values (8, 'Stop')

insert into #TempTable values (9, 'Start')

insert into #TempTable values (10, 'Start')

Do you have any suggestions? Or pointers on how to solve this?

Random Technical Stuff[/url]

arun.sas SSChampion Points: 11831 More actions · Answer 3

select * from

(

select a.ref,

(case when (select isnull(b.StartStop,'Start') from #TempTable b where b.ref = a.ref -1)= isnull(a.StartStop,'Start')

then 'ERROR' else isnull(a.StartStop,'Start') end)StartStop

from #TempTable a

) as X

where StartStop 'ERROR'

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 4

arun.sas - you are an absolute legend - I don't think I can thank you enough!

Random Technical Stuff[/url]

arun.sas SSChampion Points: 11831 More actions · Answer 5

However we are waiting for jeff mark on this (any performance issue?)

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 6

Funnily enough - compared to the last thing I was looking at it's a dream of a query. 🙂

I think that the isnulls will probably cause a bit of RBAR, but then that's a problem with the database and not really a fault of SQL. I'm seriously considering asking them to run an update to convert all the NULLs to Start.

Obviously the table that I'm looking at it slightly different - but not by much 🙂 and there's a clustered index on the ref value, so it works pretty well I think...

But I would love to see what Jeff does with this.

I'm curious where you learned your magic though... I've been reading everything I can get my hands on to try to replace cursors. I'm actually trying to do a running update, but I have to subtract the date of the time the clock first started from the time the clock stopped...

In other words, the table actually is like this:

drop table #TempTable

create table #TempTable (ref int, StartStop varchar (5), theDate datetime)

insert into #TempTable values (1, 'Start', '2009-01-01')

insert into #TempTable values (3, 'Stop', '2009-01-05')

insert into #TempTable values (4, 'Stop', '2009-01-05')

insert into #TempTable values (5, 'Start', '2009-02-04')

insert into #TempTable values (6, 'Stop', '2009-03-04')

insert into #TempTable values (7, 'Start', '2009-03-12')

insert into #TempTable values (8, 'Stop', '2009-03-14')

I'm still working on that 🙂

I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.

Random Technical Stuff[/url]

drew.allen SSC Guru Points: 76944 More actions · Answer 7

ta.bu.shi.da.yu (8/3/2009)
I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.

I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.

WITH StartStop AS (

SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate

FROM #TempTable

)

SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days

FROM StartStop AS a

LEFT OUTER JOIN StartStop AS b

ON a.Ref <= b.Ref

AND a.StartStop b.StartStop

WHERE a.StartStop = 'Start'

GROUP BY a.Ref, a.theDate

I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 8

drew.allen (8/3/2009)
ta.bu.shi.da.yu (8/3/2009)
I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.
I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.
Drew

Oh man... all you guys are awesome 🙂

Random Technical Stuff[/url]

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 9

WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate

This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.

Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).

;WITH cteStage1 AS (

SELECT

ref,

CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,

ROW_NUMBER() OVER (ORDER BY ref)

- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

FROM #TempTable

),

cteStage2 AS (

SELECT S.ref, S.status, T.theDate,

ROW_NUMBER() OVER (ORDER BY S.ref) AS rn

FROM #TempTable T

INNER JOIN (

SELECT MIN(ref) AS ref, status

FROM cteStage1

GROUP BY status, cn

) S ON (S.ref = T.ref)

)

SELECT

T0.ref AS StartRef,

T0.theDate AS StartDate,

T1.ref AS StopRef,

T1.theDate AS StopDate,

DATEDIFF(day, T0.theDate, T1.theDate) AS Days

FROM cteStage2 T0

INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)

WHERE (T0.status = 0 AND T1.status = 1)

ORDER BY T0.ref

ta.bu.shi.da.yu Hall of Fame Points: 3985 More actions · Answer 10

andrewd.smith (8/3/2009)
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.
Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).
;WITH cteStage1 AS (
SELECT
ref,
CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn
FROM #TempTable
),
cteStage2 AS (
SELECT S.ref, S.status, T.theDate,
ROW_NUMBER() OVER (ORDER BY S.ref) AS rn
FROM #TempTable T
INNER JOIN (
SELECT MIN(ref) AS ref, status
FROM cteStage1
GROUP BY status, cn
) S ON (S.ref = T.ref)
)
SELECT
T0.ref AS StartRef,
T0.theDate AS StartDate,
T1.ref AS StopRef,
T1.theDate AS StopDate,
DATEDIFF(day, T0.theDate, T1.theDate) AS Days
FROM cteStage2 T0
INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)
WHERE (T0.status = 0 AND T1.status = 1)
ORDER BY T0.ref

Oh boy... my head feels like it's about to explode...

I'm confused by the first bit of the CTE... what does the following do?

ROW_NUMBER() OVER (ORDER BY ref)

- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

Random Technical Stuff[/url]

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 11

I'm confused by the first bit of the CTE... what does the following do?
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn

The combination of the numeric difference between the 2 ROW_NUMBER functions [cn] and the expression over which the 2nd ROW_NUMBER function is partitioned [status] uniquely identifies each consecutive sequence of rows with the same value of [status] when ordered by the ref column.

Once we have calculated the values of [cn] and [status] in the CTE, we can group by these columns and easly find the first (i.e. minimum ref column value) row in each consecutive sequence.

This ROW_NUMBER difference method used to solve the problem of combining sequential rows originated from the following article, or at least that's where I first saw it used.

http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 12

Sorry... I missed something... had to take the code down from this post... I'll be back.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 13

And this was an accidental double post from above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

Nah... I didn't miss anything... Andrew not only did it the way I would have, he even cited the same URL I was going to cite where I first saw the trick of using the difference between two row numbers.

Arun's code worked fine but only if the Ref #'s were sequential.

I have done performance testing in the past on the method Andrew used... even the quirky update won't beat it. It comes close but it was edged out.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)