Date Range of occurences of consecutive String Values?

Question

Post reply

Date Range of occurences of consecutive String Values?

pat-1085820

SSC Rookie

Points: 29
More actions
August 5, 2009 at 8:21 am

#72405

Hi, I'm a little stumped with this problem. I am querying data from a HelpDesk Ticket system in SQL 2000. I need to find the date range each issue has spent in a particular phase of development. The part I'm having trouble with is that a ticket can go from 'open' to 'closed', back to 'open'...and I need to capture those 2 'Open' date ranges separately, or the results will be skewed.
I have a table with the following structure and sample data. for each issuenumber, whenever a comment is added to this ticketing system, another row is added with the issuenumber, status, activitydate, and activityseq number.
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#projectactivity','U') IS NOT NULL
DROP TABLE #projectactivity
--===== Create the test table with
CREATE TABLE #projectactivity
(
IssueNumber INT ,
ActivityDate DATETIME,
ActivitySeq INT,
Status varchar (25)
)
INSERT INTO #projectactivity (IssueNumber, ActivityDate, ActivitySeq, Status)
SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL
SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL
SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL
SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL
SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL
SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL
SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL
SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL
SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL
SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL
SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL
SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL
SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL
SELECT '6127','Jul 2 2009 8:31AM','45','Research'
Something like the following would give me incorrect results because it would group all the 'Open' Status Ranges together, when in fact I want to have 3 separate date ranges with an 'Open' Status for this particular issue.
--==== Using the Min and Max ActivityDate for each IssueNumber in the ProjectActivity table
--==== allows for selecting the date range an Issue spent in each status
SELECT
IssueNumber,
Status,
MIN(ActivityDate) as ActivityDate_Start,
MAX(ActivityDate) as ActivityDate_Finish
FROM #projectactivity
GROUP BY
IssueNumber,
Status
Can anyone help me tackle this with a set based solution? Thanks.

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323753 More actions · Answer 1

thank you for the DDL and sample data; that makes it SO easy to test and offer suggestions;

will row_number give you the data you are looking for?

try these two SQLs and give me some feedback on them:

--anything that had the same status twice on the same day

select * from

(select

IssueNumber,

Status,

DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,

row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW

FROM #projectactivity )MyAlias

where RW > 1

--everything ordered by day

select

IssueNumber,

Status,

DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,

row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW

FROM #projectactivity

ORDER BY

DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0)

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

spaghettidba SSC Guru Points: 105732 More actions · Answer 2

Lowell, I was thinking of row_number() too, but then I saw it was a SQL2000 forum.

Anyway I don't understand how different "open" rows are related to subsequent operations: can you explain a little more what happens in time? If I order you table by ActivityDate I get something that is not very clear to me. What does ActivitySeq mean?

-- Gianluca Sartori

Lowell SSC Guru Points: 323753 More actions · Answer 3

misread the requirement...how about this:

SELECT

IssueNumber,

Status,

min(ActivityDate) As ActivityDate,

min(EndDate) As EndDate

FROM(

select MyAlias.* ,myAlias2.ActivityDate as EndDate from

(select

IssueNumber,

Status,

DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate

FROM #projectactivity )MyAlias

LEFT OUTER JOIN

(select

IssueNumber,

Status,

DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate

FROM #projectactivity )MyAlias2

ON MyAlias.IssueNumber = MyAlias2.IssueNumber

AND MyAlias.ActivityDate < MyAlias2.ActivityDate

AND MyAlias.Status MyAlias2.Status

)

BiggerAlias

Group By

IssueNumber,

Status

ORDER BY ActivityDate

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

pat-1085820 SSC Rookie Points: 29 More actions · Answer 4

I updated the sample data, I made an error when I pasted it in. ActivitySeq does have a relationship to the sequence of events, although they are not in order (1,2,3). In the updated sample data, you can see the increasing sequence numbers corresponding to the activity dates. it may not be necessary to solve this problem though.

SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL

SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL

SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL

SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL

SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL

SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL

SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL

SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL

SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL

SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL

SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL

SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL

SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL

SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL

SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL

SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL

SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL

SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL

SELECT '6127','Jul 2 2009 8:31AM','45','Research'

Lowell, thanks for taking a stab. I tried something similar but ended up with only one date range for the open status like your query. Really, I should be returning 2 distinct 'Open' status date ranges. '2007-03-28' to '2008-06-04' and '2009-04-09' to '2009-05-06'.

spaghettidba SSC Guru Points: 105732 More actions · Answer 5

This should do the trick:

SELECT issuenumber, activitydate, activityseq, status

FROM (

SELECT *, prevstatus = (

SELECT status

FROM #projectactivity AS b

WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)

)

FROM #projectactivity AS a

) AS data

WHERE status prevstatus OR prevstatus IS NULL

ORDER BY activityseq

Be advised anyway that although it seems to be a set based solution, it is not indeed. It is what is called a "triangular join".

You will have to test performance carefully before putting to production.

Hope this helps

Gianluca

-- Gianluca Sartori

pat-1085820 SSC Rookie Points: 29 More actions · Answer 6

Thanks Gianluca, that does seem to work for picking out the proper Start Dates of each Status Date Range.

I need to get the end date of each range as well though.

trying to figure out exactly how your code acheived the result, i'm not familair with the technique you used and I'd like to understand it.

spaghettidba SSC Guru Points: 105732 More actions · Answer 7

Well, it's a bit more complicated than I thought, but it (poorly) does the trick.

IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL

DROP TABLE #tmpdata

SELECT issuenumber, activitydate, activityseq, status, nextseq = CAST(NULL AS int), maxseq = CAST(NULL AS int),

maxdate = CAST(NULL AS datetime)

INTO #tmpdata

FROM (

SELECT *, prevstatus = (

SELECT status

FROM #projectactivity AS b

WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)

)

FROM #projectactivity AS a

) AS data

WHERE status prevstatus OR prevstatus IS NULL

ORDER BY activityseq

UPDATE #tmpdata

SET nextseq = (SELECT MIN(activityseq) FROM #tmpdata AS b WHERE b.activityseq > a.activityseq)

FROM #tmpdata as a

UPDATE #tmpdata

SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity AS b WHERE b.activityseq < a.nextseq and a.status = b.status)

FROM #tmpdata as a

UPDATE #tmpdata

SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity)

FROM #tmpdata as a

WHERE a.maxseq is null

UPDATE #tmpdata

SET maxdate = b.activitydate

FROM #tmpdata as a

INNER JOIN #projectactivity AS b

ON a.maxseq = b.activityseq

SELECT issuenumber, activitydate as startdate, maxdate as enddate, activityseq, status

FROM #tmpdata

It uses a temporary table and lots of triangular joins. I don't like it.

Could sound like swearing, but maybe a cursor is the best choice here.

-- Gianluca Sartori

spaghettidba SSC Guru Points: 105732 More actions · Answer 8

Gianluca Sartori (8/6/2009)
maybe a cursor is the best choice here.

Hope Jeff doesn't find this thread!;-)

I could get porkchopped in seconds!:-D

-- Gianluca Sartori

pat-1085820 SSC Rookie Points: 29 More actions · Answer 9

Thanks for helping me. Sometimes a cursor is the best solution, I'll play around with it.

haha, I'm a big fan of Jeff. He's the man when it comes to writing SQL. I love his articles.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

You have provided the DDL for the table and sample data. The only thing I see missing that would help me is the expected results based on the sample data, and perhaps a brief explanation behind it.

spaghettidba SSC Guru Points: 105732 More actions · Answer 11

Ok, I could not resist, I HAD to turn it to a quirky update!

IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL

DROP TABLE #tmpdata

SELECT *, eventgroup = cast(NULL AS int)

INTO #tmpdata

FROM #projectactivity

CREATE CLUSTERED INDEX IX_TMP_ORDERBY --clustered to resolve "Merry-go-Round"

ON #tmpdata (activityseq)

DECLARE @eventgroup int

SET @eventgroup = 0

DECLARE @prevStatus varchar(25)

SET @prevStatus = 0

UPDATE #tmpdata

SET @eventgroup = eventgroup = CASE

WHEN Status = @prevStatus

THEN @eventgroup

ELSE @eventgroup + 1

END ,

@prevStatus = status

FROM #tmpdata WITH (INDEX(IX_TMP_ORDERBY),TABLOCKX)

SELECT issuenumber, MIN(activitydate) as startdate, MAX(activitydate) as enddate, Status

FROM #tmpdata

GROUP BY issuenumber, Status, eventgroup

ORDER BY 1,2

This is done with the "quirky update" method described in one of Jeff's articles, currently under construction.

This is the link.

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

You can look at the discussion and sql file attached to understand what goes on behind the scenes.

Basically, you have to build a clustered index on the columns you want to order by and the update specifying the clustered index in the hints. This way you build a running count, that builds a group number and then uses that group number in the GROUP BY clause.

Now Jeff can safely read this thread. 😀

-- Gianluca Sartori

pat-1085820 SSC Rookie Points: 29 More actions · Answer 12

Thanks Gianluca.

Lyyn, the output from the sample should look like the sample below. The reason for this is that we want to see how long an issue has spent in a particular Status. I'm sorry about this formatting. I couldn't get it to show up nicely. Maybe just some trick I don't know yet.

issuenumberstartdateenddateactivityseqstatus

61272007-03-28 10:11:00.0002008-06-04 13:42:00.0001Open

61272008-07-03 16:23:00.0002009-04-09 09:39:00.00027Cancelled

61272009-04-09 09:43:00.0002009-05-06 12:35:00.00029Open

61272009-05-28 11:28:00.0002009-05-28 11:28:00.00038Pending Requirements

61272009-06-01 08:59:00.0002009-06-01 08:59:00.00039Define Requirements

61272009-06-02 08:54:00.0002009-06-02 08:54:00.00043Analysis Design

61272009-06-03 11:55:00.0002009-07-02 08:31:00.00044Research