Date Range of occurences of consecutive String Values?

  • Hi, I'm a little stumped with this problem. I am querying data from a HelpDesk Ticket system in SQL 2000. I need to find the date range each issue has spent in a particular phase of development. The part I'm having trouble with is that a ticket can go from 'open' to 'closed', back to 'open'...and I need to capture those 2 'Open' date ranges separately, or the results will be skewed.

    I have a table with the following structure and sample data. for each issuenumber, whenever a comment is added to this ticketing system, another row is added with the issuenumber, status, activitydate, and activityseq number.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#projectactivity','U') IS NOT NULL

    DROP TABLE #projectactivity

    --===== Create the test table with

    CREATE TABLE #projectactivity

    (

    IssueNumber INT ,

    ActivityDate DATETIME,

    ActivitySeq INT,

    Status varchar (25)

    )

    INSERT INTO #projectactivity (IssueNumber, ActivityDate, ActivitySeq, Status)

    SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL

    SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL

    SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL

    SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL

    SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL

    SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL

    SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL

    SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL

    SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL

    SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL

    SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL

    SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL

    SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL

    SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL

    SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL

    SELECT '6127','Jul 2 2009 8:31AM','45','Research'

    Something like the following would give me incorrect results because it would group all the 'Open' Status Ranges together, when in fact I want to have 3 separate date ranges with an 'Open' Status for this particular issue.

    --==== Using the Min and Max ActivityDate for each IssueNumber in the ProjectActivity table

    --==== allows for selecting the date range an Issue spent in each status

    SELECT

    IssueNumber,

    Status,

    MIN(ActivityDate) as ActivityDate_Start,

    MAX(ActivityDate) as ActivityDate_Finish

    FROM #projectactivity

    GROUP BY

    IssueNumber,

    Status

    Can anyone help me tackle this with a set based solution? Thanks.

  • thank you for the DDL and sample data; that makes it SO easy to test and offer suggestions;

    will row_number give you the data you are looking for?

    try these two SQLs and give me some feedback on them:

    --anything that had the same status twice on the same day

    select * from

    (select

    IssueNumber,

    Status,

    DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,

    row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW

    FROM #projectactivity )MyAlias

    where RW > 1

    --everything ordered by day

    select

    IssueNumber,

    Status,

    DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,

    row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW

    FROM #projectactivity

    ORDER BY

    DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell, I was thinking of row_number() too, but then I saw it was a SQL2000 forum.

    Anyway I don't understand how different "open" rows are related to subsequent operations: can you explain a little more what happens in time? If I order you table by ActivityDate I get something that is not very clear to me. What does ActivitySeq mean?

    -- Gianluca Sartori

  • misread the requirement...how about this:

    SELECT

    IssueNumber,

    Status,

    min(ActivityDate) As ActivityDate,

    min(EndDate) As EndDate

    FROM(

    select MyAlias.* ,myAlias2.ActivityDate as EndDate from

    (select

    IssueNumber,

    Status,

    DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate

    FROM #projectactivity )MyAlias

    LEFT OUTER JOIN

    (select

    IssueNumber,

    Status,

    DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate

    FROM #projectactivity )MyAlias2

    ON MyAlias.IssueNumber = MyAlias2.IssueNumber

    AND MyAlias.ActivityDate < MyAlias2.ActivityDate

    AND MyAlias.Status MyAlias2.Status

    )

    BiggerAlias

    Group By

    IssueNumber,

    Status

    ORDER BY ActivityDate

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I updated the sample data, I made an error when I pasted it in. ActivitySeq does have a relationship to the sequence of events, although they are not in order (1,2,3). In the updated sample data, you can see the increasing sequence numbers corresponding to the activity dates. it may not be necessary to solve this problem though.

    SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL

    SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL

    SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL

    SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL

    SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL

    SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL

    SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL

    SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL

    SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL

    SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL

    SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL

    SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL

    SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL

    SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL

    SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL

    SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL

    SELECT '6127','Jul 2 2009 8:31AM','45','Research'

    Lowell, thanks for taking a stab. I tried something similar but ended up with only one date range for the open status like your query. Really, I should be returning 2 distinct 'Open' status date ranges. '2007-03-28' to '2008-06-04' and '2009-04-09' to '2009-05-06'.

  • This should do the trick:

    SELECT issuenumber, activitydate, activityseq, status

    FROM (

    SELECT *, prevstatus = (

    SELECT status

    FROM #projectactivity AS b

    WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)

    )

    FROM #projectactivity AS a

    ) AS data

    WHERE status prevstatus OR prevstatus IS NULL

    ORDER BY activityseq

    Be advised anyway that although it seems to be a set based solution, it is not indeed. It is what is called a "triangular join".

    You will have to test performance carefully before putting to production.

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • Thanks Gianluca, that does seem to work for picking out the proper Start Dates of each Status Date Range.

    I need to get the end date of each range as well though.

    trying to figure out exactly how your code acheived the result, i'm not familair with the technique you used and I'd like to understand it.

  • Well, it's a bit more complicated than I thought, but it (poorly) does the trick.

    IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL

    DROP TABLE #tmpdata

    SELECT issuenumber, activitydate, activityseq, status, nextseq = CAST(NULL AS int), maxseq = CAST(NULL AS int),

    maxdate = CAST(NULL AS datetime)

    INTO #tmpdata

    FROM (

    SELECT *, prevstatus = (

    SELECT status

    FROM #projectactivity AS b

    WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)

    )

    FROM #projectactivity AS a

    ) AS data

    WHERE status prevstatus OR prevstatus IS NULL

    ORDER BY activityseq

    UPDATE #tmpdata

    SET nextseq = (SELECT MIN(activityseq) FROM #tmpdata AS b WHERE b.activityseq > a.activityseq)

    FROM #tmpdata as a

    UPDATE #tmpdata

    SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity AS b WHERE b.activityseq < a.nextseq and a.status = b.status)

    FROM #tmpdata as a

    UPDATE #tmpdata

    SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity)

    FROM #tmpdata as a

    WHERE a.maxseq is null

    UPDATE #tmpdata

    SET maxdate = b.activitydate

    FROM #tmpdata as a

    INNER JOIN #projectactivity AS b

    ON a.maxseq = b.activityseq

    SELECT issuenumber, activitydate as startdate, maxdate as enddate, activityseq, status

    FROM #tmpdata

    It uses a temporary table and lots of triangular joins. I don't like it.

    Could sound like swearing, but maybe a cursor is the best choice here.

    -- Gianluca Sartori

  • Gianluca Sartori (8/6/2009)


    maybe a cursor is the best choice here.

    Hope Jeff doesn't find this thread!;-)

    I could get porkchopped in seconds!:-D

    -- Gianluca Sartori

  • Thanks for helping me. Sometimes a cursor is the best solution, I'll play around with it.

    haha, I'm a big fan of Jeff. He's the man when it comes to writing SQL. I love his articles.

  • You have provided the DDL for the table and sample data. The only thing I see missing that would help me is the expected results based on the sample data, and perhaps a brief explanation behind it.

  • Ok, I could not resist, I HAD to turn it to a quirky update!

    IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL

    DROP TABLE #tmpdata

    SELECT *, eventgroup = cast(NULL AS int)

    INTO #tmpdata

    FROM #projectactivity

    CREATE CLUSTERED INDEX IX_TMP_ORDERBY --clustered to resolve "Merry-go-Round"

    ON #tmpdata (activityseq)

    DECLARE @eventgroup int

    SET @eventgroup = 0

    DECLARE @prevStatus varchar(25)

    SET @prevStatus = 0

    UPDATE #tmpdata

    SET @eventgroup = eventgroup = CASE

    WHEN Status = @prevStatus

    THEN @eventgroup

    ELSE @eventgroup + 1

    END ,

    @prevStatus = status

    FROM #tmpdata WITH (INDEX(IX_TMP_ORDERBY),TABLOCKX)

    SELECT issuenumber, MIN(activitydate) as startdate, MAX(activitydate) as enddate, Status

    FROM #tmpdata

    GROUP BY issuenumber, Status, eventgroup

    ORDER BY 1,2

    This is done with the "quirky update" method described in one of Jeff's articles, currently under construction.

    This is the link.

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    You can look at the discussion and sql file attached to understand what goes on behind the scenes.

    Basically, you have to build a clustered index on the columns you want to order by and the update specifying the clustered index in the hints. This way you build a running count, that builds a group number and then uses that group number in the GROUP BY clause.

    Now Jeff can safely read this thread. 😀

    -- Gianluca Sartori

  • Thanks Gianluca.

    Lyyn, the output from the sample should look like the sample below. The reason for this is that we want to see how long an issue has spent in a particular Status. I'm sorry about this formatting. I couldn't get it to show up nicely. Maybe just some trick I don't know yet.

    issuenumberstartdateenddateactivityseqstatus

    61272007-03-28 10:11:00.0002008-06-04 13:42:00.0001Open

    61272008-07-03 16:23:00.0002009-04-09 09:39:00.00027Cancelled

    61272009-04-09 09:43:00.0002009-05-06 12:35:00.00029Open

    61272009-05-28 11:28:00.0002009-05-28 11:28:00.00038Pending Requirements

    61272009-06-01 08:59:00.0002009-06-01 08:59:00.00039Define Requirements

    61272009-06-02 08:54:00.0002009-06-02 08:54:00.00043Analysis Design

    61272009-06-03 11:55:00.0002009-07-02 08:31:00.00044Research

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply