August 5, 2009 at 8:21 am
Hi, I'm a little stumped with this problem. I am querying data from a HelpDesk Ticket system in SQL 2000. I need to find the date range each issue has spent in a particular phase of development. The part I'm having trouble with is that a ticket can go from 'open' to 'closed', back to 'open'...and I need to capture those 2 'Open' date ranges separately, or the results will be skewed.
I have a table with the following structure and sample data. for each issuenumber, whenever a comment is added to this ticketing system, another row is added with the issuenumber, status, activitydate, and activityseq number.
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#projectactivity','U') IS NOT NULL
DROP TABLE #projectactivity
--===== Create the test table with
CREATE TABLE #projectactivity
(
IssueNumber INT ,
ActivityDate DATETIME,
ActivitySeq INT,
Status varchar (25)
)
INSERT INTO #projectactivity (IssueNumber, ActivityDate, ActivitySeq, Status)
SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL
SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL
SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL
SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL
SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL
SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL
SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL
SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL
SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL
SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL
SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL
SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL
SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL
SELECT '6127','Jul 2 2009 8:31AM','45','Research'
Something like the following would give me incorrect results because it would group all the 'Open' Status Ranges together, when in fact I want to have 3 separate date ranges with an 'Open' Status for this particular issue.
--==== Using the Min and Max ActivityDate for each IssueNumber in the ProjectActivity table
--==== allows for selecting the date range an Issue spent in each status
SELECT
IssueNumber,
Status,
MIN(ActivityDate) as ActivityDate_Start,
MAX(ActivityDate) as ActivityDate_Finish
FROM #projectactivity
GROUP BY
IssueNumber,
Status
Can anyone help me tackle this with a set based solution? Thanks.
August 5, 2009 at 8:36 am
thank you for the DDL and sample data; that makes it SO easy to test and offer suggestions;
will row_number give you the data you are looking for?
try these two SQLs and give me some feedback on them:
--anything that had the same status twice on the same day
select * from
(select
IssueNumber,
Status,
DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,
row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW
FROM #projectactivity )MyAlias
where RW > 1
--everything ordered by day
select
IssueNumber,
Status,
DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate,
row_number() over(partition by IssueNumber,Status,DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) order by DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) ) As RW
FROM #projectactivity
ORDER BY
DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0)
Lowell
August 5, 2009 at 8:47 am
Lowell, I was thinking of row_number() too, but then I saw it was a SQL2000 forum.
Anyway I don't understand how different "open" rows are related to subsequent operations: can you explain a little more what happens in time? If I order you table by ActivityDate I get something that is not very clear to me. What does ActivitySeq mean?
-- Gianluca Sartori
August 5, 2009 at 8:49 am
misread the requirement...how about this:
SELECT
IssueNumber,
Status,
min(ActivityDate) As ActivityDate,
min(EndDate) As EndDate
FROM(
select MyAlias.* ,myAlias2.ActivityDate as EndDate from
(select
IssueNumber,
Status,
DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate
FROM #projectactivity )MyAlias
LEFT OUTER JOIN
(select
IssueNumber,
Status,
DATEADD(dd, DATEDIFF(dd,0,ActivityDate), 0) AS ActivityDate
FROM #projectactivity )MyAlias2
ON MyAlias.IssueNumber = MyAlias2.IssueNumber
AND MyAlias.ActivityDate < MyAlias2.ActivityDate
AND MyAlias.Status MyAlias2.Status
)
BiggerAlias
Group By
IssueNumber,
Status
ORDER BY ActivityDate
Lowell
August 5, 2009 at 10:31 am
I updated the sample data, I made an error when I pasted it in. ActivitySeq does have a relationship to the sequence of events, although they are not in order (1,2,3). In the updated sample data, you can see the increasing sequence numbers corresponding to the activity dates. it may not be necessary to solve this problem though.
SELECT '6127','Mar 28 2007 10:11AM','1','Open' UNION ALL
SELECT '6127','Mar 28 2007 10:13AM','14','Open' UNION ALL
SELECT '6127','Mar 29 2007 8:03AM','16','Open' UNION ALL
SELECT '6127','Mar 25 2008 1:36PM','18','Open' UNION ALL
SELECT '6127','Apr 3 2008 10:15PM','20','Open' UNION ALL
SELECT '6127','Apr 14 2008 2:31PM','23','Open' UNION ALL
SELECT '6127','Jun 4 2008 1:42PM','25','Open' UNION ALL
SELECT '6127','Jul 3 2008 4:23PM','27','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:39AM','28','Cancelled' UNION ALL
SELECT '6127','Apr 9 2009 9:43AM','29','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:30PM','33','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:31PM','34','Open' UNION ALL
SELECT '6127','Apr 22 2009 5:36PM','35','Open' UNION ALL
SELECT '6127','May 6 2009 12:35PM','36','Open' UNION ALL
SELECT '6127','May 28 2009 11:28AM','38','Pending Requirements' UNION ALL
SELECT '6127','Jun 1 2009 8:59AM','39','Define Requirements' UNION ALL
SELECT '6127','Jun 2 2009 8:54AM','43','Analysis Design' UNION ALL
SELECT '6127','Jun 3 2009 11:55AM','44','Research' UNION ALL
SELECT '6127','Jul 2 2009 8:31AM','45','Research'
Lowell, thanks for taking a stab. I tried something similar but ended up with only one date range for the open status like your query. Really, I should be returning 2 distinct 'Open' status date ranges. '2007-03-28' to '2008-06-04' and '2009-04-09' to '2009-05-06'.
August 5, 2009 at 11:14 am
This should do the trick:
SELECT issuenumber, activitydate, activityseq, status
FROM (
SELECT *, prevstatus = (
SELECT status
FROM #projectactivity AS b
WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)
)
FROM #projectactivity AS a
) AS data
WHERE status prevstatus OR prevstatus IS NULL
ORDER BY activityseq
Be advised anyway that although it seems to be a set based solution, it is not indeed. It is what is called a "triangular join".
You will have to test performance carefully before putting to production.
Hope this helps
Gianluca
-- Gianluca Sartori
August 5, 2009 at 12:22 pm
Thanks Gianluca, that does seem to work for picking out the proper Start Dates of each Status Date Range.
I need to get the end date of each range as well though.
trying to figure out exactly how your code acheived the result, i'm not familair with the technique you used and I'd like to understand it.
August 6, 2009 at 4:50 am
Well, it's a bit more complicated than I thought, but it (poorly) does the trick.
IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL
DROP TABLE #tmpdata
SELECT issuenumber, activitydate, activityseq, status, nextseq = CAST(NULL AS int), maxseq = CAST(NULL AS int),
maxdate = CAST(NULL AS datetime)
INTO #tmpdata
FROM (
SELECT *, prevstatus = (
SELECT status
FROM #projectactivity AS b
WHERE activityseq = (SELECT MAX(activityseq) FROM #projectactivity AS c WHERE activityseq < a.activityseq)
)
FROM #projectactivity AS a
) AS data
WHERE status prevstatus OR prevstatus IS NULL
ORDER BY activityseq
UPDATE #tmpdata
SET nextseq = (SELECT MIN(activityseq) FROM #tmpdata AS b WHERE b.activityseq > a.activityseq)
FROM #tmpdata as a
UPDATE #tmpdata
SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity AS b WHERE b.activityseq < a.nextseq and a.status = b.status)
FROM #tmpdata as a
UPDATE #tmpdata
SET maxseq = (SELECT MAX(activityseq) FROM #projectactivity)
FROM #tmpdata as a
WHERE a.maxseq is null
UPDATE #tmpdata
SET maxdate = b.activitydate
FROM #tmpdata as a
INNER JOIN #projectactivity AS b
ON a.maxseq = b.activityseq
SELECT issuenumber, activitydate as startdate, maxdate as enddate, activityseq, status
FROM #tmpdata
It uses a temporary table and lots of triangular joins. I don't like it.
Could sound like swearing, but maybe a cursor is the best choice here.
-- Gianluca Sartori
August 6, 2009 at 4:52 am
Gianluca Sartori (8/6/2009)
maybe a cursor is the best choice here.
Hope Jeff doesn't find this thread!;-)
I could get porkchopped in seconds!:-D
-- Gianluca Sartori
August 6, 2009 at 9:25 am
Thanks for helping me. Sometimes a cursor is the best solution, I'll play around with it.
haha, I'm a big fan of Jeff. He's the man when it comes to writing SQL. I love his articles.
August 6, 2009 at 9:59 am
You have provided the DDL for the table and sample data. The only thing I see missing that would help me is the expected results based on the sample data, and perhaps a brief explanation behind it.
August 6, 2009 at 10:00 am
Ok, I could not resist, I HAD to turn it to a quirky update!
IF OBJECT_ID('TempDB..#tmpdata','U') IS NOT NULL
DROP TABLE #tmpdata
SELECT *, eventgroup = cast(NULL AS int)
INTO #tmpdata
FROM #projectactivity
CREATE CLUSTERED INDEX IX_TMP_ORDERBY --clustered to resolve "Merry-go-Round"
ON #tmpdata (activityseq)
DECLARE @eventgroup int
SET @eventgroup = 0
DECLARE @prevStatus varchar(25)
SET @prevStatus = 0
UPDATE #tmpdata
SET @eventgroup = eventgroup = CASE
WHEN Status = @prevStatus
THEN @eventgroup
ELSE @eventgroup + 1
END ,
@prevStatus = status
FROM #tmpdata WITH (INDEX(IX_TMP_ORDERBY),TABLOCKX)
SELECT issuenumber, MIN(activitydate) as startdate, MAX(activitydate) as enddate, Status
FROM #tmpdata
GROUP BY issuenumber, Status, eventgroup
ORDER BY 1,2
This is done with the "quirky update" method described in one of Jeff's articles, currently under construction.
This is the link.
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
You can look at the discussion and sql file attached to understand what goes on behind the scenes.
Basically, you have to build a clustered index on the columns you want to order by and the update specifying the clustered index in the hints. This way you build a running count, that builds a group number and then uses that group number in the GROUP BY clause.
Now Jeff can safely read this thread. 😀
-- Gianluca Sartori
August 6, 2009 at 1:38 pm
Thanks Gianluca.
Lyyn, the output from the sample should look like the sample below. The reason for this is that we want to see how long an issue has spent in a particular Status. I'm sorry about this formatting. I couldn't get it to show up nicely. Maybe just some trick I don't know yet.
issuenumberstartdateenddateactivityseqstatus
61272007-03-28 10:11:00.0002008-06-04 13:42:00.0001Open
61272008-07-03 16:23:00.0002009-04-09 09:39:00.00027Cancelled
61272009-04-09 09:43:00.0002009-05-06 12:35:00.00029Open
61272009-05-28 11:28:00.0002009-05-28 11:28:00.00038Pending Requirements
61272009-06-01 08:59:00.0002009-06-01 08:59:00.00039Define Requirements
61272009-06-02 08:54:00.0002009-06-02 08:54:00.00043Analysis Design
61272009-06-03 11:55:00.0002009-07-02 08:31:00.00044Research
Viewing 13 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply