February 17, 2005 at 5:12 am
Hi
This is my first post on this forum - so go easy on me please!!
Basically here is some sample data (simplified slightly)
EntryDate Name Location
------------------------------------------------------------
2005-01-01 10:00:00 Customer1 50
2005-01-01 10:01:00 Customer1 50
2005-01-01 14:00:00 Customer1 50
2005-01-02 09:00:00 Customer1 45
2005-01-02 10:00:00 Customer1 30
2005-01-02 11:00:00 Customer1 45
2005-01-02 11:30:00 Customer1 50
2005-01-02 11:40:00 Customer1 50
Basically I need to know (within a given date range - so for example a week) how long Customer1 has been at location 50
So in this example (with the supplied date range covering ALL of the data above) I would expect the result Customer1 - 4 hours 10 minutes.
The problem I am having is finding a quick way to identify contiguous blocks where the location is 50, so I can do a DateDiff on the Start and End EntryDates of each block.
Can anyone help me with this?
Thanks
Simon
February 17, 2005 at 5:42 am
Something like this?
SELECT
MIN(location) location
, MIN([name]) [name]
, MIN(EntryDate) entrydate
, DATEDIFF(mi,MIN(EntryDate),MAX(EntryDate)) diff_in_minutes_per_day
FROM
< your_table >
WHERE
EntryDate>='20050101' AND EntryDate<'20050103'
AND
Location = 50
GROUP BY
[name]
, DAY(EntryDate)
?
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 17, 2005 at 6:42 am
Frank Kalis - Thanks for your reply much appreciated. Whilst your query does exactly what I want in terms of the data shown above - it only works if there is only one period in any one day where the customer is at the location. If they visit twice then it works out the minutes as the difference between the start of the first visit and the end of the last visit - i.e too long a duration.
Any other ideas
Thanks
February 17, 2005 at 8:22 am
You mean he can visit multiple times a day? And you only have an EntryDate and no ExitDate (or something like that)? How will you measure that duration? And what happens to the time someone needs to get from one location to the other?
You got me a bit confused
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 17, 2005 at 9:41 am
Frank
You got it in a nutshell, I have vastly simplified the data, however there are good reasons why things like exit date are not available. The Exit date (and therefore the duration) have to be obtained by looking at when the next record is not in the location we are after hence my original post about identifying contiguous sections of data with the same location code (so that I can get the start and end dates of each "block")....
Its one of the more difficult problems I've had to deal with recently
February 17, 2005 at 9:59 am
Simon,
You have to be a little more explicit on how you define CONTIGUOS, this is why:
1. you said in the previous post that you could have more than one interval on the same date at the same location.
2. You don't want to get the difference between the first Date and the last on the same day at the same location
then how do you determine that there is an interval without an exit date?
HTH
* Noel
February 18, 2005 at 1:19 am
The way your say this, sounds like you have not control over the design, right? However, if you are able to change the table design, you should really do this. I think almost every solution in the current situation will be kind of a kludge. Anyway, I think I have seen something similar in the Google Archives some weeks ago. Have you searched there already. I think it was a posting by David Portas or Steve Kass, but I'm not sure on this. You might search there. From what I understand now, one solution that will work is to use a cursor but I'll keep thinking about it.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 18, 2005 at 1:25 am
Noel
Say for example someone is at a location for two periods in one day. The only way I can tell from the data that I have available that they have visited twice is that because inbetween the two periods there are records where they are somewhere else. That is why this query is so difficult. Here is some more sample data to explain better
2005-01-01 10:00:00 Customer1 50
2005-01-01 11:00:00 Customer1 50
2005-01-01 12:00:00 Customer1 45
2005-01-01 13:00:00 Customer1 45
2005-01-01 14:00:00 Customer1 50
2005-01-02 15:00:00 Customer1 50
I need a query to determine both the green and red dates so that I can perform a date diff (as individual periods) to produce the total of 2 hours for the day. The only data I have to determine these periods is somehow identifying unbroken continous sections that have a matching location id (in this case 50). That is my question really how do I identify the red and green dates above from the small amount of data available.
Thanks
Simon
February 18, 2005 at 2:06 am
Simon,
I am not usre I understood the request, but here is some code :
set nocount on
go
--exec sp__droptable 'dbo.sample'
go
-- set up a test environment
create table dbo.sample
(
EntryDate datetime not null,
Name varchar(10) not null,
location tinyint not null
)
insert dbo.sample values ('2005-01-01 10:00:00','Customer1',50)
insert dbo.sample values ('2005-01-01 10:01:00','Customer1',50)
insert dbo.sample values ('2005-01-01 14:00:00','Customer1',50)
insert dbo.sample values ('2005-01-02 09:00:00','Customer1',45)
insert dbo.sample values ('2005-01-02 10:00:00','Customer1',30)
insert dbo.sample values ('2005-01-02 11:00:00','Customer1',45)
insert dbo.sample values ('2005-01-02 11:30:00','Customer1',50)
insert dbo.sample values ('2005-01-02 11:40:00','Customer1',50)
insert dbo.sample values ('2005-01-01 10:00:00','Customer2',50)
insert dbo.sample values ('2005-01-01 10:01:00','Customer2',50)
insert dbo.sample values ('2005-01-01 14:00:00','Customer3',50)
insert dbo.sample values ('2005-01-02 09:00:00','Customer3',45)
insert dbo.sample values ('2005-01-02 10:00:00','Customer2',30)
insert dbo.sample values ('2005-01-02 11:00:00','Customer3',45)
insert dbo.sample values ('2005-01-02 11:30:00','Customer2',50)
insert dbo.sample values ('2005-01-02 11:40:00','Customer3',50)
go
-- show data for test environment
select Name,EntryDate,location
from dbo.sample
order by Name,EntryDate,location
go
-- get duration at each location for each customer
-- duration is in minutes
--
-- take the data ordered ba Name, EntryDate and location
-- if record n and record n+1 are for the same Name and location,
-- duration is the difference in minutes of the EntryDate
select R.Name,R.location,duration=sum(convert(int,left(R.NextRecord,12)))
from
(
select s1.Name,s1.EntryDate,s1.location,NextRecord =
( select TOP 1 NextRecord=right(replicate('0',12) + convert(varchar(12),datediff(mi,s1.EntryDate,s2.EntryDate)),12) + ' ' + s2.Name + '@' + convert(varchar(10),s2.location)
from dbo.sample s2
where s2.EntryDate > s1.EntryDate and s1.Name = s2.Name )
from dbo.sample s1
--order by s1.Name,s1.EntryDate,s1.location
) R
where R.NextRecord is not null
and R.NextRecord like replicate('_',12) + ' ' + R.Name + '@' + convert(varchar(10),R.location)
group by R.Name,R.location
order by R.Name,R.location
go
/*
RESULT :
Name location duration
---------- -------- -----------
Customer1 50 250
Customer2 50 1
Customer3 45 120
*/
February 18, 2005 at 2:36 am
Now I can see, why this
2005-01-01 10:00:00 Customer1 50
2005-01-01 11:00:00 Customer1 50
will result in 60 minutes difference. However, why should this
2005-01-01 14:00:00 Customer1 50
2005-01-02 15:00:00 Customer1 50
also result in 60 minutes difference anyway. To me this would be 1500, e.g. 25 hours. Anyway, see if this will help a bit:
IF OBJECT_ID('#test')>0
DROP TABLE #test
GO
SET NOCOUNT ON
CREATE TABLE #test
(
ENTRYDATE DATETIME
, [name] VARCHAR(10)
, Location INT
)
INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)
INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)
INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)
SELECT
t1.location
, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)
, DATEDIFF(mi,t1.EntryDate,t2.EntryDate)
FROM
#test t1
JOIN
#test t2
ON
t1.location = t2.location
AND
(DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)
OR
DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))
AND
RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)
<
RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)
WHERE
t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'
AND
t1.Location = 50
AND NOT EXISTS
(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)
GROUP BY
t1.location
, t1.EntryDate
, t2.EntryDate
ORDER BY t1.EntryDate
SET NOCOUNT OFF
location
----------- --------- -----------
50 10:00:00 60
50 14:00:00 1500
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 18, 2005 at 2:39 am
FWIW. Note the CASE expression to take care of period greater than 1 day:
IF OBJECT_ID('#test')>0
DROP TABLE #test
GO
SET NOCOUNT ON
CREATE TABLE #test
(
ENTRYDATE DATETIME
, [name] VARCHAR(10)
, Location INT
)
INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)
INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)
INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)
INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)
SELECT
t1.location
, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)
, CASE
WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>=1440
THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440
ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)
END
FROM
#test t1
JOIN
#test t2
ON
t1.location = t2.location
AND
(DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)
OR
DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))
AND
RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)
<
RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)
WHERE
t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'
AND
t1.Location = 50
AND NOT EXISTS
(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)
GROUP BY
t1.location
, t1.EntryDate
, t2.EntryDate
ORDER BY t1.EntryDate
SET NOCOUNT OFF
location
----------- --------- -----------
50 10:00:00 60
50 14:00:00 60
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 18, 2005 at 3:36 am
The second date was a typo they should all have been on the same day (the 1st) - although it is perfectly feasible that the date could be the next day so the duration would span the night.
I'll have a look at your query later on today, Thanks Frank
February 18, 2005 at 5:48 am
Aargh, sometimes you can't see the wood for the trees:
SELECT
t1.location
, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)
, CASE
WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>1440
THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440
ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)
END
FROM
#test t1
JOIN
#test t2
ON
t1.location = t2.location
AND
t1.EntryDate < t2.EntryDate
WHERE
t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'
AND
t1.Location = 50
AND NOT EXISTS
(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)
GROUP BY
t1.location
, t1.EntryDate
, t2.EntryDate
ORDER BY t1.EntryDate
should also do
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
February 18, 2005 at 8:14 am
Frank, that is a nice solution. I typically use "IN" instead of "EXISTS" - which I am trying to get away from. Anyway, your "EXISTS" statement is confusing me so I was hoping you would elaborate on why it works (I tested it an know it does).
I understand that because of the t1.EntryDate < t2.EntryDate part of the join, there are extra records (for instance 10:00 - 14:00) that need to be eliminated. I'm just not sure why:
WHERE NOT EXISTS (SELECT * FROM test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)
is doing this. I read that they query will select locations <> 50 that have t3 times between t1 and t2 times. My problem is that your query is selecting locations <> 50 of which your where statement has allready excluded - so I would assume the NOT EXISTS would not have anything additional to exclude.
I am wrong of course since I can see it works. Would you mind explaining a bit? I am so used to "IN" where I am comparing an particular field. I guess with EXISTS I am comparing an entire record hit.
I have read BOL for EXISTS - but the example are very simplistic.
TIA
February 18, 2005 at 8:34 am
I'm not sure my post above properly explained my issue. I rewrote the EXISTS as I understand it:
SELECT t3.* FROM
test t3, test t2, test t1
WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate
This query returns:
ENTRYDATE name Location
------------------------------------------------------ ---------- -----------
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 12:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
2005-01-01 13:00:00.000 Customer1 45
I don't see why this result set is excluding the records you are trying to exclude - which are:
2005-01-01 10:00:00.000 2005-01-01 14:00:00.000 50 10:00:00 240
2005-01-01 10:00:00.000 2005-01-02 15:00:00.000 50 10:00:00 300
2005-01-01 11:00:00.000 2005-01-01 14:00:00.000 50 11:00:00 180
2005-01-01 11:00:00.000 2005-01-02 15:00:00.000 50 11:00:00 240
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply