Contigous Data Periods

Question

Post reply

Contigous Data Periods

Simon Smith

SSC Rookie

Points: 29
More actions
February 17, 2005 at 5:12 am

#61659

Hi
This is my first post on this forum - so go easy on me please!!
Basically here is some sample data (simplified slightly)
EntryDate                          Name                         Location
------------------------------------------------------------
2005-01-01 10:00:00           Customer1                      50
2005-01-01 10:01:00           Customer1                      50
2005-01-01 14:00:00           Customer1                      50
2005-01-02 09:00:00           Customer1                      45
2005-01-02 10:00:00           Customer1                      30
2005-01-02 11:00:00           Customer1                      45
2005-01-02 11:30:00           Customer1                      50
2005-01-02 11:40:00           Customer1                      50
Basically I need to know (within a given date range - so for example a week) how long Customer1 has been at location 50
So in this example (with the supplied date range covering ALL of the data above) I would expect the result Customer1 - 4 hours 10 minutes.
The problem I am having is finding a quick way to identify contiguous blocks where the location is 50, so I can do a DateDiff on the Start and End EntryDates of each block.
Can anyone help me with this?
Thanks
Simon

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Frank Kalis SSC Guru Points: 111183 More actions · Answer 1

Something like this?

SELECT

MIN(location) location

, MIN([name]) [name]

, MIN(EntryDate) entrydate

, DATEDIFF(mi,MIN(EntryDate),MAX(EntryDate)) diff_in_minutes_per_day

FROM

< your_table >

WHERE

EntryDate>='20050101' AND EntryDate<'20050103'

AND

Location = 50

GROUP BY

[name]

, DAY(EntryDate)

?

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Simon Smith SSC Rookie Points: 29 More actions · Answer 2

Frank Kalis - Thanks for your reply much appreciated. Whilst your query does exactly what I want in terms of the data shown above - it only works if there is only one period in any one day where the customer is at the location. If they visit twice then it works out the minutes as the difference between the start of the first visit and the end of the last visit - i.e too long a duration.

Any other ideas

Thanks

Frank Kalis SSC Guru Points: 111183 More actions · Answer 3

You mean he can visit multiple times a day? And you only have an EntryDate and no ExitDate (or something like that)? How will you measure that duration? And what happens to the time someone needs to get from one location to the other?

You got me a bit confused

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Simon Smith SSC Rookie Points: 29 More actions · Answer 4

Frank

You got it in a nutshell, I have vastly simplified the data, however there are good reasons why things like exit date are not available. The Exit date (and therefore the duration) have to be obtained by looking at when the next record is not in the location we are after hence my original post about identifying contiguous sections of data with the same location code (so that I can get the start and end dates of each "block")....

Its one of the more difficult problems I've had to deal with recently

noeld SSC Guru Points: 96590 More actions · Answer 5

Simon,

You have to be a little more explicit on how you define CONTIGUOS, this is why:

1. you said in the previous post that you could have more than one interval on the same date at the same location.

2. You don't want to get the difference between the first Date and the last on the same day at the same location

then how do you determine that there is an interval without an exit date?

HTH

* Noel

Frank Kalis SSC Guru Points: 111183 More actions · Answer 6

The way your say this, sounds like you have not control over the design, right? However, if you are able to change the table design, you should really do this. I think almost every solution in the current situation will be kind of a kludge. Anyway, I think I have seen something similar in the Google Archives some weeks ago. Have you searched there already. I think it was a posting by David Portas or Steve Kass, but I'm not sure on this. You might search there. From what I understand now, one solution that will work is to use a cursor but I'll keep thinking about it.

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Simon Smith SSC Rookie Points: 29 More actions · Answer 7

Noel

Say for example someone is at a location for two periods in one day. The only way I can tell from the data that I have available that they have visited twice is that because inbetween the two periods there are records where they are somewhere else. That is why this query is so difficult. Here is some more sample data to explain better

2005-01-01 10:00:00 Customer1 50

2005-01-01 11:00:00 Customer1 50

2005-01-01 12:00:00 Customer1 45

2005-01-01 13:00:00 Customer1 45

2005-01-01 14:00:00 Customer1 50

2005-01-02 15:00:00 Customer1 50

I need a query to determine both the green and red dates so that I can perform a date diff (as individual periods) to produce the total of 2 hours for the day. The only data I have to determine these periods is somehow identifying unbroken continous sections that have a matching location id (in this case 50). That is my question really how do I identify the red and green dates above from the small amount of data available.

Thanks

Simon

Bert De Haes Hall of Fame Points: 3328 More actions · Answer 8

Simon,

I am not usre I understood the request, but here is some code :

set nocount on

go

--exec sp__droptable 'dbo.sample'

go

-- set up a test environment

create table dbo.sample

(

EntryDate datetime not null,

Name varchar(10) not null,

location tinyint not null

)

insert dbo.sample values ('2005-01-01 10:00:00','Customer1',50)

insert dbo.sample values ('2005-01-01 10:01:00','Customer1',50)

insert dbo.sample values ('2005-01-01 14:00:00','Customer1',50)

insert dbo.sample values ('2005-01-02 09:00:00','Customer1',45)

insert dbo.sample values ('2005-01-02 10:00:00','Customer1',30)

insert dbo.sample values ('2005-01-02 11:00:00','Customer1',45)

insert dbo.sample values ('2005-01-02 11:30:00','Customer1',50)

insert dbo.sample values ('2005-01-02 11:40:00','Customer1',50)

insert dbo.sample values ('2005-01-01 10:00:00','Customer2',50)

insert dbo.sample values ('2005-01-01 10:01:00','Customer2',50)

insert dbo.sample values ('2005-01-01 14:00:00','Customer3',50)

insert dbo.sample values ('2005-01-02 09:00:00','Customer3',45)

insert dbo.sample values ('2005-01-02 10:00:00','Customer2',30)

insert dbo.sample values ('2005-01-02 11:00:00','Customer3',45)

insert dbo.sample values ('2005-01-02 11:30:00','Customer2',50)

insert dbo.sample values ('2005-01-02 11:40:00','Customer3',50)

go

-- show data for test environment

select Name,EntryDate,location

from dbo.sample

order by Name,EntryDate,location

go

-- get duration at each location for each customer

-- duration is in minutes

--

-- take the data ordered ba Name, EntryDate and location

-- if record n and record n+1 are for the same Name and location,

-- duration is the difference in minutes of the EntryDate

select R.Name,R.location,duration=sum(convert(int,left(R.NextRecord,12)))

from

(

select s1.Name,s1.EntryDate,s1.location,NextRecord =

( select TOP 1 NextRecord=right(replicate('0',12) + convert(varchar(12),datediff(mi,s1.EntryDate,s2.EntryDate)),12) + ' ' + s2.Name + '@' + convert(varchar(10),s2.location)

from dbo.sample s2

where s2.EntryDate > s1.EntryDate and s1.Name = s2.Name )

from dbo.sample s1

--order by s1.Name,s1.EntryDate,s1.location

) R

where R.NextRecord is not null

and R.NextRecord like replicate('_',12) + ' ' + R.Name + '@' + convert(varchar(10),R.location)

group by R.Name,R.location

order by R.Name,R.location

go

/*

RESULT :

Name location duration

---------- -------- -----------

Customer1 50 250

Customer2 50 1

Customer3 45 120

*/

Frank Kalis SSC Guru Points: 111183 More actions · Answer 9

Now I can see, why this

2005-01-01 10:00:00 Customer1 50

2005-01-01 11:00:00 Customer1 50

will result in 60 minutes difference. However, why should this

2005-01-01 14:00:00 Customer1 50

2005-01-02 15:00:00 Customer1 50

also result in 60 minutes difference anyway. To me this would be 1500, e.g. 25 hours. Anyway, see if this will help a bit:

IF OBJECT_ID('#test')>0

DROP TABLE #test

GO

SET NOCOUNT ON

CREATE TABLE #test

(

ENTRYDATE DATETIME

, [name] VARCHAR(10)

, Location INT

)

INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)

INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)

INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)

SELECT

t1.location

, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

, DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

FROM

#test t1

JOIN

#test t2

ON

t1.location = t2.location

AND

(DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)

OR

DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))

AND

RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

<

RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)

WHERE

t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

AND

t1.Location = 50

AND NOT EXISTS

(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

GROUP BY

t1.location

, t1.EntryDate

, t2.EntryDate

ORDER BY t1.EntryDate

SET NOCOUNT OFF

location

----------- --------- -----------

50 10:00:00 60

50 14:00:00 1500

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Frank Kalis SSC Guru Points: 111183 More actions · Answer 10

FWIW. Note the CASE expression to take care of period greater than 1 day:

IF OBJECT_ID('#test')>0

DROP TABLE #test

GO

SET NOCOUNT ON

CREATE TABLE #test

(

ENTRYDATE DATETIME

, [name] VARCHAR(10)

, Location INT

)

INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)

INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)

INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)

INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)

SELECT

t1.location

, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

, CASE

WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>=1440

THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440

ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

END

FROM

#test t1

JOIN

#test t2

ON

t1.location = t2.location

AND

(DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)

OR

DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))

AND

RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

<

RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)

WHERE

t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

AND

t1.Location = 50

AND NOT EXISTS

(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

GROUP BY

t1.location

, t1.EntryDate

, t2.EntryDate

ORDER BY t1.EntryDate

SET NOCOUNT OFF

location

----------- --------- -----------

50 10:00:00 60

50 14:00:00 60

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Simon Smith SSC Rookie Points: 29 More actions · Answer 11

The second date was a typo they should all have been on the same day (the 1st) - although it is perfectly feasible that the date could be the next day so the duration would span the night.

I'll have a look at your query later on today, Thanks Frank

Frank Kalis SSC Guru Points: 111183 More actions · Answer 12

Aargh, sometimes you can't see the wood for the trees:

SELECT

t1.location

, RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

, CASE

WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>1440

THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440

ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

END

FROM

#test t1

JOIN

#test t2

ON

t1.location = t2.location

AND

t1.EntryDate < t2.EntryDate

WHERE

t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

AND

t1.Location = 50

AND NOT EXISTS

(SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

GROUP BY

t1.location

, t1.EntryDate

, t2.EntryDate

ORDER BY t1.EntryDate

should also do

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

rhunt SSCertifiable Points: 5868 More actions · Answer 13

Frank, that is a nice solution. I typically use "IN" instead of "EXISTS" - which I am trying to get away from. Anyway, your "EXISTS" statement is confusing me so I was hoping you would elaborate on why it works (I tested it an know it does).

I understand that because of the t1.EntryDate < t2.EntryDate part of the join, there are extra records (for instance 10:00 - 14:00) that need to be eliminated. I'm just not sure why:

WHERE NOT EXISTS (SELECT * FROM test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

is doing this. I read that they query will select locations <> 50 that have t3 times between t1 and t2 times. My problem is that your query is selecting locations <> 50 of which your where statement has allready excluded - so I would assume the NOT EXISTS would not have anything additional to exclude.

I am wrong of course since I can see it works. Would you mind explaining a bit? I am so used to "IN" where I am comparing an particular field. I guess with EXISTS I am comparing an entire record hit.

I have read BOL for EXISTS - but the example are very simplistic.

TIA

rhunt SSCertifiable Points: 5868 More actions · Answer 14

I'm not sure my post above properly explained my issue. I rewrote the EXISTS as I understand it:

SELECT t3.* FROM

test t3, test t2, test t1

WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate

This query returns:

ENTRYDATE name Location

------------------------------------------------------ ---------- -----------

2005-01-01 12:00:00.000 Customer1 45

2005-01-01 13:00:00.000 Customer1 45

I don't see why this result set is excluding the records you are trying to exclude - which are:

2005-01-01 10:00:00.000 2005-01-01 14:00:00.000 50 10:00:00 240

2005-01-01 10:00:00.000 2005-01-02 15:00:00.000 50 10:00:00 300

2005-01-01 11:00:00.000 2005-01-01 14:00:00.000 50 11:00:00 180

2005-01-01 11:00:00.000 2005-01-02 15:00:00.000 50 11:00:00 240