Contigous Data Periods

  • Hi

    This is my first post on this forum - so go easy on me please!!

    Basically here is some sample data (simplified slightly)

    EntryDate                          Name                         Location

    ------------------------------------------------------------

    2005-01-01 10:00:00           Customer1                      50

    2005-01-01 10:01:00           Customer1                      50

    2005-01-01 14:00:00           Customer1                      50

    2005-01-02 09:00:00           Customer1                      45

    2005-01-02 10:00:00           Customer1                      30

    2005-01-02 11:00:00           Customer1                      45

    2005-01-02 11:30:00           Customer1                      50

    2005-01-02 11:40:00           Customer1                      50

    Basically I need to know (within a given date range - so for example a week) how long Customer1 has been at location 50

    So in this example (with the supplied date range covering ALL of the data above) I would expect the result Customer1 - 4 hours 10 minutes.

    The problem I am having is finding a quick way to identify contiguous blocks where the location is 50, so I can do a DateDiff on the Start and End EntryDates of each block.

    Can anyone help me with this?

    Thanks

    Simon

     

     

  • Something like this?

    SELECT

     MIN(location) location

     , MIN([name]) [name]

     , MIN(EntryDate) entrydate

     , DATEDIFF(mi,MIN(EntryDate),MAX(EntryDate)) diff_in_minutes_per_day

    FROM 

     < your_table >

    WHERE

     EntryDate>='20050101' AND EntryDate<'20050103'

    AND

     Location = 50

    GROUP BY

     [name]

     , DAY(EntryDate)

    ?

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Frank Kalis - Thanks for your reply much appreciated. Whilst your query does exactly what I want in terms of the data shown above - it only works if there is only one period in any one day where the customer is at the location. If they visit twice then it works out the minutes as the difference between the start of the first visit and the end of the last visit - i.e too long a duration.

    Any other ideas

    Thanks

  • You mean he can visit multiple times a day? And you only have an EntryDate and no ExitDate (or something like that)? How will you measure that duration? And what happens to the time someone needs to get from one location to the other?

    You got me a bit confused

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Frank

    You got it in a nutshell, I have vastly simplified the data, however there are good reasons why things like exit date are not available. The Exit date (and therefore the duration) have to be obtained by looking at when the next record is not in the location we are after hence my original post about identifying contiguous sections of data with the same location code (so that I can get the start and end dates of each "block")....

    Its one of the more difficult problems I've had to deal with recently

     

  • Simon,

    You have to be a little more explicit on how you define CONTIGUOS, this is why:

    1. you said in the previous post that you could have more than one interval on the same date at the same location.

    2. You don't want to get the difference between the first Date and the last on the same day at the same location

    then how do you determine that there is an interval without an exit date?

    HTH

     

     


    * Noel

  • The way your say this, sounds like you have not control over the design, right? However, if you are able to change the table design, you should really do this. I think almost every solution in the current situation will be kind of a kludge. Anyway, I think I have seen something similar in the Google Archives some weeks ago. Have you searched there already. I think it was a posting by David Portas or Steve Kass, but I'm not sure on this. You might search there. From what I understand now, one solution that will work is to use a cursor but I'll keep thinking about it.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Noel

    Say for example someone is at a location for two periods in one day. The only way I can tell from the data that I have available that they have visited twice is that because inbetween the two periods there are records where they are somewhere else. That is why this query is so difficult. Here is some more sample data to explain better

    2005-01-01 10:00:00           Customer1                      50

    2005-01-01 11:00:00           Customer1                      50

    2005-01-01 12:00:00           Customer1                      45

    2005-01-01 13:00:00           Customer1                      45

    2005-01-01 14:00:00           Customer1                      50

    2005-01-02 15:00:00           Customer1                      50

    I need a query to determine both the green and red dates so that I can perform a date diff (as individual periods) to produce the total of 2 hours for the day. The only data I have to determine these periods is somehow identifying unbroken continous sections that have a matching location id (in this case 50). That is my question really how do I identify the red and green dates above from the small amount of data available.

    Thanks

    Simon

  • Simon,

    I am not usre I understood the request, but here is some code :

    set nocount on

    go

    --exec sp__droptable 'dbo.sample'

    go

    -- set up a test environment

    create table dbo.sample

    (

     EntryDate datetime not null,

     Name varchar(10) not null,

     location tinyint not null

    )

    insert dbo.sample values ('2005-01-01 10:00:00','Customer1',50)

    insert dbo.sample values ('2005-01-01 10:01:00','Customer1',50)

    insert dbo.sample values ('2005-01-01 14:00:00','Customer1',50)

    insert dbo.sample values ('2005-01-02 09:00:00','Customer1',45)

    insert dbo.sample values ('2005-01-02 10:00:00','Customer1',30)

    insert dbo.sample values ('2005-01-02 11:00:00','Customer1',45)

    insert dbo.sample values ('2005-01-02 11:30:00','Customer1',50)

    insert dbo.sample values ('2005-01-02 11:40:00','Customer1',50)

    insert dbo.sample values ('2005-01-01 10:00:00','Customer2',50)

    insert dbo.sample values ('2005-01-01 10:01:00','Customer2',50)

    insert dbo.sample values ('2005-01-01 14:00:00','Customer3',50)

    insert dbo.sample values ('2005-01-02 09:00:00','Customer3',45)

    insert dbo.sample values ('2005-01-02 10:00:00','Customer2',30)

    insert dbo.sample values ('2005-01-02 11:00:00','Customer3',45)

    insert dbo.sample values ('2005-01-02 11:30:00','Customer2',50)

    insert dbo.sample values ('2005-01-02 11:40:00','Customer3',50)

    go

    -- show data for test environment

    select Name,EntryDate,location

    from dbo.sample

    order by Name,EntryDate,location

    go

    -- get duration at each location for each customer

    -- duration is in minutes

    --

    -- take the data ordered ba Name, EntryDate and location

    -- if record n and record n+1 are for the same Name and location,

    -- duration is the difference in minutes of the EntryDate

    select R.Name,R.location,duration=sum(convert(int,left(R.NextRecord,12)))

    from

    (

     select s1.Name,s1.EntryDate,s1.location,NextRecord =

     ( select TOP 1 NextRecord=right(replicate('0',12) + convert(varchar(12),datediff(mi,s1.EntryDate,s2.EntryDate)),12) + ' ' + s2.Name + '@' + convert(varchar(10),s2.location)

        from dbo.sample s2

        where s2.EntryDate > s1.EntryDate and s1.Name = s2.Name )

     from dbo.sample s1

     --order by s1.Name,s1.EntryDate,s1.location

    ) R

    where R.NextRecord is not null

      and R.NextRecord like replicate('_',12) + ' ' + R.Name + '@' + convert(varchar(10),R.location)

    group by R.Name,R.location

    order by R.Name,R.location

    go

    /*

    RESULT :

    Name       location duration   

    ---------- -------- -----------

    Customer1  50       250

    Customer2  50       1

    Customer3  45       120

    */

  • Now I can see, why this

    2005-01-01 10:00:00           Customer1                      50

    2005-01-01 11:00:00           Customer1                      50

    will result in 60 minutes difference. However, why should this

    2005-01-01 14:00:00           Customer1                      50

    2005-01-02 15:00:00           Customer1                      50

    also result in 60 minutes difference anyway. To me this would be 1500, e.g. 25 hours. Anyway, see if this will help a bit:

    IF OBJECT_ID('#test')>0

     DROP TABLE #test

    GO

    SET NOCOUNT ON

    CREATE TABLE #test

    (

     ENTRYDATE DATETIME

     , [name] VARCHAR(10)

     , Location INT

    )

    INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)

    INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)

    INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)

    SELECT

     t1.location

     , RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

     , DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

    FROM

     #test t1

    JOIN

     #test t2

    ON

     t1.location = t2.location

    AND

     (DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)

     OR

     DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))

    AND

     RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

     <

     RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)

    WHERE

     t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

    AND

     t1.Location = 50

    AND NOT EXISTS

     (SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

    GROUP BY

     t1.location

     , t1.EntryDate

     , t2.EntryDate

    ORDER BY t1.EntryDate

    SET NOCOUNT OFF

    location                         

    ----------- --------- -----------

    50          10:00:00  60

    50          14:00:00  1500

     

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • FWIW. Note the CASE expression to take care of period greater than 1 day:

    IF OBJECT_ID('#test')>0

     DROP TABLE #test

    GO

    SET NOCOUNT ON

    CREATE TABLE #test

    (

     ENTRYDATE DATETIME

     , [name] VARCHAR(10)

     , Location INT

    )

    INSERT INTO #test VALUES('2005-01-01 10:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-01 11:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-01 12:00:00', 'Customer1', 45)

    INSERT INTO #test VALUES('2005-01-01 13:00:00', 'Customer1', 45)

    INSERT INTO #test VALUES('2005-01-01 14:00:00', 'Customer1', 50)

    INSERT INTO #test VALUES('2005-01-02 15:00:00', 'Customer1', 50)

    SELECT

     t1.location

     , RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

     , CASE

        WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>=1440

        THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440

        ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

       END

    FROM

     #test t1

    JOIN

     #test t2

    ON

     t1.location = t2.location

    AND

     (DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0)

     OR

     DATEADD(d,DATEDIFF(d,0,t1.EntryDate),0)+1=DATEADD(d,DATEDIFF(d,0,t2.EntryDate),0))

    AND

     RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

     <

     RIGHT(STUFF(CONVERT(CHAR(22),t2.EntryDate,13),21,4,' '),9)

    WHERE

     t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

    AND

     t1.Location = 50

    AND NOT EXISTS

     (SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

    GROUP BY

     t1.location

     , t1.EntryDate

     , t2.EntryDate

    ORDER BY t1.EntryDate

    SET NOCOUNT OFF

    location                         

    ----------- --------- -----------

    50          10:00:00  60

    50          14:00:00  60

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • The second date was a typo they should all have been on the same day (the 1st) - although it is perfectly feasible that the date could be the next day so the duration would span the night.

    I'll have a look at your query later on today, Thanks Frank

  • Aargh, sometimes you can't see the wood for the trees:

    SELECT

     t1.location

     , RIGHT(STUFF(CONVERT(CHAR(22),t1.EntryDate,13),21,4,' '),9)

     , CASE

        WHEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)>1440

        THEN DATEDIFF(mi,t1.EntryDate,t2.EntryDate)-1440

        ELSE DATEDIFF(mi,t1.EntryDate,t2.EntryDate)

       END

    FROM

     #test t1

    JOIN

     #test t2

    ON

     t1.location = t2.location

    AND

     t1.EntryDate < t2.EntryDate

    WHERE

     t1.EntryDate>='20050101' AND t1.EntryDate<='20050103'

    AND

     t1.Location = 50

    AND NOT EXISTS

     (SELECT * FROM #test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

    GROUP BY

     t1.location

     , t1.EntryDate

     , t2.EntryDate

    ORDER BY t1.EntryDate

    should also do

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Frank, that is a nice solution.  I typically use "IN" instead of "EXISTS" - which I am trying to get away from.  Anyway, your "EXISTS" statement is confusing me so I was hoping you would elaborate on why it works (I tested it an know it does).

    I understand that because of the t1.EntryDate < t2.EntryDate part of the join, there are extra records (for instance 10:00 - 14:00) that need to be eliminated.  I'm just not sure why:

     WHERE NOT EXISTS (SELECT * FROM test t3 WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate)

    is doing this.  I read that they query will select locations <> 50 that have t3 times between t1 and t2 times.  My problem is that your query is selecting locations <> 50 of which your where statement has allready excluded - so I would assume the NOT EXISTS would not have anything additional to exclude.

    I am wrong of course since I can see it works.  Would you mind explaining a bit?  I am so used to "IN" where I am comparing an particular field.  I guess with EXISTS I am comparing an entire record hit.

    I have read BOL for EXISTS - but the example are very simplistic.

    TIA

  • I'm not sure my post above properly explained my issue.  I rewrote the EXISTS as I understand it:

    SELECT t3.* FROM

    test t3, test t2, test t1

    WHERE t3.location != 50 AND t3.EntryDate BETWEEN t1.EntryDate AND t2.EntryDate

    This query returns:

    ENTRYDATE                                              name       Location   

    ------------------------------------------------------ ---------- -----------

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 12:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    2005-01-01 13:00:00.000                                Customer1  45

    I don't see why this result set is excluding the records you are trying to exclude - which are:

    2005-01-01 10:00:00.000    2005-01-01 14:00:00.000    50    10:00:00    240

    2005-01-01 10:00:00.000    2005-01-02 15:00:00.000    50    10:00:00    300

    2005-01-01 11:00:00.000    2005-01-01 14:00:00.000    50    11:00:00    180

    2005-01-01 11:00:00.000    2005-01-02 15:00:00.000    50    11:00:00    240

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply