Records per hour within a time span

  • Hello,

    I'm trying to come up with a way to count how many records exist on a hourly basis within a period of time.

    I have a record with start time and stop time. I want to determine an hourly count of records that exist during the time span.

    The end goal will look like this.

    [hour of the day], [count of records]

    0, 4

    1, 10

    2, 6

    etc....

    If record 6876 had a start time of "2010-06-01 06:13:54.000" and a stop time of "2010-06-01 16:00:26.000"...I want to account for each hour in this record.

    --===== If the test table already exists, drop it

    IF OBJECT_ID('TempDB..#whosonwhen','U') IS NOT NULL

    DROP TABLE #whosonwhen

    --===== Create the test table

    CREATE TABLE #whosonwhen

    (

    [ID] INT,

    [Start] DATETIME,

    [Stop] DATETIME

    )

    --===== Insert the test data into the test table

    INSERT INTO #whosonwhen ([ID], [Start], [Stop])

    SELECT '6876','2010-06-01 06:13:54.000','2010-06-01 16:00:26.000' UNION ALL

    SELECT '6919','2010-06-01 07:56:59.000','2010-06-01 17:57:00.000' UNION ALL

    SELECT '6863','2010-06-01 08:37:00.000','2010-06-01 15:30:52.000' UNION ALL

    SELECT '6851','2010-06-01 08:18:00.000','2010-06-01 15:00:26.000' UNION ALL

    SELECT '6852','2010-06-01 09:02:17.000','2010-06-01 15:00:29.000' UNION ALL

    SELECT '683','2010-06-01 20:02:17.000','2010-06-02 02:00:29.000'

    Thanks!

  • K big question time.

    In you example ID 683 starts on the 1st but does not stop till the 2nd.

    When will this query be run and how would this record be handled. to have an accurrate hourly report that is to say a report that places the count into a bucket by hour then you have to have a cut off point. Idealy this cutoff would be midnight otherwise you have to get creative with your math. basically everything after midnight would be an extra hour and then you end up with more than 24 hours in a day. that can get confusing so I think the query depends on this record. I have an idea for a query but it the cutoff is not midnight then my idea is not valid.

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Nice problem.

    Here is my first attempt:

    declare @periodStart datetime

    declare @periodEnd datetime

    -- they must be inside of the same day!!!!

    set @periodStart = '20100601'

    set @periodEnd = '2010-06-10 23:59:59.000'

    declare @givendate datetime

    set @givendate = convert(varchar(12),@periodStart,112)

    -- you may create this table by other ways...

    select top 24

    DATEADD(hour,ROW_NUMBER() OVER (ORDER BY Object_id)-1,@givendate) as HourOfDay

    ,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour

    into #dh

    from sys.columns

    ;with correctRange

    as

    (

    select [ID]

    ,case when w.[Start] < @periodStart and w.[Stop] > @periodStart then @periodStart else w.[Start] end as [Start]

    ,case when w.[Stop] >= @periodEnd and w.[Start] < @periodEnd then @periodEnd else w.[Stop] end as [Stop]

    from #whosonwhen w

    )

    select d.DayHour, count(*) as RecCount

    from correctRange w

    join #dh d on d.HourOfDay between w.[Start] and w.[Stop]

    where w.[Start] >= @periodStart

    group by d.DayHour

    order by d.DayHour

    drop table #dh

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/18/2010)


    Nice problem.

    Here is my first attempt:

    declare @periodStart datetime

    declare @periodEnd datetime

    -- they must be inside of the same day!!!!

    set @periodStart = '20100601'

    set @periodEnd = '2010-06-10 23:59:59.000'

    declare @givendate datetime

    set @givendate = convert(varchar(12),@periodStart,112)

    -- you may create this table by other ways...

    select top 24

    DATEADD(hour,ROW_NUMBER() OVER (ORDER BY Object_id)-1,@givendate) as HourOfDay

    ,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour

    into #dh

    from sys.columns

    ;with correctRange

    as

    (

    select [ID]

    ,case when w.[Start] < @periodStart and w.[Stop] > @periodStart then @periodStart else w.[Start] end as [Start]

    ,case when w.[Stop] >= @periodEnd and w.[Start] < @periodEnd then @periodEnd else w.[Stop] end as [Stop]

    from #whosonwhen w

    )

    select d.DayHour, count(*) as RecCount

    from correctRange w

    join #dh d on d.HourOfDay between w.[Start] and w.[Stop]

    where w.[Start] >= @periodStart

    group by d.DayHour

    order by d.DayHour

    drop table #dh

    Nicely done but I believe you're dropping the first hour. Also, be careful of the 23:59:59.000 thing. A lot of data can happen in the last second of the day not to mention the roundup to the next day at .998 should it occur in the data.

    Here's one possible solution using a zero based Tally table which also returns the count of 0 for hours not included in the original data.

    DECLARE @StartDay DATETIME,

    @NextStartDay DATETIME

    ;

    SELECT @StartDay = '20100601',

    @NextStartDay = '20100602'

    ;

    WITH

    cteDateRanges AS

    (

    SELECT DATEADD(hh, t.N, @StartDay) AS PeriodStart,

    DATEADD(hh, t.N + 1, @StartDay) AS NextPeriodStart

    FROM dbo.Tally t

    WHERE t.N < DATEDIFF(hh, @StartDay, @NextStartDay)

    )

    SELECT range.PeriodStart, COUNT(source.Start)

    FROM #whosonwhen source

    FULL JOIN cteDateRanges range

    ON range.PeriodStart <= source.Stop

    AND range.NextPeriodStart > source.Start

    GROUP BY range.PeriodStart

    ORDER BY range.PeriodStart

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Also be sure to truncate the @StartTime (of the reporting period) passed in down to the even hour, or you would get a totally incorrect result.

    Scott Pletcher, SQL Server MVP 2008-2010

  • Scott is correct. The code I posted has no checks for whole days nor even whole hours.

    If your code is passing the start date as a legal datetime value and not a string, here's a fast way for the code to force the start date to the beginning of the day and to ensure that you only get one day back...

    DECLARE @StartDay DATETIME,

    @NextStartDay DATETIME

    ;

    SELECT @StartDay = '20100601 13:17:19.123',

    @StartDay = CAST(DATEDIFF(dd, 0, @StartDay) AS DATETIME),

    @NextStartDay = DATEADD(dd,1,@StartDay)

    ;

    SELECT @startday

    ;

    Be careful, though.. if someone passes a string that looks like '23:59:59.999', it'll return the next day instead of the current day (at least in 2k5)...

    DECLARE @StartDay DATETIME,

    @NextStartDay DATETIME

    ;

    SELECT @StartDay = '20100601 23:59:59.999',

    @StartDay = CAST(DATEDIFF(dd, 0, @StartDay) AS DATETIME),

    @NextStartDay = DATEADD(dd,1,@StartDay)

    ;

    SELECT @startday

    ;

    ... and it does so even without the round down attempt.

    DECLARE @StartDay DATETIME,

    @NextStartDay DATETIME

    ;

    SELECT @StartDay = '20100601 23:59:59.999'

    ;

    SELECT @startday

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/18/2010)


    Eugene Elutin (6/18/2010)


    ...

    Nicely done but I believe you're dropping the first hour. Also, be careful of the 23:59:59.000 thing. A lot of data can happen in the last second of the day not to mention the roundup to the next day at .998 should it occur in the data.

    ...

    Jeff, you are right about missing first hour. Of cause, for a proper check the time range is required. The records for the last second in day should be fine as validated against the next date.

    Your query does proper check and returns the count of records for each hour between given dates in StartDate and NextStartDate, however if you add time part (eg. from 10:00:00.000 to 18:00:00.000) into input, then it does not behave properly - it returns raw for PeriodStart = NULL.

    So, another version is required (I will use "Tally" cte created on fly):

    declare @periodStart datetime

    declare @periodEnd datetime

    -- they must be inside of the same day, otherwise counting hours from 0 to 23 is not appropriate!!!!

    set @periodStart = '2010-06-01 03:15:00.000'

    set @periodEnd = '2010-06-01 21:20:00.000'

    -- Given Period is taken as inclusive of given hours in the input (eg. 15:25:30 will be taken as 15:00:00)

    -- frist make sure that the minutes, seconds and milliseconds are removed from input range for clarity

    set @periodStart = dateadd(hh, datepart(hh,@periodStart), convert(varchar(12),@periodStart,112))

    set @periodEnd = dateadd(hh, datepart(hh,@periodEnd), convert(varchar(12),@periodEnd,112))

    -- you may create this CTE by other ways (eg. from permanent Tally table)...

    ;with dh

    as

    (

    select top 24

    DATEADD(hour,ROW_NUMBER() OVER (ORDER BY [Object_id])-1,convert(varchar(12),@periodStart,112)) as HoDstart

    ,DATEADD(hour,ROW_NUMBER() OVER (ORDER BY [Object_id]),convert(varchar(12),@periodStart,112)) as HoDend

    ,ROW_NUMBER() OVER (ORDER BY Object_id)-1 as DayHour

    from sys.columns -- or any other (not very big) table which have more than 24 raws, just remamber to change

    -- [Object_id] in OVER (ORDER BY [Object_id]... to some existing column

    )

    select d.DayHour, count(w.ID) as RecCount

    from dh d

    left join #whosonwhen w

    on w.[Start] < d.HoDend

    and w.[Stop] >= d.HoDstart

    where d.HoDstart between @periodStart and @periodEnd

    group by d.DayHour

    order by d.DayHour

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/19/2010)


    Your query does proper check and returns the count of records for each hour between given dates in StartDate and NextStartDate, however if you add time part (eg. from 10:00:00.000 to 18:00:00.000) into input, then it does not behave properly - it returns raw for PeriodStart = NULL.

    Thanks, Eugene... I'll take a look at both.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ya know... after I got done tweaking my code, except for the way we gen the dates, the code came out nearly identical as your latest tweek. I'll also add that your original idea of creating a small temp table instead of using the CTE makes the code a whole lot faster in the face of any scalability. The optimizer thinks it needs to regen the 24 rows for each row in the source table on the CTE code instead of genning the 24 rows just once.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, in a final code I would use temp table (like in the first sample). But in this case I did it with CTE to save few lines of code :-D. I guess, this would also give opportunity to the requestor to learn a bit more about optimisation in case of inadequate performance*.

    *Of cause, the last bit, is just a nice excuse which came up in my mind, after reading your valid comment 😛

    I don't know why I'm still here - it's a Saturday at the end... 😀

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • It's interesting. I thought the requestor was interested in time-of-day across any number of days, rather than just on one 24-hr period.

    For example, for the last, say, 2 weeks, how much activity was there at a certain hour -- midnight? 1AM? 2AM? 3AM? -- without regard for which day it was.

    I was working on code for that but dropped it once the other approach seemed to be generally accepted by everyone else.

    Scott Pletcher, SQL Server MVP 2008-2010

  • Thanks for the replies! You've given me some great ideas. I'll do some testing on larger data sets and see how it goes.

    Thanks

    Dave

  • Eugene Elutin (6/19/2010)


    Jeff, in a final code I would use temp table (like in the first sample). But in this case I did it with CTE to save few lines of code :-D. I guess, this would also give opportunity to the requestor to learn a bit more about optimisation in case of inadequate performance*.

    *Of cause, the last bit, is just a nice excuse which came up in my mind, after reading your valid comment 😛

    I don't know why I'm still here - it's a Saturday at the end... 😀

    From the original description, I don't actually know what the real end result the OP wants is. Post your code... it can't hurt.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK... I tried out Eugenes code on a much bigger data set. It appears to be aggregating hourly.

    08993

    18971

    28954

    38943

    48932

    58916

    68879

    78786

    88372

    96975

    104606

    112251

    12860

    13447

    14289

    15214

    16159

    17131

    18108

    1985

    2064

    2146

    2227

    2311

    What I'm trying to do is get a total hourly count, not a running total. It would look something like this.

    012

    187

    2100

    3256

    4980

    5589

    6278

    7109

    878

    956

    1050

    11267

    12345

    13106

    1434

    1523

    1616

    176

    1857

    1965

    2014

    219

    221

    23123

    Thanks !

  • daveb-840383 (6/22/2010)


    OK... I tried out Eugenes code on a much bigger data set. It appears to be aggregating hourly.

    What I'm trying to do is get a total hourly count, not a running total. It would look something like this.

    Thanks !

    Are you saying that you want an hourly count by day? And what do you want to show if you have a time span of more than 1 day?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply