Grouping by time - can't figure this out!

  • I have a bunch of data with timestamps - there may be any number of rows in a given time period, and I want to do averages and min/max for the time periods which are generated.

    First I use a CTE to generate the times - the interval is not always the same.

    ;with Times as (select convert(datetime2, 'Jun 29 2012 4:48:29:000PM') as PeriodTime

    union all

    select dateadd(minute, 1, PeriodTime) as PeriodTimeStart

    from Times where dateadd(minute, 1, PeriodTime) <= 'Jul 4 2012 12:00:00:000AM')

    select PeriodTime from Times order by PeriodTime option (maxrecursion 32767)

    I have logic earlier in the procedure to make sure I don't go over the 32767 rows limit - it picks either second, minute, hour, or day, and uses that in the CTE in the dateadd function. So, at the end of all that, I have a table with rows that look like this...

    PeriodTime

    2012-06-29 16:48:29.0000000

    2012-06-29 16:49:29.0000000

    2012-06-29 16:50:29.0000000

    2012-06-29 16:51:29.0000000

    2012-06-29 16:52:29.0000000

    2012-06-29 16:53:29.0000000

    2012-06-29 16:54:29.0000000

    2012-06-29 16:55:29.0000000

    2012-06-29 16:56:29.0000000

    2012-06-29 16:57:29.0000000

    2012-06-29 16:58:29.0000000

    2012-06-29 16:59:29.0000000

    2012-06-29 17:00:29.0000000

    2012-06-29 17:01:29.0000000

    2012-06-29 17:02:29.0000000

    2012-06-29 17:03:29.0000000

    Now, I have my data in another table and what I would like to do is get the average, min, and max of that data based on the times in the temp table. How can I do that? My best attempt is this... but it's not getting all the rows?

    select [time] = min(Times.PeriodTime), Y = AVG([POSITION]), YMin = MIN([POSITION]), YMax = MAX([POSITION])

    from [TLM_UMC-TGT_X_0], Times where TLM_TIME between Times.[PeriodTime] and dateadd(minute, 1, Times.[PeriodTime])

    option (maxrecursion 32767)

    I know that I could do this with three sub-queries, but isn't there a better way? THIS... is ugly... (and slow)

    ;with Times as (select convert(datetime2, 'Jun 29 2012 4:48:29:000PM') as PeriodTime

    union all

    select dateadd(minute, 1, PeriodTime) as PeriodTimeStart

    from Times where dateadd(minute, 1, PeriodTime) <= 'Jul 4 2012 12:00:00:000AM')

    select [time] = Times.PeriodTime,

    Y = (select AVG([POSITION]) from [TLM_UMC-TGT_X_0] where TLM_TIME between Times.[PeriodTime] and dateadd(minute, 1, Times.[PeriodTime])),

    YMin = (select MIN([POSITION]) from [TLM_UMC-TGT_X_0] where TLM_TIME between Times.[PeriodTime] and dateadd(minute, 1, Times.[PeriodTime])),

    YMax = (select MAX([POSITION]) from [TLM_UMC-TGT_X_0] where TLM_TIME between Times.[PeriodTime] and dateadd(minute, 1, Times.[PeriodTime]))

    from Times option (maxrecursion 32767)

  • Could you post the DDL for your tables, some sample data (INSERT INTO statements), and expected results based on the sample data? This would really help us help you.

  • If I understand correctly, maybe something like below.

    [Btw, as a general performance guideline, make sure the datatype of the time in the table exactly matches your list of times. Your times list looks to be datetime2, so I'll assume your table also has datetime2. If not, change all the types to match the time type in your table.]

    DECLARE @MinutesInterval int

    SET @MinutesInterval = 1 --for min; for hr, set to 60; etc..

    ;WITH Times (

    ...

    ), Times_Range AS (

    SELECT

    MIN([PeriodTime]) AS StartTime,

    DATEADD(MINUTE, 1, MAX([PeriodTime])) AS EndTime

    )

    SELECT

    Times.[PeriodTime],

    AVG(tlm.[POSITION]) AS Y,

    MIN(tlm.[POSITION]) AS YMin,

    MAX(tlm.[POSITION]) AS YMax

    FROM Times

    LEFT OUTER JOIN [TLM_UMC-TGT_X_0] tlm ON

    tlm.TLM_TIME >= Times.[PeriodTime] AND

    tlm.TLM_TIME < DATEADD(MINUTE, @MinutesInterval, Times.[PeriodTime])

    WHERE

    tlm.TLM_TIME >= Times_Range.StartTime AND tlm.TLM_TIME < Times_Range.EndTime

    GROUP BY

    Times.[PeriodTime]

    ORDER BY

    Times.[PeriodTime]

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Yes Scott, thanks. That's the kind of change I was looking for. I think performance will be better with that but I'll test and let you know. I was just a little frustrated - I sit here next to a slamming door all day and it really messes with my head and sometimes I'm not sure if I'm doing the best code I can. In a normal environment maybe I would have come up with the grouping query myself, but thank you very much, it is really helpful!

  • Lynn Pettis (12/14/2012)


    Could you post the DDL for your tables, some sample data (INSERT INTO statements), and expected results based on the sample data? This would really help us help you.

    No I can't really do that for two reasons: one is that the database is classified and I probably posted too much already, and the other is you need thousands of points to make any sense of it - this data has 100 to 1000 rows per second.

    The telemetry table ([TLM_UMC-TGT_X_0]) has a TLM_TIME column of datetime2, and a POSITION column of float - those are the only columns that matter for this issue. So, you can assume it's...

    create table [TLM_UMC-TGT_X_0] (

    TLM_TIME datetime2,

    POSITION float

    )

    There is an index on the TLM_TIME, but it's not the clustered index until the next release - that should give a massive improvement in queries by TLM_TIME, which is all we ever do.

  • How about something like this:

    SET NOCOUNT ON;

    USE ProofOfConcept;

    GO

    IF OBJECT_ID(N'dbo.Periods') IS NOT NULL

    DROP TABLE dbo.Periods;

    GO

    CREATE TABLE dbo.Periods

    (Period INT PRIMARY KEY);

    GO

    INSERT INTO dbo.Periods

    (Period)

    SELECT TOP (32768)

    ROW_NUMBER() OVER (ORDER BY t1.OBJECT_ID) - 1 /* 0 through 32767 */

    FROM sys.columns AS t1

    CROSS JOIN sys.columns AS t2

    CROSS JOIN sys.columns AS t3;

    GO

    IF OBJECT_ID(N'dbo.[TLM_UMC-TGT_X_0]') IS NOT NULL

    DROP TABLE dbo.[TLM_UMC-TGT_X_0];

    GO

    CREATE TABLE dbo.[TLM_UMC-TGT_X_0]

    (ID INT IDENTITY

    PRIMARY KEY,

    TLM_TIME DATETIME,

    POSITION FLOAT);

    GO

    INSERT INTO dbo.[TLM_UMC-TGT_X_0]

    (TLM_TIME, POSITION)

    VALUES ('20120629 16:48:29.000', 1),

    ('20120629 16:48:29.000', 5),

    ('20120629 16:49:29.000', 10);

    GO

    -- If you just want periods that have data

    SELECT DATEADD(MINUTE, Period, '20120629 16:48:29'),

    AVG(POSITION),

    MIN(POSITION),

    MAX(POSITION)

    FROM dbo.Periods AS P

    INNER JOIN dbo.[TLM_UMC-TGT_X_0] AS T

    ON T.TLM_TIME >= DATEADD(MINUTE, Period, '20120629 16:48:29')

    AND T.TLM_TIME < DATEADD(MINUTE, Period + 1, '20120629 16:48:29')

    GROUP BY DATEADD(MINUTE, Period, '20120629 16:48:29')

    ORDER BY DATEADD(MINUTE, Period, '20120629 16:48:29');

    -- If you also want periods without data

    SELECT DATEADD(MINUTE, Period, '20120629 16:48:29'),

    T.MinPosition,

    T.MaxPosition,

    T.AvgPosition

    FROM dbo.Periods AS P

    OUTER APPLY (SELECT MIN(POSITION) AS MinPosition,

    MAX(POSITION) AS MaxPosition,

    AVG(POSITION) AS AvgPosition

    FROM dbo.[TLM_UMC-TGT_X_0] AS T

    WHERE T.TLM_TIME >= DATEADD(MINUTE, Period, '20120629 16:48:29')

    AND T.TLM_TIME < DATEADD(MINUTE, Period + 1, '20120629 16:48:29')) AS T

    ORDER BY P.Period;

    Once you have the Periods table created, you don't need to do the calculations on the fly any more. (This makes it a classic Numbers table [aka Tally table] solution. Those are popular because they do this kind of thing very, very efficiently.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply