Grouping records by time interval

  • I'm connecting to SQL Server 2008 R2.

  • geert.de.vylder (3/4/2013)


    I'm connecting to SQL Server 2008 R2.

    Run this: SELECT @@VERSION

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • geert.de.vylder (3/4/2013)


    Hi ChrisM@Work,

    Yes, event_channel has to be populated, even if there isn't a match in the Tussen_Ticket table. This is because I have to update the results into another table in another database.

    Thank you very much for your help and assistance.

    Greetz,

    Geert

    You're very welcome. Your cooperation is very very much appreciated. Here's a modified query which takes Event_Channel into account as discussed:

    SELECT mx.Ticket_Closed_Date, mx.Interval, op.*

    FROM ( -- create a matrix containing all intervals from Tijdsintervallen and all dates & Event_Channels from Tussen_Tickets

    SELECT *

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    ) mx

    OUTER APPLY (

    SELECT tt.Event_Channel,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    Ticket_Closed_Time = CAST(tt.Ticket_Closed_DateTime AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    WHERE ta.Ticket_Closed_Time BETWEEN mx.beginTijdsinterval AND mx.eindTijdsinterval

    AND ta.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.Event_Channel = mx.Event_Channel

    GROUP BY ta.Ticket_Closed_Date, tt.Event_Channel

    ) op

    ORDER BY mx.Ticket_Closed_Date, mx.Interval, op.Event_Channel;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • With the added cost of Event_Channel in the matrix table correlating with the inner query, it may be worthwhile splitting the query up, like this:

    SELECT *

    INTO #Matrix

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    CREATE UNIQUE CLUSTERED INDEX ucx_Everything ON #Matrix (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    -----------------------------------------------------

    SELECT mx.Ticket_Closed_Date, mx.Interval, op.*

    FROM #Matrix mx

    OUTER APPLY (

    SELECT tt.Event_Channel,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    Ticket_Closed_Time = CAST(tt.Ticket_Closed_DateTime AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    WHERE ta.Ticket_Closed_Time BETWEEN mx.beginTijdsinterval AND mx.eindTijdsinterval

    AND ta.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.Event_Channel = mx.Event_Channel

    GROUP BY ta.Ticket_Closed_Date, tt.Event_Channel

    ) op

    ORDER BY mx.Ticket_Closed_Date, mx.Interval, op.Event_Channel;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi,

    I connect to version: Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64) Apr 2 2010 15:48:46 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: ) (Hypervisor)

    Greetz,

    Geert

  • ChrisM@Work (3/4/2013)


    With the added cost of Event_Channel in the matrix table correlating with the inner query, it may be worthwhile splitting the query up, like this:

    SELECT *

    INTO #Matrix

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    CREATE UNIQUE CLUSTERED INDEX ucx_Everything ON #Matrix (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    -----------------------------------------------------

    SELECT mx.Ticket_Closed_Date, mx.Interval, op.*

    FROM #Matrix mx

    OUTER APPLY (

    SELECT tt.Event_Channel,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    Ticket_Closed_Time = CAST(tt.Ticket_Closed_DateTime AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    WHERE ta.Ticket_Closed_Time BETWEEN mx.beginTijdsinterval AND mx.eindTijdsinterval

    AND ta.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.Event_Channel = mx.Event_Channel

    GROUP BY ta.Ticket_Closed_Date, tt.Event_Channel

    ) op

    ORDER BY mx.Ticket_Closed_Date, mx.Interval, op.Event_Channel;

    Hi ChrisM@Work,

    Thank you for your help. I did run this query and it generated 1878912 records. Taking a close look, everything is correct but if the rows are null they are repeated several times. In attachment you find the result in a zipped text file.

    By the way, with the index on the Tussen_Tickets table this query runs in a little more than 1 minute. That's a large improvement.

    Greetz,

    Geert

  • I think the window functions should be of help here.

    Please repost the data I am not able to get the data you have posted.

  • geert.de.vylder (3/5/2013)


    ... if the rows are null they are repeated several times....

    There will be many many rows in the matrix table which don't have a matching ticket, hence null values in columns sourced from the ticket table. However, there shouldn't be any duplicate rows at all. Have you tested for this?

    You may get another performance lift from preprocessing the ticket table:

    SELECT *

    INTO #Matrix

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    CREATE UNIQUE CLUSTERED INDEX ucx_Everything ON #Matrix (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    -----------------------------------------------------

    SELECT

    tt.Event_Channel,

    ta.Ticket_Closed_Date,

    ta.beginTijdsinterval,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    INTO #Tussen_Tickets

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    beginTijdsinterval = CAST(DATEADD(MINUTE,0-(DATEDIFF(MINUTE,0,tt.Ticket_Closed_DateTime)%15),tt.Ticket_Closed_DateTime) AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    GROUP BY tt.Event_Channel, ta.Ticket_Closed_Date, ta.beginTijdsinterval

    CREATE UNIQUE CLUSTERED INDEX ucx_Tussen_Tickets_NearlyEverything ON #Tussen_Tickets (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    ---------------------------------------

    SELECT mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval,

    tt.DVBIntern, tt.DVBExtern, tt.Tijdsduur, tt.WrapUp, tt.OnHold, tt.Talk, tt.RowsAggregated

    FROM #Matrix mx

    LEFT JOIN #Tussen_Tickets tt

    ON tt.Event_Channel = mx.Event_Channel

    AND tt.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.beginTijdsinterval = mx.beginTijdsinterval

    ORDER BY mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval;

    -- 11520 rows

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • b ghanekar (3/6/2013)


    I think the window functions should be of help here.

    Please repost the data I am not able to get the data you have posted.

    The fourth post on this thread has a .txt attachment which contains the ddl and dml and it works just fine. I'd be very interested to see how window functions might be used in a solution.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi all,

    I adjusted the TestTables file. I included the complete Tussen_Tickets data which is actualy in our table. And I adjusted some other things.

    Hope this helps.

    You will find it in attachment.

    Greetz,

    Geert

  • geert.de.vylder (3/6/2013)


    Hi all,

    I adjusted the TestTables file. I included the complete Tussen_Tickets data which is actualy in our table. And I adjusted some other things.

    Hope this helps.

    You will find it in attachment.

    Greetz,

    Geert

    There's an error in the ddl for the interval table - check the PK.

    When providing a script for loading data, always check it first. I know from experience that

    INSERT INTO ...

    SELECT ... UNION ALL

    throws a wobbly on anything more than say 10,000 rows. Break it up into sections of a couple of thousand rows.

    Try this:

    -------------------

    DROP TABLE #Matrix

    SELECT *

    INTO #Matrix

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    WHERE Ticket_Closed_DateTime IS NOT NULL

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    -- (1,180,224 row(s) affected) / 00:00:04

    CREATE UNIQUE CLUSTERED INDEX ucx_Everything ON #Matrix (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    -----------------------------------------------------

    -- there's a row in Tussen_Tickets where Ticket_Closed_DateTime is NULL:

    DROP TABLE #Tussen_Tickets

    SELECT

    tt.Event_Channel,

    ta.Ticket_Closed_Date,

    ta.beginTijdsinterval,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    INTO #Tussen_Tickets

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    beginTijdsinterval = CAST(DATEADD(MINUTE,0-(DATEDIFF(MINUTE,0,tt.Ticket_Closed_DateTime)%15),tt.Ticket_Closed_DateTime) AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    WHERE tt.Ticket_Closed_DateTime IS NOT NULL

    GROUP BY tt.Event_Channel, ta.Ticket_Closed_Date, ta.beginTijdsinterval

    -- (24,740 row(s) affected) / 00:00:01

    CREATE UNIQUE CLUSTERED INDEX ucx_Tussen_Tickets_NearlyEverything ON #Tussen_Tickets (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    ---------------------------------------

    SELECT mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval,

    tt.DVBIntern, tt.DVBExtern, tt.Tijdsduur, tt.WrapUp, tt.OnHold, tt.Talk, tt.RowsAggregated

    FROM #Matrix mx

    LEFT JOIN #Tussen_Tickets tt

    ON tt.Event_Channel = mx.Event_Channel

    AND tt.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.beginTijdsinterval = mx.beginTijdsinterval

    ORDER BY mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval;

    -- (1,180,224 row(s) affected) / 00:00:17 (time is network + display)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/6/2013)


    There's an error in the ddl for the interval table - check the PK.

    When providing a script for loading data, always check it first. I know from experience that

    INSERT INTO ...

    SELECT ... UNION ALL

    throws a wobbly on anything more than say 10,000 rows. Break it up into sections of a couple of thousand rows.

    Try this:

    -------------------

    DROP TABLE #Matrix

    SELECT *

    INTO #Matrix

    FROM Tijdsintervallen

    CROSS JOIN (

    SELECT DISTINCT Ticket_Closed_Date = CAST(Ticket_Closed_DateTime AS DATE)

    FROM Tussen_Tickets

    WHERE Ticket_Closed_DateTime IS NOT NULL

    ) d

    CROSS JOIN (

    SELECT DISTINCT Event_Channel

    FROM Tussen_Tickets

    ) e

    -- (1,180,224 row(s) affected) / 00:00:04

    CREATE UNIQUE CLUSTERED INDEX ucx_Everything ON #Matrix (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    -----------------------------------------------------

    -- there's a row in Tussen_Tickets where Ticket_Closed_DateTime is NULL:

    DROP TABLE #Tussen_Tickets

    SELECT

    tt.Event_Channel,

    ta.Ticket_Closed_Date,

    ta.beginTijdsinterval,

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk),

    RowsAggregated = COUNT(*) -- not in spec but handy to have

    INTO #Tussen_Tickets

    FROM Tussen_Tickets tt

    CROSS APPLY (

    SELECT

    beginTijdsinterval = CAST(DATEADD(MINUTE,0-(DATEDIFF(MINUTE,0,tt.Ticket_Closed_DateTime)%15),tt.Ticket_Closed_DateTime) AS TIME(3)),

    Ticket_Closed_Date = CAST(tt.Ticket_Closed_DateTime AS DATE)

    ) ta

    WHERE tt.Ticket_Closed_DateTime IS NOT NULL

    GROUP BY tt.Event_Channel, ta.Ticket_Closed_Date, ta.beginTijdsinterval

    -- (24,740 row(s) affected) / 00:00:01

    CREATE UNIQUE CLUSTERED INDEX ucx_Tussen_Tickets_NearlyEverything ON #Tussen_Tickets (Event_Channel, Ticket_Closed_Date, beginTijdsinterval)

    ---------------------------------------

    SELECT mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval,

    tt.DVBIntern, tt.DVBExtern, tt.Tijdsduur, tt.WrapUp, tt.OnHold, tt.Talk, tt.RowsAggregated

    FROM #Matrix mx

    LEFT JOIN #Tussen_Tickets tt

    ON tt.Event_Channel = mx.Event_Channel

    AND tt.Ticket_Closed_Date = mx.Ticket_Closed_Date

    AND tt.beginTijdsinterval = mx.beginTijdsinterval

    ORDER BY mx.Ticket_Closed_Date, mx.Event_Channel, mx.Interval;

    -- (1,180,224 row(s) affected) / 00:00:17 (time is network + display)

    Hi ChrisM@Work,

    Thank you for the new query. The rows affected are as you mention. But the columns DVBIntern, DVBExtern, Tijdsduur, WrapUp, OnHold and Talk are all NULL or zero.

    Greetz,

    Geert

  • geert.de.vylder (3/6/2013)


    ...But the columns DVBIntern, DVBExtern, Tijdsduur, WrapUp, OnHold and Talk are all NULL or zero...

    That sounds about right. The matrix table contains 1,180,224 rows (all dates and event channels found in the tickets table, all intervals ). The aggregated tickets table only contains 24,740 rows, and not much data;

    SELECT

    DVBIntern = SUM(tt.DVBIntern),

    DVBExtern = SUM(tt.DVBExtern),

    Tijdsduur = SUM(tt.Tijdsduur),

    WrapUp = SUM(tt.WrapUp),

    OnHold = SUM(tt.OnHold),

    Talk = SUM(tt.Talk)

    FROM Tussen_Tickets tt

    [font="Courier New"]

    DVBIntern DVBExtern Tijdsduur WrapUp OnHold Talk

    0.........0.........270164....2.567..0......270161.433

    [/font]

    This data set obviously isn't what you were expecting to see, even though it exactly meets your requirements. What would you like to do with it?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/6/2013)


    This data set obviously isn't what you were expecting to see, even though it exactly meets your requirements. What would you like to do with it?

    I have to update the data set into another database on a local server in development and on another remote server in production. The other database is used for reporting.

    Greetz,

    Geert

  • geert.de.vylder (3/6/2013)


    ChrisM@Work (3/6/2013)


    This data set obviously isn't what you were expecting to see, even though it exactly meets your requirements. What would you like to do with it?

    I have to update the data set into another database on a local server in development and on another remote server in production. The other database is used for reporting.

    Greetz,

    Geert

    I meant in terms of potentially reducing the rowcount.

    Let's call a set from this table containing the same day, all of the intervals, and the same Event_Channel, a frame. One frame contains 96 rows, one row per interval. There are 9 event channels so there are 864 rows per date. Multiply 864 by the number of dates in the tickets table - 1366 - to yield the number of rows in the matrix table.

    One possibility is to remove frames which don't have any corresponding rows in the tickets table.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply