Need help on a query in SQL 2005

  • I have the following table called "test":

    CREATE TABLE test

    (

    call_state tinyint,

    call_type tinyint,

    caller int,

    localtime datetime,

    s_id tinyint,

    callee int,

    duration int

    );

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 6 , 1801 , 2540 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 3 , 1801 , 2540 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:48' , 2 , 1801 , 2540 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:35:47' , 1 , 1801 , 2540 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 6 , 1801 , 6060 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 3 , 1801 , 6070 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 2 , 1801 , 6070 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:04' , 1 , 1801 , 6070 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 6 , 1801 , 3850 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 3 , 1801 , 3850 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 2 , 1801 , 3850 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 18:36:14' , 1 , 1801 , 3860 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:03' , 1 , 1801 , 4470 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 11 , 1801 , 4470 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 6 , 1801 , 4460 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 3 , 1801 , 4470 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 20:53:04' , 2 , 1801 , 4460 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 2 , 1801 , 4200 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 3 , 1801 , 4210 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 1 , 1801 , 4210 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/21/2004 22:21:20' , 6 , 1801 , 4210 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 6 , 1801 , 2730 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 3 , 1801 , 2730 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:20' , 1 , 1801 , 2730 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 5 , 1801 , 3760 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 2 , 1801 , 3770 )

    INSERT INTO test (call_state, call_type, caller, localtime, s_id, callee, Duration) values ( 0 , 0 , 6942 , '1/23/2004 12:17:31' , 1 , 1801 , 3750 )

    I would like to have the result set back in the following format with 2 more columns. Please see the attached result.jpg.

    Group column = If caller and callee are the same, localtime between each row is within 1 second, then it will flag as 1. Next group will be flag as 2 ... etc.

    Filter column = If s_id for all the group calls has 6 and 2, then mark 1 for filter column where s_id = 2. Like group 1.

    If all the group calls has only 6, then mark 1 for filter column where s_id = 6. Like group 6.

    If all the group calls has only 2, then mark 1 for filter column where s_id = 2. Like group 7.

    All the s_id other than 2 or 6, they don't need to be flag as 1 on the filter column. Basically, each group call can only have 1 or no row flag as 1 for the filter column.

    Thanks very much!

  • -- Since group allocation must take place sequentially row by row, your good options

    -- are quite limited. This solution uses a recursive CTE, you could also consider the

    -- Quirky Update, probably the fastest method, or a cursor.

    -- Performance will be horrible unless you have an existing column which models rn in "OrderedSet",

    -- or you can add one. You will need an index on it too.

    ;WITH OrderedSet AS (

    SELECT

    rn = ROW_NUMBER() OVER(ORDER BY [caller], callee, localtime),

    *

    FROM test

    ),

    rCTE AS (

    SELECT tr.*, grp = 1

    FROM OrderedSet tr

    WHERE rn = 1

    UNION ALL

    SELECT tr.*,

    grp = CASE

    WHEN tr.[caller] = lr.[caller] AND tr.callee = lr.callee

    AND DATEDIFF(SECOND, lr.localtime, tr.localtime ) <= 1 THEN lr.grp

    ELSE lr.grp+1 END

    FROM rCTE lr

    INNER JOIN OrderedSet tr

    ON tr.rn = lr.rn+1

    )

    SELECT r.*,

    filter = CASE

    WHEN x.minsid = 2 AND x.maxsid = 6 AND r.s_id = 2 THEN 1

    WHEN x.minsid = 6 AND r.s_id = 6 THEN 1

    WHEN x.minsid = 2 AND x.maxsid = 2 AND r.s_id = 2 THEN 1

    ELSE NULL END

    FROM rCTE r

    CROSS APPLY (

    SELECT minsid = MIN(ri.s_id), maxsid = MAX(ri.s_id)

    FROM rCTE ri

    WHERE ri.grp = r.grp

    AND ri.s_id IN (2,6)

    ) x

    OPTION (MAXRECURSION 0);

    -- Here's how not to do it: a query which resolves the group number but uses a "triangular join".

    -- For each and every row in the set, the whole set up to that point (rn) is scanned.

    -- Horrible if you have anything except a very small number of rows.

    ;WITH OrderedSet AS (

    SELECT

    rn = ROW_NUMBER() OVER(ORDER BY [caller], callee, localtime),

    *

    FROM test

    ),

    MarkedSet AS (

    SELECT tr.*, Marker = ISNULL(x.Marker,1)

    FROM OrderedSet tr

    OUTER APPLY (

    SELECT Marker = 0

    FROM OrderedSet lr

    WHERE lr.rn + 1 = tr.rn

    AND lr.[caller] = tr.[caller]

    AND lr.callee = tr.callee

    AND DATEDIFF(SECOND, lr.localtime, tr.localtime ) <= 1

    ) x

    )

    SELECT *

    FROM MarkedSet m

    CROSS APPLY (

    SELECT grp = SUM(Marker)

    FROM MarkedSet pr

    WHERE pr.rn <= m.rn

    ) x

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you so much for the query and it worked flawlessly. However, is there another way to get the same results faster? My data set has 20 millions records.

  • As Chris said in the comments, the quirky update should be a lot faster. I've been bitten by the rCTEs ' performance when using a lot of information. You can read about the quirky update (QU) in the following article: http://www.sqlservercentral.com/articles/T-SQL/68467/

    This could do the trick, but remember to test and follow the QU rules.

    CREATE TABLE #test

    (

    call_state tinyint,

    call_type tinyint,

    caller int,

    localtime datetime,

    s_id tinyint,

    callee int,

    duration int,

    [group] int,

    filter int

    );

    CREATE CLUSTERED INDEX IX_TempTest ON #Test(caller, callee, localtime)

    INSERT INTO #test(

    call_state,

    call_type ,

    caller ,

    localtime ,

    s_id ,

    callee ,

    duration

    )

    SELECT call_state,

    call_type ,

    caller ,

    localtime ,

    s_id ,

    callee ,

    duration

    FROM test

    DECLARE @Caller int,

    @Callee int,

    @Time datetime,

    @Group int = 0

    UPDATE t SET

    @Group = [Group] = @Group + CASE WHEN Caller = @Caller AND callee = @Callee AND ABS( DATEDIFF( ss, localtime, @Time)) <= 1 THEN 0 ELSE 1 END,

    @Caller = caller,

    @Callee = callee,

    @Time = localtime,

    filter = CASE WHEN s_id = 2 THEN 1 END

    FROM #test t WITH(TABLOCKX)

    OPTION(MAXDOP 1)

    UPDATE t SET

    filter = 1

    FROM #test t

    WHERE s_id = 6

    AND [group] IN (SELECT [group] FROM #test GROUP BY [group] HAVING MAX(filter) IS NULL)

    SELECT *,

    --If you don't want to update the table twice, you can calculate the filter on the fly.

    CASE ROW_NUMBER() OVER( PARTITION BY [group] ORDER BY CASE s_id WHEN 2 THEN 1 WHEN 6 THEN 2 ELSE 3 END) WHEN 1 THEN 1 END filter2

    FROM #test

    DROP TABLE #test

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply