Query approach

  • I can think of a number of "big hammer" methods of doing this job, but I'm trying to improve myself and stop hitting things with hammers.

    Given:

    USE [tempdb]
    GO

    CREATE TABLE [Person]
    (
        IPCode INT
    );

    CREATE TABLE [Profile]
    (
        IPCode INT
        , ID NVARCHAR(32)
    );

    INSERT INTO [Person] VALUES (1);
    INSERT INTO [Person] VALUES (2);
    INSERT INTO [Person] VALUES (3);
    INSERT INTO [Person] VALUES (4);

    INSERT INTO [Profile] VALUES (1,'AAAA');
    INSERT INTO [Profile] VALUES (1,'AAAA');
    INSERT INTO [Profile] VALUES (2,'BBBB');
    INSERT INTO [Profile] VALUES (2,'BBBB');
    INSERT INTO [Profile] VALUES (2,'BBBC');
    INSERT INTO [Profile] VALUES (3,'CCCC');
    INSERT INTO [Profile] VALUES (3,'CCCC');
    INSERT INTO [Profile] VALUES (3,'CCCC');
    INSERT INTO [Profile] VALUES (4,'DDDD');
    INSERT INTO [Profile] VALUES (4,'DDDA');
    INSERT INTO [Profile] VALUES (4,'DDDB');
    INSERT INTO [Profile] VALUES (4,'DDDD');
    INSERT INTO [Profile] VALUES (4,'DDDD');

    SELECT *
    FROM [Person] AS p
    JOIN [Profile] AS pr
        ON [p].[IPCode] = [pr].[IPCode];

    DROP TABLE [Person];
    DROP TABLE [Profile];

    How would I make the query return only the rows from [Profile] that share an [IPCode] but have different [ID]s? I.e., I should get back:

    2,BBBB
    2,BBBC
    4,DDDD
    4,DDDA
    4,DDDB

    In the application involved [Person] has about 500K rows, about 10K of which have rows in [Profile] and there are rarely more than 10 rows in [Profile] for a [Person].

    [Meta-question: is this an effective way to ask this question in this community?]

  • This should get you what you need:

    WITH
    xxx AS
    (
        SELECT pe.IPCode AS pe_ip,
               pr.IPCode AS pr_ip,
               pr.ID,
               DENSE_RANK() OVER (PARTITION BY pr.IPCode ORDER BY ID) AS x
        FROM Person pe
        CROSS APPLY [Profile] pr
        WHERE pe.IPCode = pr.IPCode
    ),
    qualifiers AS
    (
        SELECT DISTINCT pr_ip
        FROM xxx
        WHERE x > 1
    )
    SELECT q.pr_ip, xxx.ID
    FROM qualifiers q
    CROSS APPLY xxx
    WHERE q.pr_ip = xxx.pr_ip
    GROUP BY q.pr_ip, ID;

    Both tables should also have clustered indexes, and a non-clustered index on the join key (e.g. on IPCode). You would also want to include NULL constraints; it's a best practice to make all your columns NOT NULL whenever possible.
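    A minimal sketch of that advice might look like the following. The index and constraint shapes here are illustrative assumptions, not code from this thread; adapt the names to your own conventions:

    ```sql
    -- Hypothetical example: enforce NOT NULL, then index the join key.
    ALTER TABLE [Person]  ALTER COLUMN IPCode INT NOT NULL;
    ALTER TABLE [Profile] ALTER COLUMN IPCode INT NOT NULL;
    ALTER TABLE [Profile] ALTER COLUMN ID NVARCHAR(32) NOT NULL;

    -- Clustered index on each table's key, plus a non-clustered index
    -- covering the [Profile] columns the queries below actually touch.
    CREATE CLUSTERED INDEX cix_Person_IPCode  ON [Person](IPCode);
    CREATE CLUSTERED INDEX cix_Profile_IPCode ON [Profile](IPCode);
    CREATE NONCLUSTERED INDEX ix_Profile_IPCode_ID ON [Profile](IPCode, ID);
    ```

    With an index on (IPCode, ID), the queries in this thread can be satisfied from the index alone instead of scanning the base table.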

    [Meta-question: is this an effective way to ask this question in this community?]

    Including DDL and sample data is super-helpful. It's good to include any constraints and indexes that are present too (if you have not done so).

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • OK, that's going to take some looking at :-D. I can feel self-improvement coming on.

    Many thanks.

  • This might do the trick as well.

    SELECT DISTINCT IPCode, ID
    FROM Profile p
    WHERE EXISTS (SELECT IPCode
                  FROM Profile x
                  WHERE p.IPCode = x.IPCode
                  GROUP BY IPCode
                  HAVING COUNT(DISTINCT ID) > 1);

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I ran both solutions on your sample data and Luis' solution was WAY faster, almost double. But the solution provided by Alan uses so many different methods to arrive at the same answer it is worth the study to see how it all comes together. 2 ctes, 1 Dense_Rank, and 2 cross applys on a small dataset provide an excellent learning opportunity.

    Run the first cte by itself and study its output with cross apply. Then add the 2nd cte. Try it without the x>1 conditional to see why that is important.

    Great stuff here.
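    Following that suggestion, Alan's first CTE can be run standalone roughly like this (the query body and aliases are his, lifted out of the WITH clause):

    ```sql
    -- Alan's first CTE on its own: one row per Person/Profile match,
    -- with DENSE_RANK numbering the distinct IDs within each IPCode.
    SELECT pe.IPCode AS pe_ip,
           pr.IPCode AS pr_ip,
           pr.ID,
           DENSE_RANK() OVER (PARTITION BY pr.IPCode ORDER BY ID) AS x
    FROM Person pe
    CROSS APPLY [Profile] pr
    WHERE pe.IPCode = pr.IPCode;
    -- Any IPCode whose rows ever reach x > 1 has more than one distinct ID,
    -- which is exactly what the second CTE's x > 1 filter picks up.
    ```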

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • It's a great thing that you found an answer that will perform correctly and will be fast and simple to understand and an answer that will help you to learn more about different techniques.

    I'm sure there are many possible variations to obtain what you need and that's the greatest thing about SQL.:-D

  • I'm pretty happy with the responses! I look forward to learning from both of them.

  • And another slightly different way for the sake of it...

    ;WITH data AS
    (
        SELECT p.IPCode, pr.ID,
               COUNT(*) OVER (PARTITION BY p.IPCode) AS UID
        FROM [Person] AS p
        JOIN [Profile] AS pr
            ON [p].[IPCode] = [pr].[IPCode]
        GROUP BY p.IPCode, pr.ID
    )
    SELECT IPCode, ID
    FROM data
    WHERE UID > 1;

    Got it slightly wrong, I realised...

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Here's another way that avoids DISTINCT:

    SELECT a.IPCode, ID
    FROM Profile a
    JOIN
    (
        SELECT IPCode
        FROM Profile
        GROUP BY IPCode
        HAVING MAX(ID) <> MIN(ID)
    ) b ON a.IPCode = b.IPCode
    GROUP BY a.IPCode, ID;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • I can't rival the other guys on this board, but I am using this as practice for myself and would like (constructive but friendly) feedback on the solution I came up with if anyone is willing. 🙂

    WITH CTE1 AS
    (
        SELECT IPCode, ID
        FROM [Profile]
        GROUP BY IPCode, ID
    ),
    CTE2 AS
    (
        SELECT IPCode
        FROM CTE1
        GROUP BY IPCode
        HAVING COUNT(*) > 1
    )
    SELECT IPCode, ID
    FROM [Profile]
    WHERE IPCode IN (SELECT IPCode FROM CTE2)
    GROUP BY IPCode, ID;


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (12/6/2013)


    I can't rival the other guys on this board, but I am using this as practice for myself and would like (constructive but friendly) feedback on the solution I came up with if anyone is willing. 🙂

    Actually you have done quite well my young Jedi! Witness the following 1,000,000 row test harness incorporating the 4 solutions suggested.

    CREATE TABLE [Person]
    (
        IPCode INT
    );

    CREATE TABLE [Profile]
    (
        IPCode INT
        , ID NVARCHAR(32)
    );

    --CREATE INDEX id1 ON [Profile](IPCode, ID);

    --INSERT INTO [Person] VALUES (1);
    --INSERT INTO [Person] VALUES (2);
    --INSERT INTO [Person] VALUES (3);
    --INSERT INTO [Person] VALUES (4);

    WITH Tally (n) AS
    (
        SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    INSERT INTO [Person]
    SELECT n
    FROM Tally;

    --INSERT INTO [Profile] VALUES (1,'AAAA');
    --INSERT INTO [Profile] VALUES (1,'AAAA');
    --INSERT INTO [Profile] VALUES (2,'BBBB');
    --INSERT INTO [Profile] VALUES (2,'BBBB');
    --INSERT INTO [Profile] VALUES (2,'BBBC');
    --INSERT INTO [Profile] VALUES (3,'CCCC');
    --INSERT INTO [Profile] VALUES (3,'CCCC');
    --INSERT INTO [Profile] VALUES (3,'CCCC');
    --INSERT INTO [Profile] VALUES (4,'DDDD');
    --INSERT INTO [Profile] VALUES (4,'DDDA');
    --INSERT INTO [Profile] VALUES (4,'DDDB');
    --INSERT INTO [Profile] VALUES (4,'DDDD');
    --INSERT INTO [Profile] VALUES (4,'DDDD');

    WITH Tally (n) AS
    (
        SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    INSERT INTO [Profile]
    SELECT a.n, RIGHT(10000+b.n/20, 4)
    FROM Tally a
    CROSS APPLY
    (
        SELECT n
        FROM Tally
        WHERE n <= 100
    ) b;

    PRINT 'Alan.B';
    SET STATISTICS IO, TIME ON;

    WITH
    xxx AS
    (
        SELECT pe.IPCode AS pe_ip,
               pr.IPCode AS pr_ip,
               pr.ID,
               DENSE_RANK() OVER (PARTITION BY pr.IPCode ORDER BY ID) AS x
        FROM Person pe
        CROSS APPLY [Profile] pr
        WHERE pe.IPCode = pr.IPCode
    ),
    qualifiers AS
    (
        SELECT DISTINCT pr_ip
        FROM xxx
        WHERE x > 1
    )
    SELECT q.pr_ip, xxx.ID
    FROM qualifiers q
    CROSS APPLY xxx
    WHERE q.pr_ip = xxx.pr_ip
    GROUP BY q.pr_ip, ID;

    SET STATISTICS IO, TIME OFF;

    PRINT 'Luis Cazares';
    SET STATISTICS IO, TIME ON;

    SELECT DISTINCT IPCode, ID
    FROM Profile p
    WHERE EXISTS (SELECT IPCode
                  FROM Profile x
                  WHERE p.IPCode = x.IPCode
                  GROUP BY IPCode
                  HAVING COUNT(DISTINCT ID) > 1);

    SET STATISTICS IO, TIME OFF;

    PRINT 'Dwain.C';
    SET STATISTICS IO, TIME ON;

    SELECT a.IPCode, ID
    FROM Profile a
    JOIN
    (
        SELECT IPCode
        FROM [Profile]
        GROUP BY IPCode
        HAVING MAX(ID) <> MIN(ID)
    ) b ON a.IPCode = b.IPCode
    GROUP BY a.IPCode, ID;

    SET STATISTICS IO, TIME OFF;

    PRINT 'Autoexcrement';
    SET STATISTICS IO, TIME ON;

    WITH CTE1 AS
    (
        SELECT IPCode, ID
        FROM [Profile]
        GROUP BY IPCode, ID
    ),
    CTE2 AS
    (
        SELECT IPCode
        FROM CTE1
        GROUP BY IPCode
        HAVING COUNT(*) > 1
    )
    SELECT IPCode, ID
    FROM [Profile]
    WHERE IPCode IN (SELECT IPCode FROM CTE2)
    GROUP BY IPCode, ID;

    SET STATISTICS IO, TIME OFF;
    GO

    DROP TABLE [Person];
    DROP TABLE [Profile];

    Without the INDEX I've proposed, the results are as follows:

    Alan.B
    (60000 row(s) affected)
    Table 'Person'. Scan count 10, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 4461 ms, elapsed time = 4203 ms.

    Luis Cazares
    (60000 row(s) affected)
    Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 2073 ms, elapsed time = 938 ms.

    Dwain.C
    (60000 row(s) affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 3526 ms, elapsed time = 1330 ms.

    Autoexcrement
    (60000 row(s) affected)
    Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 2199 ms, elapsed time = 892 ms.

    With the INDEX, the results are this:

    Alan.B
    (60000 row(s) affected)
    Table 'Person'. Scan count 10, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Profile'. Scan count 10, logical reads 6192, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 4791 ms, elapsed time = 3098 ms.

    Luis Cazares
    (60000 row(s) affected)
    Table 'Profile'. Scan count 2, logical reads 7508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 405 ms, elapsed time = 470 ms.

    Dwain.C
    (60000 row(s) affected)
    Table 'Profile'. Scan count 10, logical reads 7776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 998 ms, elapsed time = 584 ms.

    Autoexcrement
    (60000 row(s) affected)
    Table 'Profile'. Scan count 2, logical reads 7508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 405 ms, elapsed time = 456 ms.

    So yours is the fastest in both cases, no doubt because you aggregate the already-aggregated results.

    Nicely done!



  • Holy cow!


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura
