Home Forums SQL Server 2008 T-SQL (SS2K8) which is the easy way to Eliminate the duplicates? RE: which is the easy way to Eliminate the duplicates?

  • dwain.c (10/16/2012)


    I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.

    To return the id number with the code, you can do this:

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM @T)

    SELECT id, code

    FROM Results

    WHERE rn=1

    Why bother with the row_number?

    SELECT MIN(id) AS id, code

    FROM @T

    GROUP BY code;

    Performance test: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4797 ms, elapsed time = 2547 ms.

    MIN

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 626 ms, elapsed time = 164 ms.

    Again, but with indexes: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    CREATE CLUSTERED INDEX cl_t_id ON #T(id);

    CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 422 ms, elapsed time = 409 ms.

    MIN

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 391 ms, elapsed time = 402 ms.

    So roughly equivalent once indexes are in place, but wildly different before they are.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/