dwain.c (10/16/2012)
I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.To return the id number with the code, you can do this:
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM @T)
SELECT id, code
FROM Results
WHERE rn=1
Why bother with the row_number?
SELECT MIN(id) AS id, code
FROM @T
GROUP BY code;
Performance test: -
IF object_id('tempdb..#T') IS NOT NULL
BEGIN
DROP TABLE #T;
END;
SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,
CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code
INTO #T
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;
DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);
PRINT 'ROWNUMBER';
SET STATISTICS IO, TIME ON;
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM #T)
SELECT @HOLDER = id, @HOLDER2 = code
FROM Results
WHERE rn=1;
SET STATISTICS IO, TIME OFF;
PRINT 'MIN';
SET STATISTICS IO, TIME ON;
SELECT @HOLDER = MIN(id), @HOLDER2 = code
FROM #T
GROUP BY code;
SET STATISTICS IO, TIME OFF;
Results: -
ROWNUMBER
Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4797 ms, elapsed time = 2547 ms.
MIN
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 626 ms, elapsed time = 164 ms.
Again, but with indexes: -
IF object_id('tempdb..#T') IS NOT NULL
BEGIN
DROP TABLE #T;
END;
SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,
CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code
INTO #T
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;
CREATE CLUSTERED INDEX cl_t_id ON #T(id);
CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);
DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);
PRINT 'ROWNUMBER';
SET STATISTICS IO, TIME ON;
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM #T)
SELECT @HOLDER = id, @HOLDER2 = code
FROM Results
WHERE rn=1;
SET STATISTICS IO, TIME OFF;
PRINT 'MIN';
SET STATISTICS IO, TIME ON;
SELECT @HOLDER = MIN(id), @HOLDER2 = code
FROM #T
GROUP BY code;
SET STATISTICS IO, TIME OFF;
Results: -
ROWNUMBER
Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 409 ms.
MIN
Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 391 ms, elapsed time = 402 ms.
So roughly equivalent once indexes are in place, but wildly different before they are.