which is the easy way to Eliminate the duplicates?

  • Hi,

    Please suggest me to get the distinct values from below query

    declare @T as table(id int identity, code varchar(10))

    insert into @T(code) values('111')

    insert into @T(code) values('111')

    insert into @T(code) values('222')

    insert into @T(code) values('222')

    insert into @T(code) values('333')

    select * from @T

    I want the result as

    ------------------

    111

    222

    333

  • use distinct clause in select

  • select distinct code from @T

  • lokesha.b (10/16/2012)


    Hi,

    Please suggest me to get the distinct values from below query

    declare @T as table(id int identity, code varchar(10))

    insert into @T(code) values('111')

    insert into @T(code) values('111')

    insert into @T(code) values('222')

    insert into @T(code) values('222')

    insert into @T(code) values('333')

    select * from @T

    I want the result as

    ------------------

    111

    222

    333

    For what you're showing, the others are correct. DISTINCT will do it.

    The real question is, what are you actually doing? I ask because the use of DISTINCT is a usual indication that a joined query was written incorrectly or the underlying database design really needs some work.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm Sorry, I confudsed the question; however, my target is to return both the values like ID and Code result should be as

    ID Code

    -- -----

    1 111

    3 222

    5 333

    Querry is

    ---------

    declare @T as table(ID int identity, Code varchar(10))

    insert into @T(code) values('111')

    insert into @T(code) values('111')

    insert into @T(code) values('222')

    insert into @T(code) values('222')

    insert into @T(code) values('333')

    select * from @T

    result

    ------

    ID Code

    -- ----

    1 111

    3 222

    5 333

  • I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.

    To return the id number with the code, you can do this:

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM @T)

    SELECT id, code

    FROM Results

    WHERE rn=1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (10/16/2012)


    I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.

    To return the id number with the code, you can do this:

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM @T)

    SELECT id, code

    FROM Results

    WHERE rn=1

    Why bother with the row_number?

    SELECT MIN(id) AS id, code

    FROM @T

    GROUP BY code;

    Performance test: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4797 ms, elapsed time = 2547 ms.

    MIN

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 626 ms, elapsed time = 164 ms.

    Again, but with indexes: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    CREATE CLUSTERED INDEX cl_t_id ON #T(id);

    CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 422 ms, elapsed time = 409 ms.

    MIN

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 391 ms, elapsed time = 402 ms.

    So roughly equivalent once indexes are in place, but wildly different before they are.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (10/17/2012)


    dwain.c (10/16/2012)


    I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.

    To return the id number with the code, you can do this:

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM @T)

    SELECT id, code

    FROM Results

    WHERE rn=1

    Why bother with the row_number?

    SELECT MIN(id) AS id, code

    FROM @T

    GROUP BY code;

    Performance test: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4797 ms, elapsed time = 2547 ms.

    MIN

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 626 ms, elapsed time = 164 ms.

    Again, but with indexes: -

    IF object_id('tempdb..#T') IS NOT NULL

    BEGIN

    DROP TABLE #T;

    END;

    SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

    CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

    INTO #T

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

    CREATE CLUSTERED INDEX cl_t_id ON #T(id);

    CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);

    DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

    PRINT 'ROWNUMBER';

    SET STATISTICS IO, TIME ON;

    ;WITH Results AS (

    SELECT id, code

    ,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

    FROM #T)

    SELECT @HOLDER = id, @HOLDER2 = code

    FROM Results

    WHERE rn=1;

    SET STATISTICS IO, TIME OFF;

    PRINT 'MIN';

    SET STATISTICS IO, TIME ON;

    SELECT @HOLDER = MIN(id), @HOLDER2 = code

    FROM #T

    GROUP BY code;

    SET STATISTICS IO, TIME OFF;

    Results: -

    ROWNUMBER

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 422 ms, elapsed time = 409 ms.

    MIN

    Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 391 ms, elapsed time = 402 ms.

    So roughly equivalent once indexes are in place, but wildly different before they are.

    Nice effort Cadavre! Interesting results too (with the indexing). The former doesn't surprise me at all - wish I'd thought of it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.

    There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/17/2012)


    Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.

    There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.

    Sounds like the techie explanation of my second signature line! 😛


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • CELKO (10/17/2012)


    Your real problem is that you have no idea how RDBMS works. SQL is a declarative language. We do not put redundant duplicates into tables in the first place. That is what a key does. IDENTITY is never a key; it is a table property that counts the physical insertion attempts to storage.

    Actually, if you're trying to load 3rd party data, this happens all the time in staging tables. In such cases, it is sometimes important to take the last available entry according to the order of the IDENTITY column for other reasons. Think of it as a first step to getting to where you've correctly stated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • dwain.c (10/17/2012)


    Jeff Moden (10/17/2012)


    Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.

    There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.

    Sounds like the techie explanation of my second signature line! 😛

    Zactly! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply