which is the easy way to Eliminate the duplicates?

Question

Post reply

which is the easy way to Eliminate the duplicates?

lokesha.b

SSC Enthusiast

Points: 128
More actions
October 16, 2012 at 8:13 am

#263300

Hi,
Please suggest me to get the distinct values from below query
declare @T as table(id int identity, code varchar(10))
insert into @T(code) values('111')
insert into @T(code) values('111')
insert into @T(code) values('222')
insert into @T(code) values('222')
insert into @T(code) values('333')
select * from @T
I want the result as
------------------
111
222
333

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Chandra Sekhara Vyas Dhara SSChampion Points: 10207 More actions · Answer 1

Chandra Sekhara Vyas Dhara

SSChampion

Points: 10207

October 16, 2012 at 8:14 am

#1549593

use distinct clause in select

sqlrd22 SSCrazy Points: 2255 More actions · Answer 2

sqlrd22

SSCrazy

Points: 2255

October 16, 2012 at 8:59 am

#1549623

select distinct code from @T

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 3

lokesha.b (10/16/2012)
Hi,
Please suggest me to get the distinct values from below query
declare @T as table(id int identity, code varchar(10))
insert into @T(code) values('111')
insert into @T(code) values('111')
insert into @T(code) values('222')
insert into @T(code) values('222')
insert into @T(code) values('333')
select * from @T
I want the result as
------------------
111
222
333

For what you're showing, the others are correct. DISTINCT will do it.

The real question is, what are you actually doing? I ask because the use of DISTINCT is a usual indication that a joined query was written incorrectly or the underlying database design really needs some work.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

lokesha.b SSC Enthusiast Points: 128 More actions · Answer 4

I'm Sorry, I confudsed the question; however, my target is to return both the values like ID and Code result should be as

ID Code

-- -----

1 111

3 222

5 333

Querry is

---------

declare @T as table(ID int identity, Code varchar(10))

insert into @T(code) values('111')

insert into @T(code) values('222')

insert into @T(code) values('333')

select * from @T

result

------

ID Code

-- ----

1 111

3 222

5 333

Dwain Camps SSC Guru Points: 86908 More actions · Answer 5

I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.

To return the id number with the code, you can do this:

;WITH Results AS (

SELECT id, code

,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

FROM @T)

SELECT id, code

FROM Results

WHERE rn=1

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Cadavre SSC-Forever Points: 41690 More actions · Answer 6

dwain.c (10/16/2012)
I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.
To return the id number with the code, you can do this:
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM @T)
SELECT id, code
FROM Results
WHERE rn=1

Why bother with the row_number?

SELECT MIN(id) AS id, code

FROM @T

GROUP BY code;

Performance test: -

IF object_id('tempdb..#T') IS NOT NULL

BEGIN

DROP TABLE #T;

END;

SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

INTO #T

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

PRINT 'ROWNUMBER';

SET STATISTICS IO, TIME ON;

;WITH Results AS (

SELECT id, code

,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

FROM #T)

SELECT @HOLDER = id, @HOLDER2 = code

FROM Results

WHERE rn=1;

SET STATISTICS IO, TIME OFF;

PRINT 'MIN';

SET STATISTICS IO, TIME ON;

SELECT @HOLDER = MIN(id), @HOLDER2 = code

FROM #T

GROUP BY code;

SET STATISTICS IO, TIME OFF;

Results: -

ROWNUMBER

Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 4797 ms, elapsed time = 2547 ms.

MIN

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 626 ms, elapsed time = 164 ms.

Again, but with indexes: -

IF object_id('tempdb..#T') IS NOT NULL

BEGIN

DROP TABLE #T;

END;

SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,

CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code

INTO #T

FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;

CREATE CLUSTERED INDEX cl_t_id ON #T(id);

CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);

DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);

PRINT 'ROWNUMBER';

SET STATISTICS IO, TIME ON;

;WITH Results AS (

SELECT id, code

,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)

FROM #T)

SELECT @HOLDER = id, @HOLDER2 = code

FROM Results

WHERE rn=1;

SET STATISTICS IO, TIME OFF;

PRINT 'MIN';

SET STATISTICS IO, TIME ON;

SELECT @HOLDER = MIN(id), @HOLDER2 = code

FROM #T

GROUP BY code;

SET STATISTICS IO, TIME OFF;

Results: -

ROWNUMBER

Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 422 ms, elapsed time = 409 ms.

MIN

Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 391 ms, elapsed time = 402 ms.

So roughly equivalent once indexes are in place, but wildly different before they are.

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Dwain Camps SSC Guru Points: 86908 More actions · Answer 7

Cadavre (10/17/2012)
dwain.c (10/16/2012)
I recommend that you take heed of Jeff's question above. An improper JOIN often generates tons of duplicates and developers can be too lazy to understand why and eliminate them the proper way.
To return the id number with the code, you can do this:
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM @T)
SELECT id, code
FROM Results
WHERE rn=1
Why bother with the row_number?
SELECT MIN(id) AS id, code
FROM @T
GROUP BY code;
Performance test: -
IF object_id('tempdb..#T') IS NOT NULL
BEGIN
DROP TABLE #T;
END;
SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,
CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code
INTO #T
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;
DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);
PRINT 'ROWNUMBER';
SET STATISTICS IO, TIME ON;
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM #T)
SELECT @HOLDER = id, @HOLDER2 = code
FROM Results
WHERE rn=1;
SET STATISTICS IO, TIME OFF;
PRINT 'MIN';
SET STATISTICS IO, TIME ON;
SELECT @HOLDER = MIN(id), @HOLDER2 = code
FROM #T
GROUP BY code;
SET STATISTICS IO, TIME OFF;
Results: -
ROWNUMBER
Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4797 ms, elapsed time = 2547 ms.
MIN
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#T'. Scan count 5, logical reads 2341, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 626 ms, elapsed time = 164 ms.
Again, but with indexes: -
IF object_id('tempdb..#T') IS NOT NULL
BEGIN
DROP TABLE #T;
END;
SELECT TOP 1000000 IDENTITY(INT,1,1) AS id,
CAST((ABS(CHECKSUM(NEWID())) % 100) + 1 AS VARCHAR(3)) AS code
INTO #T
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3;
CREATE CLUSTERED INDEX cl_t_id ON #T(id);
CREATE NONCLUSTERED INDEX nc_t_code ON #T(code);
DECLARE @HOLDER INT, @HOLDER2 VARCHAR(3);
PRINT 'ROWNUMBER';
SET STATISTICS IO, TIME ON;
;WITH Results AS (
SELECT id, code
,rn=ROW_NUMBER() OVER (PARTITION BY code ORDER BY id)
FROM #T)
SELECT @HOLDER = id, @HOLDER2 = code
FROM Results
WHERE rn=1;
SET STATISTICS IO, TIME OFF;
PRINT 'MIN';
SET STATISTICS IO, TIME ON;
SELECT @HOLDER = MIN(id), @HOLDER2 = code
FROM #T
GROUP BY code;
SET STATISTICS IO, TIME OFF;
Results: -
ROWNUMBER
Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 15, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 422 ms, elapsed time = 409 ms.
MIN
Table '#T'. Scan count 1, logical reads 1979, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 391 ms, elapsed time = 402 ms.
So roughly equivalent once indexes are in place, but wildly different before they are.

Nice effort Cadavre! Interesting results too (with the indexing). The former doesn't surprise me at all - wish I'd thought of it.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 8

Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.

There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 9

Jeff Moden (10/17/2012)
Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.
There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.

Sounds like the techie explanation of my second signature line! 😛

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 10

CELKO (10/17/2012)
Your real problem is that you have no idea how RDBMS works. SQL is a declarative language. We do not put redundant duplicates into tables in the first place. That is what a key does. IDENTITY is never a key; it is a table property that counts the physical insertion attempts to storage.

Actually, if you're trying to load 3rd party data, this happens all the time in staging tables. In such cases, it is sometimes important to take the last available entry according to the order of the IDENTITY column for other reasons. Think of it as a first step to getting to where you've correctly stated.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 11

dwain.c (10/17/2012)
Jeff Moden (10/17/2012)
Just my 2 cents on indexing. Be careful when using only indexing to "tune" a query especially on OLTP tables. I'm not suggesting that you should avoid such a thing but if you can make a query run much faster by code rather than by index, you've not only decreased index maintenance time a bit, but you've also made INSERTs run a little faster (sometimes, a lot faster... I've seen some indexes actually cause timeouts because of extent splitting) and you've made the code more bullet proof for the future. Another advantage of increasing performance by code instead of by index is space savings on disk and backup/restore times.
There are times where indexing is absolutely necessary and there are times where you really should bite the bullet and write some alternative code. If the code is using DISTINCT, really consider the latter.
Sounds like the techie explanation of my second signature line! 😛

Zactly! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)