Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

Removing Duplicates Expand / Collapse
Author
Message
Posted Monday, October 7, 2013 9:18 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 10:59 AM
Points: 2,766, Visits: 7,239
Comments posted to this topic are about the item Removing Duplicates

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1502457
Posted Tuesday, October 8, 2013 1:38 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, October 16, 2013 2:22 AM
Points: 13, Visits: 49
Since I have worked with Oracle I have removed duplicates using the rowid column. SQL Server has something similar but its an undocumented feature (might be changed without notice).

delete
from table t1
where exists
(select *
from table t2
where t1.column1 = t2.column1
and t1.column2 = t2.column2
...
and t1.rowid > t2. rowid) -- in Oracle
and t1.%%physloc%% > %%physloc%%) -- in SQL Server

This self join can be useful, you dont need to compare all rows for example and it removed more than one row in case of tripple duplicates.
Post #1502488
Posted Tuesday, October 8, 2013 1:56 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Monday, August 11, 2014 12:00 AM
Points: 1,691, Visits: 757
peterswe (10/8/2013)
Since I have worked with Oracle I have removed duplicates using the rowid column. SQL Server has something similar but its an undocumented feature (might be changed without notice).

delete
from table t1
where exists
(select *
from table t2
where t1.column1 = t2.column1
and t1.column2 = t2.column2
...
and t1.rowid > t2. rowid) -- in Oracle
and t1.%%physloc%% > %%physloc%%) -- in SQL Server

This self join can be useful, you dont need to compare all rows for example and it removed more than one row in case of tripple duplicates.


http://www.sqlskills.com/blogs/paul/sql-server-2008-new-undocumented-physical-row-locator-function/

Not so sure how happy I would be using this - but then as Paul says all the cool stuff is undocumented.

Great question, I really like how you have included the ability to retain the latest version.. I usually use a more simple way to remove duplicate rows..

Using the same table...

Declare @FName varchar(30), -- values to look for
@LName varchar (30),-- values to look for
@cnt int -- count


Declare getallrecords cursor local static For
Select count (1), Fname, Lname
from DuplicateRow (nolock)
group by FName, LName having count(1)>1

Open getallrecords

Fetch next from getallrecords into @cnt,@FName,@LName
--Cursor to check with all other records
While @@fetch_status=0
Begin
Set @cnt= @cnt-1
Set rowcount @cnt

-- Deleting the duplicate records. Observe that all fields are mentioned at the where condition
Delete from DuplicateRow where Fname=@FName and LName=@LName


Set rowcount 0

Fetch next from getallrecords into @cnt,@FName,@LName
End

Close getallrecords
Deallocate getallrecords

Obviously depending on how many rows you need to compare to ensure your only removing duplicates would depend on what you declare and select.


Hope this helps...

Ford Fairlane
Rock and Roll Detective





Post #1502491
Posted Tuesday, October 8, 2013 3:59 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, August 19, 2014 3:33 PM
Points: 8, Visits: 204
Thanks for a useful article, clearly explained. I shall have occasion to use your method when I import data from Excel, which despite precautions has duplicates.
Post #1502519
Posted Tuesday, October 8, 2013 4:55 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Today @ 8:07 AM
Points: 645, Visits: 422
I believe the following statement in the article is misleading:

You don’t need to do the SELECT first, but it is nice to see what you’ll be deleting.


It is required that the select section be in the delete statement because the windowing functions cannot be part of the where clause. The following DOES NOT work:
SELECT
ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY LName ) R
, FName
, LName
, JobTitle
, Age
FROM DuplicateRow
WHERE ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY LName ) > 1

Gives the following error:
Windowed functions can only appear in the SELECT or ORDER BY clauses.


Post #1502542
Posted Tuesday, October 8, 2013 4:58 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, August 15, 2014 9:56 AM
Points: 3, Visits: 451
Thanks Stefan. Great article.
Post #1502543
Posted Tuesday, October 8, 2013 7:19 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 10:59 AM
Points: 2,766, Visits: 7,239
tom.w.brannon (10/8/2013)
I believe the following statement in the article is misleading:

You don’t need to do the SELECT first, but it is nice to see what you’ll be deleting.


It is required that the select section be in the delete statement because the windowing functions cannot be part of the where clause. The following DOES NOT work:
SELECT
ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY LName ) R
, FName
, LName
, JobTitle
, Age
FROM DuplicateRow
WHERE ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY LName ) > 1

Gives the following error:
Windowed functions can only appear in the SELECT or ORDER BY clauses.




What I meant by "You don't need to do a SELECT first" is that you can just use the DELETE query. The SELECT subquery within the DELETE query is still needed. By "first" I meant running the SELECT to see the results before running the DELETE. Both have another SELECT subquery as part of the operation.


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1502601
Posted Tuesday, October 8, 2013 7:56 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, May 29, 2014 2:48 PM
Points: 3, Visits: 21
-- I have added DateInserted and Address columns to show how to remove all old records for the same person

-- drop table #DuplicateRow

CREATE TABLE #DuplicateRow(
ID INT identity(1,1),
DateInserted Datetime,
FName varchar(30),
LName varchar(30),
JobTitle varchar(30),
Age tinyint,
[Address] Varchar(255)
)
GO

INSERT INTO #DuplicateRow (DateInserted,FName,LName,JobTitle,Age,[Address])
SELECT DateInserted,FName,LName,JobTitle,Age,[Address] FROM (
SELECT '1/1/12' [DateInserted], 'Mangu' [FName],'Changu' [LName],'Bekar' [JobTitle], 21 [Age] , '1 broad street,ny ' [Address]UNION ALL
SELECT '2/1/12' [DateInserted],'Mangu' [FName],'Changu' [LName],'Bekar' [JobTitle], 21 [Age] , '11 broad street,ny ' [Address]UNION ALL
SELECT '6/1/12' [DateInserted],'Mehul' [FName],'Shah' [LName],'Manager' [JobTitle], 55 [Age] , '1 main street,ny' [Address]UNION ALL
SELECT '3/1/12' [DateInserted],'Mehul' [FName],'Shah' [LName],'Manager' [JobTitle], 55 [Age] , '11 main street,ny' [Address]UNION ALL
SELECT '7/1/12' [DateInserted],'Mangu' [FName],'Changu' [LName],'Bekar' [JobTitle], 21 [Age] , '11/a broad street,ny' [Address]UNION ALL
SELECT '8/1/12' [DateInserted],'Mangu' [FName],'Changu' [LName],'Bekar' [JobTitle], 21 [Age] , '11/b broad street,ny' [Address]UNION ALL
SELECT '9/1/12' [DateInserted],'Mehul' [FName],'Shah' [LName],'Manager' [JobTitle], 55 [Age] , '115 main street,ny' [Address]UNION ALL
SELECT '10/1/12' [DateInserted],'Mehul' [FName],'Shah' [LName],'Manager' [JobTitle], 55 [Age], '118 main street,ny' [Address]
) A

SELECT * FROM #DuplicateRow order by FName,LName,JobTitle,Age,DateInserted desc

-- we need to remove all rows except id= 6 and 8

SELECT
ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY DateInserted desc ) R
, FName
, LName
, JobTitle
, Age
FROM #DuplicateRow

SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY DateInserted desc) R
, FName
, LName
, JobTitle
, Age
FROM #DuplicateRow
) B
WHERE R > 1


DELETE B
FROM ( SELECT ROW_NUMBER() OVER ( PARTITION BY FName, LName, JobTitle, Age ORDER BY DateInserted desc ) R
FROM #DuplicateRow
) B
WHERE R > 1


SELECT * FROM #DuplicateRow
Post #1502625
Posted Tuesday, October 8, 2013 7:57 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 4:17 PM
Points: 1, Visits: 26
Hello There,

You may check this article that provides detailed examples of how to remove duplicate records.
http://www.dfarber.com/computer-consulting-blog/2011/12/26/remove-duplicate-records-in-sql.aspx

It combines several values from several records into one good record.

Regards,

Doron
The Farber Consulting Group, Inc.
http://www.dFarber.com


Post #1502626
Posted Tuesday, October 8, 2013 8:23 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,994, Visits: 31,513
Short, sweet, to the point with great examples and good simple explanations. Very nicely done, Mr. Krzywicki.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1502643
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse