SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Deleting Duplicate Records


Deleting Duplicate Records

Author
Message
Sureshkumar Ramakrishnan
Sureshkumar Ramakrishnan
Mr or Mrs. 500
Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)

Group: General Forum Members
Points: 574 Visits: 37
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/s


Kindest Regards,

Sureshkumar Ramakrishnan

Payaswini
Payaswini
Valued Member
Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)Valued Member (73 reputation)

Group: General Forum Members
Points: 73 Visits: 1
Quite helpful
Ghanshyam Manchandia
Ghanshyam Manchandia
SSC Rookie
SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)SSC Rookie (43 reputation)

Group: General Forum Members
Points: 43 Visits: 1
excellent & helpful script
katesl
katesl
Right there with Babe
Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)Right there with Babe (759 reputation)

Group: General Forum Members
Points: 759 Visits: 476

Clue: Anyone who begins by thinking "SQL Server doesn't have what Oracle has" is going to arrive at a solution that fails to make best use of SQL Server. Simply add a column of datatype int and set the identity property to "Yes (not for replication)" and all the existing rows in the table will be given a row identifier.

>>>

create table test (name varchar(10))
insert into test values ('Kate')
insert into test values ('Kate')
insert into test values ('Sue')

-- Enterprise Manager is the easiest way to add a column, so add column ID using EM. Datatype is int and property

create view testUniqueID as select name, min(Id) as minID
from test
group by name

delete test
from testuniqueID join test
on testuniqueID.name=test.name
where testuniqueID.minID < test.id

drop table test

drop view testUniqueID

>>>

Lesson: SQL Server does have a row identifier. It stays out of the way unless you need it. Anything Oracle does, SQL Server does more elegantly.



_________________
"Look, those sheep have been shorn."
data analyst replies, "On the sides that we can see.."
lozzamoore
lozzamoore
SSC Journeyman
SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)SSC Journeyman (77 reputation)

Group: General Forum Members
Points: 77 Visits: 15

Didn't find this very helpful.

For very large tables, the cursor approach would be dreadfully slow.

There are other techniques that exist, such as using duplicate indexes with IGNORE_DUP_KEY option that would be better for very large tables.

Cheers,





Sureshkumar Ramakrishnan
Sureshkumar Ramakrishnan
Mr or Mrs. 500
Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)Mr or Mrs. 500 (574 reputation)

Group: General Forum Members
Points: 574 Visits: 37

Yeah I totally agree with SQL does things better than what Oracle does.My article is aimed at novice SQL developers who can find the script very convinent in situations wherein the data in a table is not very large and they can do data cleansing easily .




Kindest Regards,

Sureshkumar Ramakrishnan

Stacy Parrish
Stacy Parrish
Old Hand
Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)Old Hand (386 reputation)

Group: General Forum Members
Points: 386 Visits: 3

I thought the information was helpful as providing an example. As to the performance, I try to look for ways to performing row oriented operations without cursors. I would try to use new @table variable declaration instead for SQL 2000 and a temp table in 7.0.

Thanks for the the article


herb walles
herb walles
SSC Veteran
SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)

Group: General Forum Members
Points: 284 Visits: 1
Would it be just easier to save duplicated records in temp table where we have only 1 records for all duplicated records, then to delete all duplicated records in one shot from the original table and then insert records from the temp. table without any cursor?
herb walles
herb walles
SSC Veteran
SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)SSC Veteran (284 reputation)

Group: General Forum Members
Points: 284 Visits: 1

set nocount on
/* Script for Creation of Employee Table*/
CREATE TABLE [dbo].[#Employee] (
[id] [int] NULL ,
[name] [Varchar] (50) NULL ,
[salary] [Numeric](18, 2) NULL
  ON [PRIMARY]
GO
/*********************************************************/


/**********************************************************/
/* Script for Insertion of duplicate records to the Table */
Insert into #employee values (1,'Ram', 1000.00)
Insert into #employee values (1,'Ram', 1000.00)
Insert into #employee values (2,'Joe', 2000.00)
Insert into #employee values (2,'Joe', 1000.00)
Insert into #employee values (3,'Mary', 1000.00)
Insert into #employee values (4,'Julie', 5000.00)
Insert into #employee values (2,'Joe', 1000.00)
Insert into #employee values (1,'Ram', 1000.00)


select id,name,salary into #temp from #employee
group by id,name,salary
having count(*)>1


delete from e
from #employee e
inner join #temp t on e.id=t.id and e.name=t.name and e.salary=t.salary

insert into #employee
select * from #temp


select * from #employee


drop table #Employee
drop table #temp


DCPeterson
DCPeterson
SSCertifiable
SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)SSCertifiable (6.2K reputation)

Group: General Forum Members
Points: 6189 Visits: 432

The best way of dealing with duplicates is to avoid them in the first place! Design your databases right and you don't have to worry about it. There is no logically sound reason to allow duplicates.

If you find yourself in a situation (not of your own making, of course) where you have to deal with them, I have found this method helpful:

CREATE TABLE dupes (col1 int, col2 int)

GO
INSERT INTO dupes VALUES (1,1)
INSERT INTO dupes VALUES (1,1)
INSERT INTO dupes VALUES (1,1)
INSERT INTO dupes VALUES (2,2)
INSERT INTO dupes VALUES (2,2)
INSERT INTO dupes VALUES (2,2)
INSERT INTO dupes VALUES (2,2)

GO

CREATE TABLE nodupes (col1 int, col2 int)

GO

CREATE UNIQUE CLUSTERED INDEX ix
ON nodupes(col1, col2)
WITH IGNORE_DUP_KEY

GO

INSERT INTO nodupes
SELECT * FROM dupes

GO

SELECT * FROM nodupes

This gives you the safety of being able to do some "sanity checks" before actually modifying data, and it generally performs much better than cursor based methods.



/*****************

If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek



*****************/
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum







































































































































































SQLServerCentral


Search