
Deleting Duplicate Records


Daren Bieniek
Forum Newbie

Group: General Forum Members
Points: 5 Visits: 1

The script is interesting, but I agree with the other posters that a non-cursor method is best. In fact, I believe cursors should be hidden away in an advanced section of BOL with huge warnings around them saying things like "USE AT YOUR OWN RISK". We definitely don't want newbies being shown cursor-based solutions, which appears to advocate their use. I have done this kind of work for years with billions of records, and I've found the following to be the fastest way to remove the dupes, provided a substantial percentage of the rows are dupes.

1. Move the unique rows into another table (either created ahead of time or via SELECT...INTO); the new table should have no indexes yet.

2. Drop the original table.

3. Rename the new table to the same name as the old and create any needed indexes, etc.

Moving the unique rows into a new table is substantially faster than deleting rows from the original table (which the engine does internally via seek operations), unless the percentage of duplicates is very low (say a few percent, which can vary slightly). Also, having no indexes makes the inserts go much faster and eliminates the index fragmentation that would otherwise occur and have to be cleaned up later. Next, dropping the original table and renaming the new one (a metadata-only action) is much faster than copying the records from a temp table or table variable back into the original. Finally, creating any needed indexes afterwards (make sure you build the clustered index first) means they start out in a non-fragmented state.
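To make that concrete, here is a minimal sketch of the rebuild, assuming the article's employee table and treating rows as dupes when all columns match; the object and index names are just illustrative:

SELECT DISTINCT id, name, salary
INTO employee_clean   -- a heap with no indexes yet, so the inserts run fast
FROM employee

DROP TABLE employee

EXEC sp_rename 'employee_clean', 'employee'   -- metadata-only, so very quick

-- Build the clustered index first, then any nonclustered indexes,
-- so everything starts out unfragmented.
CREATE CLUSTERED INDEX IX_employee_id ON employee (id)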

A couple of things about the replies concerning how the data got this way to begin with. Usually this scenario arises during data cleansing when combining data from different sources, for example a data warehouse load from multiple source systems. Operations like this are often not only necessary but can constitute a substantial amount of the data cleansing work. Personally, I call this "homogenizing the data", or making the data homogeneous. If you have multiple source systems from, say, multiple subsidiaries run autonomously under a single holding company, then it is very difficult (or impossible) to get the data clean to begin with.

However, the two most difficult aspects of removing duplicates are not discussed in the article. One is deciding which duplicate is the right one. In the real world, duplicates are rarely identical across all columns; more often they are two rows with the same key whose remaining columns differ. Using the author's example "non-dupe" table, id=2, name='joe' has two rows, one with a salary of 1000 and the other 2000. Since name is the business key, this is a dupe, but which one do we get rid of? A wrong decision either way would either short-change Joe or double his salary. This is one of the real difficulties in removing dupes.
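Before deleting anything, a query along these lines (a sketch against the article's employee table) at least surfaces the business keys whose rows disagree, so a person can rule on which one wins:

SELECT name,
       COUNT(*)               AS rows_with_key,
       COUNT(DISTINCT salary) AS distinct_salaries
FROM employee
GROUP BY name
HAVING COUNT(*) > 1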

The second difficulty is removing records that are duplicates but have differing business keys. For example, say we are bringing together data from two subsidiaries that both buy from 3COM, but they spell it differently: one has '3COM' and the other has multiple entries, '3 COM', '3.COM', and 'Three Com'. So how do we link all of these records to a single entity? I have methods that can do most of this matching in the older DTS, and the newer SSIS fuzzy-matching capabilities are a good start, but be careful: you can inadvertently miss matches or make wrong ones. If people are interested, I could probably write up something about this. The point is that simple constraints will not always work. The old principle of GIGO (Garbage In, Garbage Out) applies, but with certain techniques you can at least clean up the trash a little and make it a little more presentable.
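As a crude first pass (this is not the fuzzy matching itself, and the vendor table and column are hypothetical), stripping spaces and punctuation collapses '3COM', '3 COM', and '3.COM' into one key, though it still misses 'Three Com', which is exactly where fuzzy matching earns its keep:

SELECT REPLACE(REPLACE(vendor_name, ' ', ''), '.', '') AS normalized_name,
       COUNT(*) AS variants
FROM vendor
GROUP BY REPLACE(REPLACE(vendor_name, ' ', ''), '.', '')
HAVING COUNT(*) > 1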

I hope this was helpful to people.

 


Brian Y. Anderson
Forum Newbie

Group: General Forum Members
Points: 1 Visits: 1

I was totally looking for something like this. The fact that it ended up being so elegant and simple made me feel ignorant for a moment. This is a very helpful script. It's funny: after thinking about how to do something like this for so long, you end up overthinking it. Then someone provides such a simple solution and it's like, 'Wow, how could I not have seen this before?'

Thanks again,

Brian


kevin mann
SSC Veteran

Group: General Forum Members
Points: 226 Visits: 119
--OR!!

-- Delete duplicates one row at a time: SET ROWCOUNT 1 limits each DELETE
-- to a single row, and the loop repeats until no (id, name, salary) group
-- has more than one row, leaving exactly one copy of each duplicate.
DECLARE @affected tinyint

SET @affected = 1

SET ROWCOUNT 1

WHILE @affected > 0
BEGIN

   DELETE employee
   FROM   employee
   INNER JOIN
   (
      SELECT id, name, salary
      FROM employee
      GROUP BY id, name, salary
      HAVING COUNT(1) > 1
   ) dupes
      ON  dupes.id = employee.id
      AND dupes.name = employee.name
      AND dupes.salary = employee.salary

   SET @affected = @@ROWCOUNT
END

SET ROWCOUNT 0

SELECT * FROM employee
ORDER BY id

DROP TABLE employee
William Alber
Forum Newbie

Group: General Forum Members
Points: 1 Visits: 1

I'll typically use a DELETE FROM / GROUP BY combination, sometimes in conjunction with a temporary table:

-- Keep the lowest id in each (name, salary) group and delete the rest.
DELETE FROM employee
WHERE id NOT IN (
    SELECT MIN(id)
    FROM employee
    GROUP BY name, salary
)

...assuming that id is an identity column (if your table doesn't have an identity column then you're on your own!).

I'll often adapt this approach when there are, e.g., timestamp fields, leaving them out of the GROUP BY if you'd still want to consider the record a dupe regardless.

If you're after all those records with the highest timestamp etc. then a temporary table with an identity column works wonders.

To kill off all records with a dupe'd employee name, leaving the record with the highest salary...

CREATE TABLE #emps (
    myID INTEGER IDENTITY(1, 1),
    id INTEGER,
    name NCHAR(100), -- whatever
    salary MONEY
)

-- The ORDER BY guarantees identity values are assigned in salary order,
-- so the lowest myID per name is the highest-paid row. The column list
-- is required because myID is an identity column.
INSERT INTO #emps (id, name, salary)
SELECT id, name, salary
FROM employee
ORDER BY salary DESC

-- Delete everything except the first (lowest myID, i.e. highest salary)
-- entry for each name.
DELETE FROM employee
WHERE id NOT IN (
    SELECT id FROM #emps
    WHERE myID IN (
        SELECT MIN(myID)
        FROM #emps
        GROUP BY name
    )
)

...I realise that NOT IN is not a particularly expedient operation, but I've had great success with this in the past.
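(For what it's worth, the same delete can be phrased with NOT EXISTS, which the optimizer often treats more kindly than NOT IN; this is only a sketch under the same assumptions as above.)

-- Delete every employee row with no "keeper" entry in #emps; the keeper
-- for each name is its first (lowest myID, i.e. highest salary) row.
DELETE FROM employee
WHERE NOT EXISTS (
    SELECT 1
    FROM #emps k
    WHERE k.id = employee.id
      AND k.myID IN (SELECT MIN(myID) FROM #emps GROUP BY name)
)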


sheepoo
SSC-Enthusiastic

Group: General Forum Members
Points: 121 Visits: 34
My personal experience is that duplicate records in the database mean poor validation on both the client and DB side, as well as poor database design. I agree with those who would 'nip the evil in the bud' and not allow duplicates into the database in the first place.

The author mentioned that this article is for novice SQL Server users; I would rather train novice users to avoid the problem in the first place than to react to it and realize that some things were left wanting in the DB design.
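For example (a sketch, assuming name is the business key of the article's employee table; the constraint name is made up):

ALTER TABLE employee
ADD CONSTRAINT UQ_employee_name UNIQUE (name)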


Just my 2 cents.
kevin mann
SSC Veteran

Group: General Forum Members
Points: 226 Visits: 119
How about a new admin taking over for a careless admin?
Andrew Clarke-161005
SSC Rookie

Group: Administrators
Points: 31 Visits: 52
I'm surprised that anyone working with SQL Server needs lessons in removing duplicates, and puzzled by the complexity of the cursor-based solution originally published; a better solution has already been pointed out in the forum. As to the opinion that databases should never contain duplicate entries: yes, but somehow they sometimes creep in, like the Kernighan and Ritchie 'this is impossible' error message, which one should never see and often does. Commercial databases are often of frightening complexity, and one should always be on guard for duplicates. Constraints are one way of doing it, but there are times when the overhead is too great.

Editor: Simple-Talk
brian mcmillin
Forum Newbie

Group: General Forum Members
Points: 1 Visits: 1
Why not just do a SELECT DISTINCT into a new table, delete the old, and then rename the new to the old?
kevin mann
SSC Veteran

Group: General Forum Members
Points: 226 Visits: 119
That was mentioned above.
johncyriac
SSC Journeyman

Group: General Forum Members
Points: 90 Visits: 58

Theoretically speaking, SELECT DISTINCT is the best way, but which fields you want to be distinct may change in each case.

In practice these are problems for the people who deal with data porting/migration, not for DBAs.




