http://www.sqlservercentral.com/blogs/sqlstudies/2013/02/04/duplicate-rows-in-a-table/

Printed 2014/08/01 08:34AM

Duplicate rows in a table

By Kenneth Fisher, 2013/02/04

I recently got the following email:

“I just keyed in two rows into a table with all identical column values. How do I get rid of just one of them?”

Now even way back when in the dark ages when I started as a database developer I ran into this problem occasionally. Every now and again I would create a table and forget to put a primary key on it. (I’m fairly sure I’m the only one who has ever done this.) As a result I inevitably ended up with some rows that were exact duplicates and I needed to get rid of just one of them. So the first thing I tried was this:

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

And it failed. Looking in BOL (written on stone tablets at the time) I found out that TOP wasn’t available in DELETE, or UPDATE for that matter. Remember this was back in SQL 2000. I honestly don’t remember if this could be done in FoxPro (my first database language). I developed a couple of tricks to manage the problem, and they worked fairly well, but I had always wished for DELETE TOP #. Well my wish finally came true, as of SQL 2005 TOP is available for use with DELETE, UPDATE and even INSERT. Unfortunately I only learned this recently while studying for my 2008 Development MCTS. Here is the definition in 2005-2008 R2. I haven’t looked at 2012 yet but I imagine that it is the same or similar.

TOP (expression) [PERCENT]
     [ WITH TIES ]

So back to my developer’s problem, I could now give him the simple solution.

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

and it worked perfectly. In case you were wondering the original solutions I came up with was this:

SELECT DISTINCT * INTO #temp
FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

DELETE FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

INSERT INTO MyHeap
SELECT * FROM #temp

With complications for identity columns of course. So you can see how much easier the TOP keyword makes things.

FYI You can also use the TOP keyword as a way to limit a transaction size. I’m using 5000 rows as a transactions size for the following examples.

DELETE TOP (5000) SET ColumnA = 'NewValue'
	WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
	WAITFOR DELAY ’00:00:02’ – Wait 2 seconds between transactions
	DELETE TOP (5000) SET ColumnA = 'NewValue'
		WHERE ColumnA = 'OldValue'
END

Or

UPDATE TOP (5000) SET ColumnA = 'NewValue'
	WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
	WAITFOR DELAY '00:00:02' – Wait 2 seconds between transactions
	UPDATE TOP (5000) SET ColumnA = 'NewValue'
		WHERE ColumnA = 'OldValue'
END

Note: Scripts corrected based on comments below.

I haven’t had a chance to do any extensive testing on this yet but the theory is sound and I didn’t run into any problems the couple of times I’ve tried it.



Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.