Blog Post

Duplicate rows in a table

,

I recently got the following email:

“I just keyed in two rows into a table with all identical column values. How do I get rid of just one of them?”

Now even way back when in the dark ages when I started as a database developer I ran into this problem occasionally. Every now and again I would create a table and forget to put a primary key on it. (I’m fairly sure I’m the only one who has ever done this.) As a result I inevitably ended up with some rows that were exact duplicates and I needed to get rid of just one of them. So the first thing I tried was this:

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

And it failed. Looking in BOL (written on stone tablets at the time) I found out that TOP wasn’t available in DELETE, or UPDATE for that matter. Remember this was back in SQL 2000. I honestly don’t remember if this could be done in FoxPro (my first database language). I developed a couple of tricks to manage the problem, and they worked fairly well, but I had always wished for DELETE TOP #. Well my wish finally came true, as of SQL 2005 TOP is available for use with DELETE, UPDATE and even INSERT. Unfortunately I only learned this recently while studying for my 2008 Development MCTS. Here is the definition in 2005-2008 R2. I haven’t looked at 2012 yet but I imagine that it is the same or similar.

TOP (expression) [PERCENT]
     [ WITH TIES ]

So back to my developer’s problem, I could now give him the simple solution.

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

and it worked perfectly. In case you were wondering the original solutions I came up with was this:

SELECT DISTINCT * INTO #temp
FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'
DELETE FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'
INSERT INTO MyHeap
SELECT * FROM #temp

With complications for identity columns of course. So you can see how much easier the TOP keyword makes things.

FYI You can also use the TOP keyword as a way to limit a transaction size. I’m using 5000 rows as a transactions size for the following examples.

DELETE TOP (5000) SET ColumnA = 'NewValue'
WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
WAITFOR DELAY ’00:00:02’ – Wait 2 seconds between transactions
DELETE TOP (5000) SET ColumnA = 'NewValue'
WHERE ColumnA = 'OldValue'
END

Or

UPDATE TOP (5000) SET ColumnA = 'NewValue'
WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
WAITFOR DELAY '00:00:02' – Wait 2 seconds between transactions
UPDATE TOP (5000) SET ColumnA = 'NewValue'
WHERE ColumnA = 'OldValue'
END

Note: Scripts corrected based on comments below.

I haven’t had a chance to do any extensive testing on this yet but the theory is sound and I didn’t run into any problems the couple of times I’ve tried it.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating