Find and Remove Duplicate Records SQL Server

Brandie Tarvin SSC Guru Points: 173105 · Answer 1

SQLian (2/3/2016)
Not sure if I am missing something here, but why can't the following be used?

    DELETE FROM Customers  -- Brandie added the FROM keyword
    WHERE CustID NOT IN (
        SELECT MIN(CustID)
        FROM Customers
        GROUP BY CustName
    )

Brandie Tarvin (2/3/2016)
Because that will delete all the non-duplicate customers as well.

Luis Cazares (2/3/2016)
No, it won't. But it won't work with rows that are complete duplicates or with tables with composite keys.

Brandie Tarvin (2/3/2016)
Maybe I'm misreading, but SELECT MIN(CustID) would select the minimum customer ID (say 1) and then delete everything else, wouldn't it? If not, what am I missing?

Luis Cazares (2/3/2016)
You're missing the GROUP BY CustName. It will leave one CustID for each CustName. For unique names it won't do anything, and for duplicate names it will delete the higher CustIDs.

Ahh, yes. I was skimming past that. It's been that kind of a morning.

Brandie Tarvin, MCITP Database Administrator
LiveJournal Blog: http://brandietarvin.livejournal.com/
On LinkedIn!, Google+, and Twitter.
Freelance Writer: Shadowrun
Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.
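
To make the GROUP BY point concrete, here is a minimal T-SQL sketch of SQLian's statement run against made-up data. The temp table #Customers and its rows are hypothetical, and the sketch assumes CustID is unique per row, which is the case this approach relies on.

```sql
-- Hypothetical sample table; CustID is assumed to be unique per row.
CREATE TABLE #Customers (
    CustID   int         NOT NULL PRIMARY KEY,
    CustName varchar(50) NOT NULL
);

INSERT INTO #Customers (CustID, CustName)
VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Bob'), (4, 'Carol'), (5, 'Bob');

-- The subquery returns one CustID per name: 1 (Alice), 2 (Bob), 4 (Carol).
-- Every row whose CustID is not in that list is a higher-numbered duplicate.
DELETE FROM #Customers
WHERE CustID NOT IN (
    SELECT MIN(CustID)
    FROM #Customers
    GROUP BY CustName
);

-- Rows 3 and 5 (the extra 'Bob' rows) are gone; 1, 2 and 4 remain.
SELECT CustID, CustName
FROM #Customers
ORDER BY CustID;

DROP TABLE #Customers;
```

Names that appear only once are their own minimum, so they are never touched.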

Luis Cazares SSC Guru Points: 183706 · Answer 2

Brandie Tarvin (2/3/2016)
Ahh, yes. I was skimming past that. It's been that kind of a morning.

Been there

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

RobCarter SSC Enthusiast Points: 176 · Answer 3

Am I being dim?

    DELETE FROM Customers
    WHERE CustID NOT IN (
        SELECT MIN(CustID)
        FROM Customers
        GROUP BY CustName
    )

Wouldn't this just not do anything at all?

If you have CustIDs 100, 200, 300, 400, and 500, the aggregate would just say "delete everything where the CustID is not in (100, 200, 300, 400, 500)". But the duplicate customer ID is 300, which is in that list, so it wouldn't delete anything?

Happy to be wrong (It happens often).
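
RobCarter's scenario can be checked with a small T-SQL sketch. It assumes a hypothetical keyless #Customers table in which the CustID value itself repeats (the whole row for 300 appears twice); the names and data are invented for illustration.

```sql
-- Hypothetical table with no key, so an entire row (CustID included) can repeat.
CREATE TABLE #Customers (
    CustID   int         NOT NULL,
    CustName varchar(50) NOT NULL
);

INSERT INTO #Customers (CustID, CustName)
VALUES (100, 'Ann'), (200, 'Ben'), (300, 'Cy'),
       (300, 'Cy'),  (400, 'Dee'), (500, 'Eve');

-- MIN(CustID) per name is 100, 200, 300, 400, 500, so both copies of
-- CustID 300 pass the NOT IN test and nothing is deleted.
DELETE FROM #Customers
WHERE CustID NOT IN (
    SELECT MIN(CustID)
    FROM #Customers
    GROUP BY CustName
);

SELECT @@ROWCOUNT AS RowsDeleted;  -- 0 for this data

-- The duplicate is still present.
SELECT CustID, CustName, COUNT(*) AS Copies
FROM #Customers
GROUP BY CustID, CustName
HAVING COUNT(*) > 1;

DROP TABLE #Customers;
```

So the statement is a no-op only when the ID column is duplicated too, which matches the earlier caveat that it does not handle rows that are complete duplicates. When CustID is unique and only CustName repeats, it does delete the higher CustIDs.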

x SSC-Insane Points: 23660 · Answer 4

Jeff Moden (2/2/2016)
andy_111 (1/31/2016)
Not really useful article.
Not really a useful comment, either. Please explain why you think it's not useful.

I have to agree with andy_111's sentiment. He and others noted the ROW_NUMBER / windowing function alternative, and given the warnings about SET ROWCOUNT's changing semantics, I think the originally posted code is probably a trap for the unwary.

Also, deleting dupes is sort of a FAQ, and I don't think the author really covered the material, especially given what I've seen from a basic web search on the topic. Clearly the windowing functionality should have been mentioned. Heck, even Microsoft offers selecting distinct rows into a temp table, deleting, and reinserting as an option on one of their older pages. Given the shaky lifetime of SET ROWCOUNT's semantics on updates, I think a decent effort should have discussed this, so in general I have to agree with andy_111's sentiment.

It looks like an old-fashioned article, doesn't it? For SQL 2000 or something.

Even so, if we're talking about non-windowing-function methods, I like the approach of inserting distinct copies of the dupes into a temp table, removing the dupes, and then reinserting them into the source. Still, yeah, a bit on the old-fashioned side.
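
The temp-table approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tables #Customers and #Dupes and their data are hypothetical, every column is NOT NULL (an equality join would not match NULLs), and a "duplicate" means all columns are equal.

```sql
-- Hypothetical keyless table containing one fully duplicated row.
CREATE TABLE #Customers (
    CustID   int         NOT NULL,
    CustName varchar(50) NOT NULL
);

INSERT INTO #Customers (CustID, CustName)
VALUES (100, 'Ann'), (300, 'Cy'), (300, 'Cy'), (300, 'Cy'), (400, 'Dee');

BEGIN TRANSACTION;

-- 1. Copy one distinct version of each duplicated row into a holding table.
SELECT CustID, CustName
INTO #Dupes
FROM #Customers
GROUP BY CustID, CustName
HAVING COUNT(*) > 1;

-- 2. Remove every copy of those rows from the source.
DELETE c
FROM #Customers AS c
INNER JOIN #Dupes AS d
    ON  d.CustID   = c.CustID
    AND d.CustName = c.CustName;

-- 3. Put a single copy of each back.
INSERT INTO #Customers (CustID, CustName)
SELECT CustID, CustName
FROM #Dupes;

COMMIT TRANSACTION;

-- One row each for 100, 300 and 400 remains.
SELECT CustID, CustName
FROM #Customers
ORDER BY CustID;

DROP TABLE #Dupes;
DROP TABLE #Customers;
```

Wrapping the three steps in a transaction keeps the source table from being left without its rows if a later step fails.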

SQLRNNR SSC Guru Points: 281344 · Answer 5

Luis Cazares (2/1/2016)
venkataprasanth (2/1/2016)
The author's posted solution works, and it DOESN'T DELETE ALL duplicates. (If there are 3 duplicate rows, it deletes 2 and leaves 1.) So it works. Also, I guess the row_number() solution people are posting works only when we have a unique identifier in the table. But with a unique identifier, two rows are obviously no longer duplicates (even if all the other columns are the same). The post is about deleting duplicates when there is no unique identifier (i.e., all column values are the same in more than one row).
What do you mean by "having a unique identifier"? The ROW_NUMBER option allows you to define what constitutes a duplicate (one column, some columns, or all the columns). The method shown in the article is bad in that it needs the table to have a single column that defines duplication, and the worst part is that it deletes one row at a time using a deprecated option.
I'm sorry that Ginger Keys got such a harsh reaction, but the alternatives are certainly much better.

I did an article on the CTE method years ago. Maybe the demo in the article will help clear things up a bit. There seems to be a lot of back and forth on using that method.

Remove Duplicates Article
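
For readers following the back and forth, the ROW_NUMBER / CTE technique discussed in this thread can be sketched as below. This is a generic illustration, not the demo from the linked article; the #Customers table, its data, and the CTE name Numbered are assumptions. The PARTITION BY list is where you define what counts as a duplicate, and no unique identifier is required.

```sql
-- Hypothetical keyless table containing fully duplicated rows.
CREATE TABLE #Customers (
    CustID   int         NOT NULL,
    CustName varchar(50) NOT NULL
);

INSERT INTO #Customers (CustID, CustName)
VALUES (100, 'Ann'), (300, 'Cy'), (300, 'Cy'), (300, 'Cy'), (400, 'Dee');

-- Number the rows within each group of identical values.
-- ORDER BY (SELECT NULL) means "any order": the copies are interchangeable.
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY CustID, CustName
               ORDER BY (SELECT NULL)
           ) AS rn
    FROM #Customers
)
-- Deleting through the CTE removes the underlying rows; rn = 1 is kept.
DELETE FROM Numbered
WHERE rn > 1;

-- One row each for 100, 300 and 400 remains.
SELECT CustID, CustName
FROM #Customers
ORDER BY CustID;

DROP TABLE #Customers;
```

To treat rows as duplicates on the name alone, the PARTITION BY list would shrink to CustName, and the ORDER BY inside OVER would then decide which copy survives (for example, ORDER BY CustID to keep the lowest ID).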