Duplicate Records

Question

Duplicate Records

jspencer-506909

Valued Member

Points: 69
More actions
August 8, 2008 at 4:44 pm

#93786

Hi I have serveral product files i import each day from different vendors
each file contains manufacturer part numbers, prices, weight, descriptions etc etc for roughly 10,000 products
I Import daily the relevant info from each vendor files into their own tables
and then import all of the tables into one big table for uploading to our website.
the problem is that there are duplicate products coming from different sources
is there a way to weed out the duplicates based on price? either remove duplicates once they are in the combined table leaving the one with the lowest price, or filter so the duplicates cant happen in the first place.
Thanks

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 734564 More actions · Answer 1

You have to write a query that would find duplicates, and then decide (disable or delete) how to handle the duplicates. There's no way for us to know from which table you'd want to delete things.

You can use standard join syntax to find out which rows in which tables might be duplicates.

Senjaya SSC Enthusiast Points: 155 More actions · Answer 2

CREATE TABLE Dups (NameId int, FirstName varchar(10), LastName varchar(10))

GO

INSERT INTO Dups VALUES (106, 'Robert', 'Aide')

INSERT INTO Dups VALUES (107, 'David', 'Koidl')

INSERT INTO Dups VALUES (108, 'Hongbo', 'Li')

INSERT INTO Dups VALUES (123, 'Robert', 'Aide')

INSERT INTO Dups VALUES (124, 'David', 'Koidl')

INSERT INTO Dups VALUES (125, 'Hongbo', 'Li')

INSERT INTO Dups VALUES (138, 'Robert', 'Aide')

INSERT INTO Dups VALUES (139, 'David', 'Koidl')

INSERT INTO Dups VALUES (140, 'Hongbo', 'Li')

INSERT INTO Dups VALUES (153, 'Robert', 'Aide')

INSERT INTO Dups VALUES (154, 'David', 'Koidl')

INSERT INTO Dups VALUES (155, 'Hongbo', 'Li')

GO

DELETE FROM dups

WHERE

EXISTS

(

SELECT

NameId

FROM

dups DupsInner

WHERE

DupsInner.FirstName = Dups.FirstName

AND DupsInner.LastName = Dups.LastName

AND DupsInner.NameId < Dups.NameId

)

GO

GSquared SSC Guru Points: 260824 More actions · Answer 3

The easiest solution is going to be deleting from the combined table, before the final upload. You could dump them in the load from the vendor-specific tables, but that's going to be much more complex.

My usual solution for deleting duplicates is to use a CTE and the Row_Number() function.

;with Products (ID, Row) as

(select ID, row_number() over (partition by ProductID order by Price)

from dbo.CombinedTable)

delete from Products

where Row > 1

I'm guessing on your table structure, name, etc., but that's the basic code.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon