August 8, 2008 at 4:44 pm
Hi I have serveral product files i import each day from different vendors
each file contains manufacturer part numbers, prices, weight, descriptions etc etc for roughly 10,000 products
I Import daily the relevant info from each vendor files into their own tables
and then import all of the tables into one big table for uploading to our website.
the problem is that there are duplicate products coming from different sources
is there a way to weed out the duplicates based on price? either remove duplicates once they are in the combined table leaving the one with the lowest price, or filter so the duplicates cant happen in the first place.
Thanks
August 10, 2008 at 5:20 pm
You have to write a query that would find duplicates, and then decide (disable or delete) how to handle the duplicates. There's no way for us to know from which table you'd want to delete things.
You can use standard join syntax to find out which rows in which tables might be duplicates.
August 11, 2008 at 11:13 am
CREATE TABLE Dups (NameId int, FirstName varchar(10), LastName varchar(10))
GO
INSERT INTO Dups VALUES (106, 'Robert', 'Aide')
INSERT INTO Dups VALUES (107, 'David', 'Koidl')
INSERT INTO Dups VALUES (108, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (123, 'Robert', 'Aide')
INSERT INTO Dups VALUES (124, 'David', 'Koidl')
INSERT INTO Dups VALUES (125, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (138, 'Robert', 'Aide')
INSERT INTO Dups VALUES (139, 'David', 'Koidl')
INSERT INTO Dups VALUES (140, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (153, 'Robert', 'Aide')
INSERT INTO Dups VALUES (154, 'David', 'Koidl')
INSERT INTO Dups VALUES (155, 'Hongbo', 'Li')
GO
DELETE FROM dups
WHERE
EXISTS
(
SELECT
NameId
FROM
dups DupsInner
WHERE
DupsInner.FirstName = Dups.FirstName
AND DupsInner.LastName = Dups.LastName
AND DupsInner.NameId < Dups.NameId
)
GO
August 11, 2008 at 1:12 pm
The easiest solution is going to be deleting from the combined table, before the final upload. You could dump them in the load from the vendor-specific tables, but that's going to be much more complex.
My usual solution for deleting duplicates is to use a CTE and the Row_Number() function.
;with Products (ID, Row) as
(select ID, row_number() over (partition by ProductID order by Price)
from dbo.CombinedTable)
delete from Products
where Row > 1
I'm guessing on your table structure, name, etc., but that's the basic code.
Does that help?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply