SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Duplicate Records


Duplicate Records

Author
Message
jspencer-506909
jspencer-506909
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 17
Hi I have serveral product files i import each day from different vendors
each file contains manufacturer part numbers, prices, weight, descriptions etc etc for roughly 10,000 products

I Import daily the relevant info from each vendor files into their own tables

and then import all of the tables into one big table for uploading to our website.

the problem is that there are duplicate products coming from different sources

is there a way to weed out the duplicates based on price? either remove duplicates once they are in the combined table leaving the one with the lowest price, or filter so the duplicates cant happen in the first place.

Thanks
Steve Jones
Steve Jones
SSC Guru
SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)

Group: Administrators
Points: 63338 Visits: 19115
You have to write a query that would find duplicates, and then decide (disable or delete) how to handle the duplicates. There's no way for us to know from which table you'd want to delete things.

You can use standard join syntax to find out which rows in which tables might be duplicates.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Senjaya
Senjaya
Forum Newbie
Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)Forum Newbie (5 reputation)

Group: General Forum Members
Points: 5 Visits: 51
CREATE TABLE Dups (NameId int, FirstName varchar(10), LastName varchar(10))
GO

INSERT INTO Dups VALUES (106, 'Robert', 'Aide')
INSERT INTO Dups VALUES (107, 'David', 'Koidl')
INSERT INTO Dups VALUES (108, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (123, 'Robert', 'Aide')
INSERT INTO Dups VALUES (124, 'David', 'Koidl')
INSERT INTO Dups VALUES (125, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (138, 'Robert', 'Aide')
INSERT INTO Dups VALUES (139, 'David', 'Koidl')
INSERT INTO Dups VALUES (140, 'Hongbo', 'Li')
INSERT INTO Dups VALUES (153, 'Robert', 'Aide')
INSERT INTO Dups VALUES (154, 'David', 'Koidl')
INSERT INTO Dups VALUES (155, 'Hongbo', 'Li')
GO

DELETE FROM dups
WHERE
EXISTS
(
SELECT
NameId
FROM
dups DupsInner
WHERE
DupsInner.FirstName = Dups.FirstName
AND DupsInner.LastName = Dups.LastName
AND DupsInner.NameId < Dups.NameId
)
GO
GSquared
GSquared
SSC-Insane
SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)

Group: General Forum Members
Points: 23715 Visits: 9730
The easiest solution is going to be deleting from the combined table, before the final upload. You could dump them in the load from the vendor-specific tables, but that's going to be much more complex.

My usual solution for deleting duplicates is to use a CTE and the Row_Number() function.


;with Products (ID, Row) as
(select ID, row_number() over (partition by ProductID order by Price)
from dbo.CombinedTable)
delete from Products
where Row > 1



I'm guessing on your table structure, name, etc., but that's the basic code.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search