Duplicate Records

  • Hi I have serveral product files i import each day from different vendors

    each file contains manufacturer part numbers, prices, weight, descriptions etc etc for roughly 10,000 products

    I Import daily the relevant info from each vendor files into their own tables

    and then import all of the tables into one big table for uploading to our website.

    the problem is that there are duplicate products coming from different sources

    is there a way to weed out the duplicates based on price? either remove duplicates once they are in the combined table leaving the one with the lowest price, or filter so the duplicates cant happen in the first place.

    Thanks

  • You have to write a query that would find duplicates, and then decide (disable or delete) how to handle the duplicates. There's no way for us to know from which table you'd want to delete things.

    You can use standard join syntax to find out which rows in which tables might be duplicates.

  • CREATE TABLE Dups (NameId int, FirstName varchar(10), LastName varchar(10))

    GO

    INSERT INTO Dups VALUES (106, 'Robert', 'Aide')

    INSERT INTO Dups VALUES (107, 'David', 'Koidl')

    INSERT INTO Dups VALUES (108, 'Hongbo', 'Li')

    INSERT INTO Dups VALUES (123, 'Robert', 'Aide')

    INSERT INTO Dups VALUES (124, 'David', 'Koidl')

    INSERT INTO Dups VALUES (125, 'Hongbo', 'Li')

    INSERT INTO Dups VALUES (138, 'Robert', 'Aide')

    INSERT INTO Dups VALUES (139, 'David', 'Koidl')

    INSERT INTO Dups VALUES (140, 'Hongbo', 'Li')

    INSERT INTO Dups VALUES (153, 'Robert', 'Aide')

    INSERT INTO Dups VALUES (154, 'David', 'Koidl')

    INSERT INTO Dups VALUES (155, 'Hongbo', 'Li')

    GO

    DELETE FROM dups

    WHERE

    EXISTS

    (

    SELECT

    NameId

    FROM

    dups DupsInner

    WHERE

    DupsInner.FirstName = Dups.FirstName

    AND DupsInner.LastName = Dups.LastName

    AND DupsInner.NameId < Dups.NameId

    )

    GO

  • The easiest solution is going to be deleting from the combined table, before the final upload. You could dump them in the load from the vendor-specific tables, but that's going to be much more complex.

    My usual solution for deleting duplicates is to use a CTE and the Row_Number() function.

    ;with Products (ID, Row) as

    (select ID, row_number() over (partition by ProductID order by Price)

    from dbo.CombinedTable)

    delete from Products

    where Row > 1

    I'm guessing on your table structure, name, etc., but that's the basic code.

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply