best way to remove duplicates when no primary key or date present

  • Heh... sorry about the "soapbox" coming up... πŸ˜€

    The "best" way to do all of this is to not let dupes into production tables to begin with. Unknown data should always be inserted into a staging table (yeah, with an IDENTITY column), dupe checked on its own, and then properly merged.

    If the production table is inherently unknown in duplicate quality (no natural PK), then you really need to plan for dupes... there has to be a DateTime column AND and IDENTITY column to break datetime ties.

    Then, you don't have to worry about finding a the "best" way because you've already done it the "best" way. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have seen topics like this about a thousand times on different sites, and have yet to see one where the OP could answer the question β€œWhat constitutes a duplicate?”.

    Doesn't look like this one will be any different.

  • speaking of dupes...

    methinks we've all been duped into continuing to beat up on this topic, since the OP is long gone...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • It's not all about the OP, you know πŸ™‚

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • now that the soap box has been brought out...:hehe:

    I would like to see creation and modification information as well as a unique identifier. You may also want to have a status column.

    Ok I'm done

    Q

    Please take a number. Now serving emergency 1,203,894

  • Matt Miller (5/1/2008)


    speaking of dupes...

    methinks we've all been duped into continuing to beat up on this topic, since the OP is long gone...:)

    Yep... gotta love it when the OP bails right after posting the question :blink:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/1/2008)


    Matt Miller (5/1/2008)


    speaking of dupes...

    methinks we've all been duped into continuing to beat up on this topic, since the OP is long gone...:)

    Yep... gotta love it when the OP bails right after posting the question :blink:

    These are the ones where I wish my post had been "Truncate table will get rid of the dupes". πŸ™‚

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yep should have just given them the nuclear option right from the beginning.

    Q

    Please take a number. Now serving emergency 1,203,894

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply