duplicate entries in a field

  • Naga's update is good and nasty fast... it just needs a filter...

    update Customer

    set address = LEFT (address, LEN(address) - CHARINDEX(',', REVERSE(address)))

    WHERE LEN(Address) - LEN(REPLACE(Address,',','')) = 2

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I had thaught about this... but isn't it possible that someone included a comma in their adress manually?

  • Absolutely correct, Remi... you and I both know that "open text" columns that contain multiple pieces of data (city,state... full name, etc) are the absolute worst and will usually require some manual verification (post implementation testing on the data).  What I'm banking on is that, unlike a human, the app made the same mistake in a consistant fashion.  That doesn't mean that there wasn't some other garbage already present in the column which is why the manual verification is going to be necessary.

    Anita,

    Remi is absolutely correct and you are wise to do these tests in a temporary table until you are sure.  I'd recommend that you take it one step further... copy only those records that have more than one comma in the address column to the temporary table to reduce the number of records you need to verify.  Be sure to include the primary key column(s). 

    Here's where the difference comes in...

    Instead of updating the address column in the temporary table, add a new column called something like "CorrectedAddress"... use whatever fix you decide on to update the CorrectedAddress so that you can easily compare the original address with the correction.

    Then, once you've verified all of the corrections (you can even make the occasional manual correction for exceptions), use your temporary table as a "driver" to control the updates on the original table (that's why you need to copy the PK to the temporary table).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ya this seems like the safest solution... I don't think you can assume anything here and "hope" that an update statement will fix all 100% of the data without any side effects. I really wish it will for your sake but I'm afraid you'll meet at least one exception. That's why I'd really go with Jeff's solution. And one last step to add would be to take a backup just before updating... just in case!?!?.

  • I've created a test table and tested it and it works fine on the the data that looks like adress,NY,NY, but you're right I've found some exceptions. AUGH! The app lead was not aware of the exceptions so we'll need to do more research. thank you so much for your help!

    Thanks!

  • I'm just glad we finished by getting it right in the end.

     

    How many exceptions did you find? (total and percentage)

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply