Performing Row-By-Row Data Corrections - Cursor or Loop?

  • Hi,

    I'm relatively new to T-SQL as an intern at an insurance company. I try to do most of my learning through Google, but I wasn't getting the results I needed so I'm going to try this forum and hopefully you pros can answer my questions.

    I have a table which simply contains NAME and ID. I have a series of filters that I need to apply to the name fields, i.e. remove certain formatting, words, etc. I'm not sure whether to use cursors or a loop seeing how so many sources speak of cursor's negative performance impacts. I can't even find a good example - most of the ones out there perform complex operations, while I'm trying to do something very simple.

    Any ideas? Just point me in the right direction and I'll take over from there.

    Todd

  • Actually, with doing string manipulations on the NAME field and comparing, you're gonna be doing table scans anyways for most operations.

    It all depends on what you're most comfortable with. I personally would just do a stored proc that ran a bunch of UPDATE queries as needed.

    i.e.,

    create procedure FixName as

    update mytable set name=replace(name, '#','');

    update mytable set name = null where ltrim(name) = '';

    /* yada yada yada...*/

    go

  • From your question, I presume the ID column is unique else pls modify the following code to push the data to a temporary column with an identity column and loop through.

    Here is my code.

    Declare @ID INT

    WHILE

    1 = 1

    BEGIN

        SELECT @ID= MIN(ID)

          FROM  Table_Name

        WHERE (ID_Num > @ID_Num)

    IF (@@rowcount = 0) OR (@ID IS NULL)

        BREAK

    SELECT NAME FROM Table_Name Where ID = @ID

    END

    -- WHILE Loop through each name

     

    Thanks,

    Ganesh

     

  • The broad heuristic is that if what you want to do can be done with set-based operations, do it with set-based operations; only use a cursor if you have to.

    Depending on how simple the filters you want to apply are, you may or may not be able to code them in T-SQL to go into an UPDATE statement. And you may end up with lots of UPDATE statements to do all the work you want doing.

    Having said to prefer set-based operations, if this is a one-time operation - cleaning data after an initial import, say - then don't be afraid to use a cursor if it looks like the easiest thing to do. If you're only running code once, it doesn't matter if its performance is not optimal

     

  • Thanks for the responses. You are correct,the ID field is the unique identifier. Also, time is not really of the issue, as I can easily start the procedure when I leave, which gives it all night to run (I have no idea how long this will be for ~5 million rows, but I'm going to be doing some testing on smaller tables to estimate it).

    Is there actually a replace function, or did you just use that as if I had already coded it? I will look that up today - if there is, my life just got easier.

  • I was able to do 90% of the corrections with the Replace command. However, there are a few corrections that have me stumped that I think I will need to implement cursors for. For instance, I need to change 'I B M' to 'IBM'. Any ideas?

  • How many 'I B M' s do you have?  Can you do a visual check to be sure no

    ' SANDI B McCOY ' or ' I B Martin ' records exist? Perhaps a replace where you placed a space before the 'I', and one after the 'M'? An additional update covering ' I B M,'?

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply