Performing Row-By-Row Data Corrections - Cursor or Loop?

Question

Performing Row-By-Row Data Corrections - Cursor or Loop?

T Nappi

Grasshopper

Points: 13
More actions
August 10, 2004 at 1:00 pm

#108682

Hi,
I'm relatively new to T-SQL as an intern at an insurance company. I try to do most of my learning through Google, but I wasn't getting the results I needed so I'm going to try this forum and hopefully you pros can answer my questions.
I have a table which simply contains NAME and ID. I have a series of filters that I need to apply to the name fields, i.e. remove certain formatting, words, etc. I'm not sure whether to use cursors or a loop seeing how so many sources speak of cursor's negative performance impacts. I can't even find a good example - most of the ones out there perform complex operations, while I'm trying to do something very simple.
Any ideas? Just point me in the right direction and I'll take over from there.
Todd

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

corey lawson Hall of Fame Points: 3773 More actions · Answer 1

Actually, with doing string manipulations on the NAME field and comparing, you're gonna be doing table scans anyways for most operations.

It all depends on what you're most comfortable with. I personally would just do a stored proc that ran a bunch of UPDATE queries as needed.

i.e.,

create procedure FixName as

update mytable set name=replace(name, '#','');

update mytable set name = null where ltrim(name) = '';

/* yada yada yada...*/

go

babuganesh2000 Hall of Fame Points: 3123 More actions · Answer 2

From your question, I presume the ID column is unique else pls modify the following code to push the data to a temporary column with an identity column and loop through.

Here is my code.

Declare @ID INT

WHILE

1 = 1

BEGIN

SELECT @ID= MIN(ID)

FROM Table_Name

WHERE (ID_Num > @ID_Num)

IF (@@rowcount = 0) OR (@ID IS NULL)

BREAK

SELECT NAME FROM Table_Name Where ID = @ID

END

-- WHILE Loop through each name

Thanks,

Ganesh

AKM SSCommitted Points: 1538 More actions · Answer 3

The broad heuristic is that if what you want to do can be done with set-based operations, do it with set-based operations; only use a cursor if you have to.

Depending on how simple the filters you want to apply are, you may or may not be able to code them in T-SQL to go into an UPDATE statement. And you may end up with lots of UPDATE statements to do all the work you want doing.

Having said to prefer set-based operations, if this is a one-time operation - cleaning data after an initial import, say - then don't be afraid to use a cursor if it looks like the easiest thing to do. If you're only running code once, it doesn't matter if its performance is not optimal

T Nappi Grasshopper Points: 13 More actions · Answer 4

Thanks for the responses. You are correct,the ID field is the unique identifier. Also, time is not really of the issue, as I can easily start the procedure when I leave, which gives it all night to run (I have no idea how long this will be for ~5 million rows, but I'm going to be doing some testing on smaller tables to estimate it).

Is there actually a replace function, or did you just use that as if I had already coded it? I will look that up today - if there is, my life just got easier.

T Nappi Grasshopper Points: 13 More actions · Answer 5

I was able to do 90% of the corrections with the Replace command. However, there are a few corrections that have me stumped that I think I will need to implement cursors for. For instance, I need to change 'I B M' to 'IBM'. Any ideas?

hoo-t SSChampion Points: 12713 More actions · Answer 6

How many 'I B M' s do you have? Can you do a visual check to be sure no

' SANDI B McCOY ' or ' I B Martin ' records exist? Perhaps a replace where you placed a space before the 'I', and one after the 'M'? An additional update covering ' I B M,'?