SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Remove Leading Zero(s) only if needed


Remove Leading Zero(s) only if needed

Author
Message
Scott Milburn
Scott Milburn
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 177
I have four tables that have a varchar data type which gets imports or daily user updates and this field is an account number field which always has a length of 10, including leading zeros. My current code uses:

UPDATE table.Statements
SET field03 = REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')

The issue is that this runs against all four tables and takes around 5 minutes. After the code runs, I see that it touched every record, which I don't want. I just want the code to look at the rows that this field needs to be changed on and disregard the previously trimmed rows.

I know I need some kind of IF statement to accomplish this, but not sure how.

TIA, Scott
mickyT
mickyT
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1598 Visits: 3317
Try

UPDATE table.Statements
SET field03 = REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')
WHERE field03 like '0%'


This will only try to update where field03 starts with a 0. An index on this would help

However if you only have numbers in the string, I would probably go for a statement more like


UPDATE table.Statements
SET field03 = CAST(CAST(field03 as bigint) as varchar(20))
WHERE field03 like '0%'



Otherwise you could do


UPDATE table.Statements
SET field03 = SUBSTRING(field03,PATINDEX('%[^0]%',field03),20)
WHERE field03 like '0%'


Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)

Group: General Forum Members
Points: 88338 Visits: 41128
Scott Milburn (2/27/2013)
I have four tables that have a varchar data type which gets imports or daily user updates and this field is an account number field which always has a length of 10, including leading zeros. My current code uses:

UPDATE table.Statements
SET field03 = REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')

The issue is that this runs against all four tables and takes around 5 minutes. After the code runs, I see that it touched every record, which I don't want. I just want the code to look at the rows that this field needs to be changed on and disregard the previously trimmed rows.

I know I need some kind of IF statement to accomplish this, but not sure how.

TIA, Scott


If Field03 is for account number, I'm thinking that you probably have an index or two on it. The real problems may be that 1) you're updating both the table and the index(es) and 2) you're fragging the hell out of the index(es).

I'd recommend dropping the index(es), doing the update, then re-add te index(es). You're likely going to have to rebuild them after this, anyway.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)

Group: General Forum Members
Points: 88338 Visits: 41128
mickyT (2/27/2013)
However if you only have numbers in the string, I would probably go for a statement more like


UPDATE table.Statements
SET field03 = CAST(CAST(field03 as bigint) as varchar(20))
WHERE field03 like '0%'



Otherwise you could do


UPDATE table.Statements
SET field03 = SUBSTRING(field03,PATINDEX('%[^0]%',field03),20)
WHERE field03 like '0%'




Why? Please don't say "for performance reasons" until you've actually tested it. I tested all of the solutions offered so far and there's virtually no difference. Try it yourself. Here's a million row test table...

--===== Create and populate a large test table on-the-fly
SELECT TOP 1000000
Field03 = RIGHT('0000000000'+CAST(ABS(CHECKSUM(NEWID()))%2147483647+1 AS VARCHAR(10)),10)
INTO #TestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;



--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
mickyT
mickyT
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1598 Visits: 3317

Why? Please don't say "for performance reasons" until you've actually tested it. I tested all of the solutions offered so far and there's virtually no difference. Try it yourself. Here's a million row test table...


Sorry Jeff, I wasn't suggesting them for performance reasons, just other ways of doing it. I found the method the OP put up originally difficult to decipher at first glance.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)SSC Guru (88K reputation)

Group: General Forum Members
Points: 88338 Visits: 41128
mickyT (2/27/2013)

Why? Please don't say "for performance reasons" until you've actually tested it. I tested all of the solutions offered so far and there's virtually no difference. Try it yourself. Here's a million row test table...


Sorry Jeff, I wasn't suggesting them for performance reasons, just other ways of doing it. I found the method the OP put up originally difficult to decipher at first glance.


Ah. Got it. Thanks for the feedback.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Snigdha Vartak
Snigdha Vartak
Mr or Mrs. 500
Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)Mr or Mrs. 500 (570 reputation)

Group: General Forum Members
Points: 570 Visits: 247
Have you tried Val?
Michael Valentine Jones
Michael Valentine Jones
SSCertifiable
SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)SSCertifiable (5.9K reputation)

Group: General Forum Members
Points: 5884 Visits: 11771
Scott Milburn (2/27/2013)
I have four tables that have a varchar data type which gets imports or daily user updates and this field is an account number field which always has a length of 10, including leading zeros. My current code uses:

UPDATE table.Statements
SET field03 = REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')

The issue is that this runs against all four tables and takes around 5 minutes. After the code runs, I see that it touched every record, which I don't want. I just want the code to look at the rows that this field needs to be changed on and disregard the previously trimmed rows.

I know I need some kind of IF statement to accomplish this, but not sure how.

TIA, Scott


If the majority of rows do not need to be updated, then you could just do this:

UPDATE table.Statements
SET field03 = REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')
WHERE field03 <> REPLACE(LTRIM(REPLACE(field03, '0', ' ')), ' ', '0')



It would still have to look at every row, but the actual update would be avoided and that is the most expensive part of the process.
dwain.c
dwain.c
SSCertifiable
SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)SSCertifiable (7.5K reputation)

Group: General Forum Members
Points: 7459 Visits: 6431
If anybody cares, here's my vote:


WITH CTE AS (
SELECT Field03
FROM #TestTable
WHERE Field03 LIKE '0%')
UPDATE CTE
SET Field03 = REPLACE(LTRIM(REPLACE(Field03 COLLATE LATIN1_GENERAL_BIN , '0', ' ')), ' ', '0');




Using Jeff's test harness, the CTE seemed to give a slight performance boost over methods that didn't use the CTE to limit the target row set.

And the Binary collation sequence is a known "feature" of REPLACE:
http://connect.microsoft.com/SQLServer/feedback/details/512459/replace-function-extremely-slow-with-non-binary-windows-collation

Edit: Attached my test harness in case anyone wants to offer a critique.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Attachments
UPDATE with Binary Collation.txt (25 views, 2.00 KB)
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search