This is a continuation of string manipulation techniques. If you are interested, you can read the other articles,
though you do not need to read it before this one. These are mostly beginning
programming articles, but advanced T-SQL programmers may still find something useful here.
- Part 1 deals with SUBSTRING and how it can be used to extract some information
from a field of data
- Part 2 deals with CHARINDEX and how it can be used to extract some information
from a field of data when the data is delimited rather than stored in a particular format.
- Part 3 deals with REPLACE and how it can be used to remove unwanted information
from a field of data when the data is not in a known format.
- Part 4 deals with numeric conversions.
Introduction
Continuing on with taming strings...
The first three parts of this series dealt with manipulating strings in response to some business
issue, usually dealing with phone numbers or zip codes. This article continues in the same
vein with another alternative for dealing with the formatting issues that occur with these types
of data.
The Problem
Usually I receive data in a variety of formats and need to force it into a standard format. In past
articles, I have used various string techniques to remove unnecessary data and reformat
the data into a standard format. A lack of validation of input data, or no control over the data being
input is the cause, but it happens. However, once I remove all extraneous formatting
to standardize the data, I sometimes need to add additional formating
to standardize the data for a front end application(s).
Suppose that I had scrubbed and cleansed all my phone number data from
Part 1 of this
series to look like the following:
phone ---------- 6055552862 5615552700 9045555680 N/A 5805555371 2815558368 2545558430 3365552797 3365557233 5925553181x4951 96615551222x249 447930555271
Now I wish to format this data so that all phone numbers are stored in
the following format:
(999) 999-9999
for US phone numbers. For this article, assume that I have some method of
determining which rows are US phone numbers and which are not. This leaves me
with the following rows
phone ---------- 6055552862 5615552700 9045555680 5805555371 2815558368 2545558430 3365552797 3365557233 5925553181x4951 96615551222x249
that I want to convert to:
phone ---------- (605) 555-2862 (561) 555-2700 (904) 555-5680 (580) 555-5371 (281) 555-8368 (254) 555-8430 (336) 555-2797 (336) 555-7233 (592) 555-3181x4951 (966) 155-51222x249
The Solution
Once again, I decided to search my handy-dandy Books Online (for those of you with young kids, you
will get the joke. For the rest of you look here)
in the string functions area and I decided to use STUFF. This is a string function that allows you to delete
characters from a string and insert a string value into another string with control over the positioning of the
insertion. The difference from REPLACE is that this function uses a position in the
string to make replacement rather than a pattern.
The format for STUFF is as follows:
STUFF ( character_expression 1, start ,
length , character_expression 2)
where the parameters are:
- Character_expression 1 - The string expression in which to insert data
- Start - The start position for the insertion
- Length - The number of characters in character_expression 1 to delete.
- Character_expression 2 - the string to insert into character_expression 1
Normally, I have used this function in the same manner as REPLACE to remove some
data from a string and add different data. However, there is a trick to using this function to
convert the phone number data above.
The trick is to not delete any characters. If I use the following code:
declare @d char( 14) select @d = '6055552862' select stuff( @d,1 ,1,'('), @d
I get
(055552862
which is not what I want. Instead, I want to change the length parameter to a 0, so no
characters are deleted. In this case, an insertion occurs with no characters being deleted.
The code now looks like:
declare @d char( 14) select @d = '6055552862' select stuff( @d,1 , 0,'('), @d
I get
(6055552862
By expanding this to include all the characters that I need, I can reformat the entire
string in one (messy) SQL statement. Here is it:
declare @d char( 14) select @d = '6055552862' select stuff( stuff( stuff( stuff( @d,1 ,0, '('), 5, 0, ')'), 6, 0, ' '), 10, 0, '-')
which yeilds
(605) 555-2862
Conclusion
This statement could have easily been done with the SUBSTRING command and a few
concatenations. I think the STUFF command is cleaner and easier to read. Once you
get used to using it's syntax. Of course, to update the table of information, I would
have to be sure that all rows were in the same format or use a WHERE clause to limit
the update to those rows that match the criteria.
I hope and am sure this function will really spark some ideas in many of you. Hopefully
I will come in handy for you in solving some problem.
As always, I welcome feedback and please rate this article below (and any you read on Swynk).
It helps to motivate and assist us authors in writing better columns.
Steve Jones
March 2001