SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


how to remove characters char(0) to char(31)


how to remove characters char(0) to char(31)

Author
Message
Raymond van Laake
Raymond van Laake
SSC-Addicted
SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)

Group: General Forum Members
Points: 405 Visits: 390
Hi there,

Sometimes customers add data with unwanted characters in the ascii range 0 to 31. How do I remove them? I'd prefer a regular expression type of solution because this would be fastest.

Thanks,

Raymond
Tee Time
Tee Time
Say Hey Kid
Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)Say Hey Kid (671 reputation)

Group: General Forum Members
Points: 671 Visits: 465
something like this may work for your purpose, but I think that may depend on

SELECT REPLACE(REPLACE(yourfield, CHAR(13), ''), CHAR(10), '')

This basically replaces the char with nothing.
Raymond van Laake
Raymond van Laake
SSC-Addicted
SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)

Group: General Forum Members
Points: 405 Visits: 390
It would work indeed, but I'm hoping to find a better way

Thx!
Gazareth
Gazareth
SSCarpal Tunnel
SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)SSCarpal Tunnel (4K reputation)

Group: General Forum Members
Points: 4009 Visits: 5823
I'd go with nested REPLACEs. It's not going to look nice but I don't think there's a better way.
Perhaps wrap it in a Function if you need to reuse or to keep code tidy?
Lowell
Lowell
One Orange Chip
One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)One Orange Chip (28K reputation)

Group: General Forum Members
Points: 28147 Visits: 39945
i have this scalar function saved in my snippets, that basically strips chars that don't fit within desired ranges;

this strips your 0-31, but also spaces, punctuations and all high ascii chracters as well; it's a little greedy with the deletes, but a great example to modify.
you could modify it to fit your specific needs:


CREATE FUNCTION StripNonAlphaNumeric(@OriginalText VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
DECLARE @CleanedText VARCHAR(8000)
;WITH tally (N) as
(SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id)
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2)
SELECT @CleanedText = ISNULL(@CleanedText,'') +
CASE
--ascii numbers are 48(for '0') thru 57 (for '9')
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57
THEN SUBSTRING(@OriginalText,Tally.N,1)
--ascii upper case letters A-Z is 65 thru 90
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90
THEN SUBSTRING(@OriginalText,Tally.N,1)
--ascii lower case letters a-z is 97 thru 122
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122
THEN SUBSTRING(@OriginalText,Tally.N,1)
ELSE '' END

FROM tally WHERE Tally.N <= LEN(@OriginalText)

RETURN @CleanedText
END



Lowell

--
help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

ScottPletcher
ScottPletcher
SSCertifiable
SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)SSCertifiable (7.9K reputation)

Group: General Forum Members
Points: 7882 Visits: 7149
I too think nested REPLACEs would probably be fastest.

You could do a simple function, but I'm not sure that would perform as well, let alone better ... but, really, with SQL, it's hard to tell so you may want to give it a try :-).




CREATE FUNCTION dbo.RemoveSpecifiedCharsFromString (
@string varchar(2000),
@charsToRemove varchar(50)
)
RETURNS varchar(2000)
AS
BEGIN
WHILE PATINDEX('%[' + @charsToRemove + ']%', @string) > 0
SET @string = STUFF(@string, PATINDEX('%[' + @charsToRemove + ']%', @string), 1, '')
RETURN @string
END --FUNCTION



DECLARE @string varchar(2000)
DECLARE @charsToRemove varchar(50)
SET @charsToRemove = CHAR(00) + CHAR(01) + CHAR(02) + CHAR(03) + CHAR(04) + CHAR(05) + /*... + */
CHAR(09) + CHAR(10) + /* ... + */ CHAR(13) + /* ... + */ CHAR(31)

SELECT string, dbo.RemoveSpecifiedCharsFromString(string, @charsToRemove)
FROM (
SELECT 'abc' + CHAR(10) + CHAR(13) + CHAR(01) + 'def' AS string UNION ALL
SELECT 'ghi' + CHAR(03) + CHAR(04) + REPLICATE(CHAR(05), 10) + 'jkl'
) AS test_data




SQL DBA,SQL Server MVP(07, 08, 09)[size=2]Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.[/size]
Raymond van Laake
Raymond van Laake
SSC-Addicted
SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)SSC-Addicted (405 reputation)

Group: General Forum Members
Points: 405 Visits: 390
Thank you all.... wil test your suggestions on Monday!

Ray
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85951 Visits: 41091
Raymond van Laake (11/16/2012)
Hi there,

Sometimes customers add data with unwanted characters in the ascii range 0 to 31. How do I remove them? I'd prefer a regular expression type of solution because this would be fastest.

Thanks,

Raymond


Regular expressions may be the fastest in another language but they're not necessarily the fastest in T-SQL because of the bit of overhead that a CLR to call RegEx would take. Please see the following article and the comprehensive discussion (click on "Join the Discussion") attached to that for proof.
http://www.sqlservercentral.com/articles/RegEx/88586/

Scott is correct, though. Nested REPLACEs will be faster than most anything else I can come up with especially when you create and use a high performance Inline Table Valued Function instead of using a Scalar UDF.

Most people also forget about the control character at the other end of the basic ASCII table, the DELETE character.

Here's the function that uses nested REPLACEs...
 CREATE FUNCTION dbo.DropControlCharacters 
(@pString VARCHAR(8000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT CleanedString =
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@pString
,CHAR(0),'') COLLATE Latin1_General_BIN
,CHAR(1),'') COLLATE Latin1_General_BIN
,CHAR(2),'') COLLATE Latin1_General_BIN
,CHAR(3),'') COLLATE Latin1_General_BIN
,CHAR(4),'') COLLATE Latin1_General_BIN
,CHAR(5),'') COLLATE Latin1_General_BIN
,CHAR(6),'') COLLATE Latin1_General_BIN
,CHAR(7),'') COLLATE Latin1_General_BIN
,CHAR(8),'') COLLATE Latin1_General_BIN
,CHAR(9),'') COLLATE Latin1_General_BIN
,CHAR(10),'') COLLATE Latin1_General_BIN
,CHAR(11),'') COLLATE Latin1_General_BIN
,CHAR(12),'') COLLATE Latin1_General_BIN
,CHAR(13),'') COLLATE Latin1_General_BIN
,CHAR(14),'') COLLATE Latin1_General_BIN
,CHAR(15),'') COLLATE Latin1_General_BIN
,CHAR(16),'') COLLATE Latin1_General_BIN
,CHAR(17),'') COLLATE Latin1_General_BIN
,CHAR(18),'') COLLATE Latin1_General_BIN
,CHAR(19),'') COLLATE Latin1_General_BIN
,CHAR(20),'') COLLATE Latin1_General_BIN
,CHAR(21),'') COLLATE Latin1_General_BIN
,CHAR(22),'') COLLATE Latin1_General_BIN
,CHAR(23),'') COLLATE Latin1_General_BIN
,CHAR(24),'') COLLATE Latin1_General_BIN
,CHAR(25),'') COLLATE Latin1_General_BIN
,CHAR(26),'') COLLATE Latin1_General_BIN
,CHAR(27),'') COLLATE Latin1_General_BIN
,CHAR(28),'') COLLATE Latin1_General_BIN
,CHAR(29),'') COLLATE Latin1_General_BIN
,CHAR(30),'') COLLATE Latin1_General_BIN
,CHAR(31),'') COLLATE Latin1_General_BIN
,CHAR(127),'') COLLATE Latin1_General_BIN
;
GO



If you want to test performance, here's some code to build a wad o' test data. Details, as usual, are in the comments in the code.
--===== Conditionally drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL
DROP TABLE #TestTable
;
--===== Create and populate the test table with test data.
-- Most rows will have 2 embedded control characters although some may have just 1
-- just due to random selection.
SELECT TOP 100000
RowNum = IDENTITY(INT,1,1),
SomeString =
STUFF(
STUFF(
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`~!@#$%^&*()_-+={[}]|\:;"''<,>.?/',
ABS(CHECKSUM(NEWID()))%94+1, 1, CHAR(ABS(CHECKSUM(NEWID()))%30+1)),
ABS(CHECKSUM(NEWID()))%94+1, 1, CHAR(ABS(CHECKSUM(NEWID()))%30+1))
INTO #TestTable
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
GO



Here's how to use the iTVF (iSF because it returns a scalar value) function against the test data.

 SELECT ca.CleanedString
FROM #TestTable tt
CROSS APPLY dbo.DropControlCharacters(tt.SomeString) ca
;




--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85951 Visits: 41091
Sorry... duplicate post removed.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85951 Visits: 41091
Tee Time (11/16/2012)
something like this may work for your purpose, but I think that may depend on

SELECT REPLACE(REPLACE(yourfield, CHAR(13), ''), CHAR(10), '')

This basically replaces the char with nothing.

Raymond van Laake (11/16/2012)
It would work indeed, but I'm hoping to find a better way

Thx!

Define "better" because if those are the only two characters you really need to worry about, then you just blew off the absolute best way. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search