|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 3:42 AM
Points: 100,
Visits: 271
|
|
Hi there,
Sometimes customers add data with unwanted characters in the ascii range 0 to 31. How do I remove them? I'd prefer a regular expression type of solution because this would be fastest.
Thanks,
Raymond
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 11:27 AM
Points: 321,
Visits: 236
|
|
something like this may work for your purpose, but I think that may depend on
SELECT REPLACE(REPLACE(yourfield, CHAR(13), ''), CHAR(10), '')
This basically replaces the char with nothing.
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 3:42 AM
Points: 100,
Visits: 271
|
|
It would work indeed, but I'm hoping to find a better way
Thx!
|
|
|
|
|
UDP Broadcaster
      
Group: General Forum Members
Last Login: Monday, May 20, 2013 1:53 AM
Points: 1,474,
Visits: 2,342
|
|
I'd go with nested REPLACEs. It's not going to look nice but I don't think there's a better way. Perhaps wrap it in a Function if you need to reuse or to keep code tidy?
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 3:50 PM
Points: 11,645,
Visits: 27,735
|
|
i have this scalar function saved in my snippets, that basically strips chars that don't fit within desired ranges;
this strips your 0-31, but also spaces, punctuations and all high ascii chracters as well; it's a little greedy with the deletes, but a great example to modify. you could modify it to fit your specific needs:
CREATE FUNCTION StripNonAlphaNumeric(@OriginalText VARCHAR(8000)) RETURNS VARCHAR(8000) BEGIN DECLARE @CleanedText VARCHAR(8000) ;WITH tally (N) as (SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id) FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) SELECT @CleanedText = ISNULL(@CleanedText,'') + CASE --ascii numbers are 48(for '0') thru 57 (for '9') WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57 THEN SUBSTRING(@OriginalText,Tally.N,1) --ascii upper case letters A-Z is 65 thru 90 WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90 THEN SUBSTRING(@OriginalText,Tally.N,1) --ascii lower case letters a-z is 97 thru 122 WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122 THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END FROM tally WHERE Tally.N <= LEN(@OriginalText) RETURN @CleanedText END
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 3:56 PM
Points: 1,324,
Visits: 1,778
|
|
I too think nested REPLACEs would probably be fastest.
You could do a simple function, but I'm not sure that would perform as well, let alone better ... but, really, with SQL, it's hard to tell so you may want to give it a try .
CREATE FUNCTION dbo.RemoveSpecifiedCharsFromString ( @string varchar(2000), @charsToRemove varchar(50) ) RETURNS varchar(2000) AS BEGIN WHILE PATINDEX('%[' + @charsToRemove + ']%', @string) > 0 SET @string = STUFF(@string, PATINDEX('%[' + @charsToRemove + ']%', @string), 1, '') RETURN @string END --FUNCTION
DECLARE @string varchar(2000) DECLARE @charsToRemove varchar(50) SET @charsToRemove = CHAR(00) + CHAR(01) + CHAR(02) + CHAR(03) + CHAR(04) + CHAR(05) + /*... + */ CHAR(09) + CHAR(10) + /* ... + */ CHAR(13) + /* ... + */ CHAR(31) SELECT string, dbo.RemoveSpecifiedCharsFromString(string, @charsToRemove) FROM ( SELECT 'abc' + CHAR(10) + CHAR(13) + CHAR(01) + 'def' AS string UNION ALL SELECT 'ghi' + CHAR(03) + CHAR(04) + REPLICATE(CHAR(05), 10) + 'jkl' ) AS test_data
SQL DBA,SQL Server MVP('07, '08, '09) One man with courage makes a majority. Andrew Jackson
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 3:42 AM
Points: 100,
Visits: 271
|
|
Thank you all.... wil test your suggestions on Monday!
Ray
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:54 PM
Points: 32,907,
Visits: 26,796
|
|
Raymond van Laake (11/16/2012) Hi there,
Sometimes customers add data with unwanted characters in the ascii range 0 to 31. How do I remove them? I'd prefer a regular expression type of solution because this would be fastest.
Thanks,
Raymond
Regular expressions may be the fastest in another language but they're not necessarily the fastest in T-SQL because of the bit of overhead that a CLR to call RegEx would take. Please see the following article and the comprehensive discussion (click on "Join the Discussion") attached to that for proof. http://www.sqlservercentral.com/articles/RegEx/88586/
Scott is correct, though. Nested REPLACEs will be faster than most anything else I can come up with especially when you create and use a high performance Inline Table Valued Function instead of using a Scalar UDF.
Most people also forget about the control character at the other end of the basic ASCII table, the DELETE character.
Here's the function that uses nested REPLACEs...
CREATE FUNCTION dbo.DropControlCharacters (@pString VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS RETURN SELECT CleanedString = REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( @pString ,CHAR(0),'') COLLATE Latin1_General_BIN ,CHAR(1),'') COLLATE Latin1_General_BIN ,CHAR(2),'') COLLATE Latin1_General_BIN ,CHAR(3),'') COLLATE Latin1_General_BIN ,CHAR(4),'') COLLATE Latin1_General_BIN ,CHAR(5),'') COLLATE Latin1_General_BIN ,CHAR(6),'') COLLATE Latin1_General_BIN ,CHAR(7),'') COLLATE Latin1_General_BIN ,CHAR(8),'') COLLATE Latin1_General_BIN ,CHAR(9),'') COLLATE Latin1_General_BIN ,CHAR(10),'') COLLATE Latin1_General_BIN ,CHAR(11),'') COLLATE Latin1_General_BIN ,CHAR(12),'') COLLATE Latin1_General_BIN ,CHAR(13),'') COLLATE Latin1_General_BIN ,CHAR(14),'') COLLATE Latin1_General_BIN ,CHAR(15),'') COLLATE Latin1_General_BIN ,CHAR(16),'') COLLATE Latin1_General_BIN ,CHAR(17),'') COLLATE Latin1_General_BIN ,CHAR(18),'') COLLATE Latin1_General_BIN ,CHAR(19),'') COLLATE Latin1_General_BIN ,CHAR(20),'') COLLATE Latin1_General_BIN ,CHAR(21),'') COLLATE Latin1_General_BIN ,CHAR(22),'') COLLATE Latin1_General_BIN ,CHAR(23),'') COLLATE Latin1_General_BIN ,CHAR(24),'') COLLATE Latin1_General_BIN ,CHAR(25),'') COLLATE Latin1_General_BIN ,CHAR(26),'') COLLATE Latin1_General_BIN ,CHAR(27),'') COLLATE Latin1_General_BIN ,CHAR(28),'') COLLATE Latin1_General_BIN ,CHAR(29),'') COLLATE Latin1_General_BIN ,CHAR(30),'') COLLATE Latin1_General_BIN ,CHAR(31),'') COLLATE Latin1_General_BIN ,CHAR(127),'') COLLATE Latin1_General_BIN ; GO
If you want to test performance, here's some code to build a wad o' test data. Details, as usual, are in the comments in the code.
--===== Conditionally drop the test table to make reruns in SSMS easier. IF OBJECT_ID('tempdb..#TestTable','U') IS NOT NULL DROP TABLE #TestTable ; --===== Create and populate the test table with test data. -- Most rows will have 2 embedded control characters although some may have just 1 -- just due to random selection. SELECT TOP 100000 RowNum = IDENTITY(INT,1,1), SomeString = STUFF( STUFF( '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`~!@#$%^&*()_-+={[}]|\:;"''<,>.?/', ABS(CHECKSUM(NEWID()))%94+1, 1, CHAR(ABS(CHECKSUM(NEWID()))%30+1)), ABS(CHECKSUM(NEWID()))%94+1, 1, CHAR(ABS(CHECKSUM(NEWID()))%30+1)) INTO #TestTable FROM master.sys.all_columns ac1 CROSS JOIN master.sys.all_columns ac2 ; GO
Here's how to use the iTVF (iSF because it returns a scalar value) function against the test data.
SELECT ca.CleanedString FROM #TestTable tt CROSS APPLY dbo.DropControlCharacters(tt.SomeString) ca ;
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:54 PM
Points: 32,907,
Visits: 26,796
|
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:54 PM
Points: 32,907,
Visits: 26,796
|
|
Tee Time (11/16/2012) something like this may work for your purpose, but I think that may depend on
SELECT REPLACE(REPLACE(yourfield, CHAR(13), ''), CHAR(10), '')
This basically replaces the char with nothing.
Raymond van Laake (11/16/2012) It would work indeed, but I'm hoping to find a better way
Thx! Define "better" because if those are the only two characters you really need to worry about, then you just blew off the absolute best way.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|