﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2008 / T-SQL (SS2K8)  / how to strip out all non alpha numeric characters? / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Thu, 24 May 2012 13:32:30 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote]Latin1_General_CI_AS is an example of a Windows collation.SQL_Latin1_General_CP1_CI_AS is a SQL collation.Latin1_General_BIN is a binary code point collation.[/quote]This is exactly what I needed to know.   Thanks for bringing this to our attention, Paul.</description><pubDate>Wed, 24 Mar 2010 06:32:23 GMT</pubDate><dc:creator>The Dixie Flatline</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote][b]The Dixie Flatline (3/24/2010)[/b][hr]Paul, I have to ask just when is Windows collation used?  I would assume that since all the work is being done inside SQL that some default SQL collation sequence would be used, and you say that performance is not at all bad with SQL collation.  Doesn't each server, and DB have a default collation setting?[/quote]The collation rules are quite complex, see [url=http://msdn.microsoft.com/en-us/library/ms179886.aspx]Collation Precedence[/url].There is no 'internal collation' in the way I think you mean.  The collation used in comparisons and sorts always depends on the collation of the data involved, whether explicit or implicit.  The link above provides the details.Just in case it is not obvious, SQL collations all start with SQL_, binary collations all end with BIN or BIN2, and the others are all Windows collations.  That statement is mostly true, anyway ;-)Latin1_General_CI_AS is an example of a Windows collation.SQL_Latin1_General_CP1_CI_AS is a SQL collation.Latin1_General_BIN is a binary code point collation.</description><pubDate>Wed, 24 Mar 2010 06:26:34 GMT</pubDate><dc:creator>SQL Kiwi</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>Paul, I have to ask just when is Windows collation used?   I would assume that since all the work is being done inside SQL that some default SQL collation sequence would be used, and you say that performance is not at all bad with SQL collation.     Doesn't each server, and DB have a default collation setting?</description><pubDate>Wed, 24 Mar 2010 06:06:19 GMT</pubDate><dc:creator>The Dixie Flatline</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>Collation is so important when coding a routine that does a large number of comparisons or sorts, and not just because of the potential for errors as shown in this thread.  Performance can be absolutely shocking (up to 40 times worse) when using Windows collations, even with non-Unicode data.  Binary collations are fastest of all, but the SQL_* collations are not very far behind at all.See [url]https://connect.microsoft.com/SQLServer/feedback/details/512459/replace-function-extremely-slow-with-non-binary-windows-collation[/url] for details and a reproduction script.</description><pubDate>Wed, 24 Mar 2010 05:17:48 GMT</pubDate><dc:creator>SQL Kiwi</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>Interesting, again I learned something new today. Thanks for the reply.Peter</description><pubDate>Tue, 23 Mar 2010 17:45:43 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>@Peter:[quote]BTW, where did you get [0-Z] from. I was surprised to see that ':' or ';' do not fall within that range when using a collation like Latin1_General_CI_Ax. I scanned BOL, but I can not find anything about this pattern.[/quote]Actually, it's "developed" using the old fashioned trial and error method ;-)I used Lowells code and modified it to have a case clause for numbers as well as characters. Then I just tried to make it even more simple, doing the "contertest" with CHAR(..) for some (but not all) numbers Lowells code excluded.So, there's no "reliable source"...:-)</description><pubDate>Tue, 23 Mar 2010 17:38:14 GMT</pubDate><dc:creator>LutzM</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote][b]The Dixie Flatline (3/23/2010)[/b][hr]I see what you mean, but my point wasn't the selection/omission criteria in the where clause.[/quote]But that's what the thread was about. The same issue was raised in a couple of recent threads so I thought I should mention it.[quote]In fact, I learned something from the earlier posts and made a mental note to be aware of potential differences in collation.[/quote]I am learning everyday here, both from the answers as the questions. This is a great site.[quote]The point is the slow performance of a user-defined function compared to an inline table-valued function.    At 100,000 rows I was seeing a difference of seconds versus minutes.    Take that for what its worth.  :-)[/quote]Agreed.</description><pubDate>Tue, 23 Mar 2010 17:16:40 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>I see what you mean, but my point wasn't the selection/omission criteria in the where clause.   In fact, I learned something from the earlier posts and made a mental note to be aware of potential differences in collation.The point is the slow performance of a user-defined function compared to an inline table-valued function.    At 100,000 rows I was seeing a difference of seconds versus minutes.    Take that for what its worth.  :-)</description><pubDate>Tue, 23 Mar 2010 16:53:02 GMT</pubDate><dc:creator>The Dixie Flatline</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote]...substring(@input,N,1) between '0' and '9' ...[/quote]Still the same mistake. This will match characters like '³' and '²'.</description><pubDate>Tue, 23 Mar 2010 15:22:40 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>You may also want to file away this version, which uses an inline table valued function.    It performs significantly faster if you have to clean up a large number of rows in a table with one pass.   I modified the function above to get rid of anything except 0-9 and tested it against an existing function that does the same thing.[code]CREATE FUNCTION dbo.itvf_NumbersOnly (		@Input Varchar(max))RETURNS TABLE ASRETURN (WITH data as(select N,substring(@input,N,1) as element from dbo.tally where substring(@input,N,1) between '0' and '9'   -- alter to include only characters needed  and N &amp;lt;= len(@input))Select(Select ''+element		FROM data 		ORDER BY N		FOR XML PATH('')	   )  as [NumberString])GO[/code]Also, I should note that CLR is said to be significantly faster at string manipulation, but we don't do CLR here yet, so I can't test.    There was a huge thread on the subject [url=http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx][b]here[/b][/url].   There may be a variation in performance depending on the size of the strings to be cleaned and the number of characters to be replaced.[code]----------------- code to test---------------create table #test (rowID int identity primary key, data varchar(max))insert into #testselect top 100000 newID()from dbo.tallyselect top 10 * from #testset statistics io on;set statistics time on;select rowid,Data,NumberStringinto #test2from #testcross apply dbo.itvf_NumbersOnly(data)set statistics time off;set statistics io off;print '----------------------------------------------------------------'set statistics io on;set statistics time on;select rowid,data,dbo.StripNonNumeric(data) as NumberStringinto #test3from #testset statistics time off;set statistics io off;select top 10 * from #test2 order by rowidselect top 10 * from #test3 order by rowiddrop table #testdrop table #test2drop table #test3[/code]</description><pubDate>Tue, 23 Mar 2010 14:59:13 GMT</pubDate><dc:creator>The Dixie Flatline</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote][b]lmu92 (3/23/2010)[/b][hr]Thanx for clarification, Peter!I knew there must have been a reason Lowell provided a method that [i]seemed[/i] to leave room for improvement... :blush:[/quote]Well, you were close. I think this one will do the job too[code="sql"]  CASE     WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_BIN2,Tally.N,1)) like '[0-9a-zA-Z]'     THEN SUBSTRING(@OriginalText,Tally.N,1)     ELSE '' END,[/code]BTW, where did you get [0-Z] from. I was surprised to see that ':' or ';' do not fall within that range when using a collation like Latin1_General_CI_Ax. I scanned BOL, but I can not find anything about this pattern.Peter</description><pubDate>Tue, 23 Mar 2010 14:18:34 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>Thanx for clarification, Peter!I knew there must have been a reason Lowell provided a method that [i]seemed[/i] to leave room for improvement... :blush:</description><pubDate>Tue, 23 Mar 2010 13:46:17 GMT</pubDate><dc:creator>LutzM</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>[quote][b]lmu92 (3/23/2010)[/b][hr]Maybe you could make the CASE statement of Lowells nice function a little shorter:[code="sql"]CASE   WHEN (SUBSTRING(@OriginalText,Tally.N,1)) like '[0-Z]'   THEN SUBSTRING(@OriginalText,Tally.N,1)   ELSE '' END[/code]It seems like it'll return the same value.[/quote]It seems, but it really depends on the collation being used. Check out the following script:[code="sql"]DECLARE @OriginalText VARCHAR(MAX) = '0;:ëêà^AaZz'SELECT  CASE     WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_BIN2,Tally.N,1)) like '[0-Z]'     THEN SUBSTRING(@OriginalText,Tally.N,1)     ELSE '' END,  CASE     WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_CI_AS,Tally.N,1)) like '[0-Z]'     THEN SUBSTRING(@OriginalText,Tally.N,1)     ELSE '' END,  CASE     WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_CI_AI,Tally.N,1)) like '[0-Z]'     THEN SUBSTRING(@OriginalText,Tally.N,1)     ELSE '' ENDFROM(  SELECT Number N FROM master..spt_values WHERE Type = 'P') Tallywhere  Tally.N &amp;lt;= LEN(@OriginalText)[/code]If you really want to remove any character but 0-9, a-z, A-Z compare the Ascii-codes. Also note that in a binary collation ':' and ';' fall between [0-Z].Peter</description><pubDate>Tue, 23 Mar 2010 13:37:30 GMT</pubDate><dc:creator>Peter Brinkhaus</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>Maybe you could make the CASE statement of Lowells nice function a little shorter:[code="sql"]CASE   WHEN (SUBSTRING(@OriginalText,Tally.N,1)) like '[0-Z]'   THEN SUBSTRING(@OriginalText,Tally.N,1)   ELSE '' END[/code]It seems like it'll return the same value.</description><pubDate>Tue, 23 Mar 2010 12:52:06 GMT</pubDate><dc:creator>LutzM</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>That's exactly what i was looking for. Thanks Lowel. I knew there was a way via ascii but hadn't done it before. This ones going in the tool box.Again thank you!</description><pubDate>Tue, 23 Mar 2010 11:02:47 GMT</pubDate><dc:creator>BaldingLoopMan</dc:creator></item><item><title>RE: how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>i think this will do what you are after; this leaves 0-9,A-Z,a-z and strips out everything else...commas,punctuation, hi ascii, etc.[code]select dbo.StripNonAlphaNumeric(address1) from #addr--results53240thaveneapt1041000sectorwaystapt111ac3333summerpointplacepobox22000a3300425566firststettnlot204aapt2c[/code][code]select ascii('a'),ascii('z'),ascii('A'),ascii('Z')GOCREATE FUNCTION StripNonAlphaNumeric(@OriginalText VARCHAR(8000))RETURNS VARCHAR(8000)  BEGIN DECLARE @CleanedText VARCHAR(8000) ;WITH tally (N) as(SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id) FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2)SELECT @CleanedText = ISNULL(@CleanedText,'') +  CASE   --ascii numbers are 48(for '0') thru 57 (for '9')  WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND  57    THEN SUBSTRING(@OriginalText,Tally.N,1)   --ascii upper case letters A-Z is 65 thru 90  WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND  90    THEN SUBSTRING(@OriginalText,Tally.N,1)   --ascii lower case letters a-z is 97 thru 122  WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND  122    THEN SUBSTRING(@OriginalText,Tally.N,1)   ELSE '' END      FROM tally           WHERE Tally.N &amp;lt;= LEN(@OriginalText)                            RETURN @CleanedText END[/code]</description><pubDate>Tue, 23 Mar 2010 10:03:36 GMT</pubDate><dc:creator>Lowell</dc:creator></item><item><title>how to strip out all non alpha numeric characters?</title><link>http://www.sqlservercentral.com/Forums/Topic888194-392-1.aspx</link><description>--drop table #addrcreate table #addr( id int identity  (1,1), address1 varchar(200))insert into #addrselect '532 40th ave. n.e. apt)104'union all select '1000 sector way st. apt)111-ac))'union all select '3333 summer-point place'union all select 'p.o.box 22000a-330042'union all select '5566 first stett n. lot.204-a apt)2-c'--should return only alpha numeric chars.'53240thaveneapt104''1000sectorwaystapt111ac''3333summerpointplace''pobox22000a330042''5566firststettnlot204aapt2c'</description><pubDate>Tue, 23 Mar 2010 08:59:13 GMT</pubDate><dc:creator>BaldingLoopMan</dc:creator></item></channel></rss>
