|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, September 22, 2011 1:51 PM
Points: 276,
Visits: 400
|
|
--drop table #addr create table #addr ( id int identity (1,1) , address1 varchar(200) )
insert into #addr select '532 40th ave. n.e. apt)104' union all select '1000 sector way st. apt)111-ac))' union all select '3333 summer-point place' union all select 'p.o.box 22000a-330042' union all select '5566 first stett n. lot.204-a apt)2-c'
--should return only alpha numeric chars. '53240thaveneapt104' '1000sectorwaystapt111ac' '3333summerpointplace' 'pobox22000a330042' '5566firststettnlot204aapt2c'
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Today @ 5:47 PM
Points: 8,150,
Visits: 20,892
|
|
i think this will do what you are after; this leaves 0-9,A-Z,a-z and strips out everything else...commas,punctuation, hi ascii, etc.
select dbo.StripNonAlphaNumeric(address1) from #addr --results 53240thaveneapt104 1000sectorwaystapt111ac 3333summerpointplace pobox22000a330042 5566firststettnlot204aapt2c
select ascii('a'),ascii('z'),ascii('A'),ascii('Z') GO CREATE FUNCTION StripNonAlphaNumeric(@OriginalText VARCHAR(8000)) RETURNS VARCHAR(8000) BEGIN DECLARE @CleanedText VARCHAR(8000) ;WITH tally (N) as (SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id) FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) SELECT @CleanedText = ISNULL(@CleanedText,'') + CASE --ascii numbers are 48(for '0') thru 57 (for '9') WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57 THEN SUBSTRING(@OriginalText,Tally.N,1) --ascii upper case letters A-Z is 65 thru 90 WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 65 AND 90 THEN SUBSTRING(@OriginalText,Tally.N,1) --ascii lower case letters a-z is 97 thru 122 WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 97 AND 122 THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END FROM tally WHERE Tally.N <= LEN(@OriginalText) RETURN @CleanedText END
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. SQL Forums are unique among other technical forums for getting exact solutions.. If you provided a CREATE TABLE... and INSERT INTO... scripts, volunteers here can give you a tested, working solution to your questions. Help us help you and provide that and a description of what you are trying to accomplish!
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Thursday, September 22, 2011 1:51 PM
Points: 276,
Visits: 400
|
|
That's exactly what i was looking for. Thanks Lowel. I knew there was a way via ascii but hadn't done it before. This ones going in the tool box.
Again thank you!
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Today @ 3:59 PM
Points: 6,522,
Visits: 11,526
|
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 9:26 AM
Points: 1,295,
Visits: 4,815
|
|
lmu92 (3/23/2010)
Maybe you could make the CASE statement of Lowells nice function a little shorter: CASE WHEN (SUBSTRING(@OriginalText,Tally.N,1)) like '[0-Z]' THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END It seems like it'll return the same value.
It seems, but it really depends on the collation being used. Check out the following script:
DECLARE @OriginalText VARCHAR(MAX) = '0;:ëêà^AaZz' SELECT CASE WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_BIN2,Tally.N,1)) like '[0-Z]' THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END, CASE WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_CI_AS,Tally.N,1)) like '[0-Z]' THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END, CASE WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_CI_AI,Tally.N,1)) like '[0-Z]' THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END FROM ( SELECT Number N FROM master..spt_values WHERE Type = 'P' ) Tally where Tally.N <= LEN(@OriginalText)
If you really want to remove any character but 0-9, a-z, A-Z compare the Ascii-codes. Also note that in a binary collation ':' and ';' fall between [0-Z].
Peter
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Today @ 3:59 PM
Points: 6,522,
Visits: 11,526
|
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 9:26 AM
Points: 1,295,
Visits: 4,815
|
|
lmu92 (3/23/2010)
Thanx for clarification, Peter! I knew there must have been a reason Lowell provided a method that seemed to leave room for improvement...  Well, you were close. I think this one will do the job too
CASE WHEN (SUBSTRING(@OriginalText COLLATE Latin1_General_BIN2,Tally.N,1)) like '[0-9a-zA-Z]' THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END,
BTW, where did you get [0-Z] from. I was surprised to see that ':' or ';' do not fall within that range when using a collation like Latin1_General_CI_Ax. I scanned BOL, but I can not find anything about this pattern.
Peter
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Monday, January 23, 2012 11:01 AM
Points: 3,539,
Visits: 5,118
|
|
You may also want to file away this version, which uses an inline table valued function. It performs significantly faster if you have to clean up a large number of rows in a table with one pass. I modified the function above to get rid of anything except 0-9 and tested it against an existing function that does the same thing.
CREATE FUNCTION dbo.itvf_NumbersOnly ( @Input Varchar(max) ) RETURNS TABLE AS RETURN ( WITH data as (select N,substring(@input,N,1) as element from dbo.tally where substring(@input,N,1) between '0' and '9' -- alter to include only characters needed and N <= len(@input) )
Select(Select ''+element FROM data ORDER BY N FOR XML PATH('') ) as [NumberString] ) GO
Also, I should note that CLR is said to be significantly faster at string manipulation, but we don't do CLR here yet, so I can't test. There was a huge thread on the subject here. There may be a variation in performance depending on the size of the strings to be cleaned and the number of characters to be replaced.
--------------- -- code to test --------------- create table #test (rowID int identity primary key, data varchar(max)) insert into #test select top 100000 newID() from dbo.tally
select top 10 * from #test
set statistics io on; set statistics time on;
select rowid,Data,NumberString into #test2 from #test cross apply dbo.itvf_NumbersOnly(data)
set statistics time off; set statistics io off;
print '----------------------------------------------------------------' set statistics io on; set statistics time on;
select rowid,data,dbo.StripNonNumeric(data) as NumberString into #test3 from #test
set statistics time off; set statistics io off;
select top 10 * from #test2 order by rowid select top 10 * from #test3 order by rowid
drop table #test drop table #test2 drop table #test3
__________________________________________________
That guy's about as useless as a nvarchar(1). -- Original Against stupidity the gods themselves contend in vain. -- Friedrich Schiller Whenever possible, I prefer to eat the rude. -- Hannibal Lecter
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 9:26 AM
Points: 1,295,
Visits: 4,815
|
|
...substring(@input,N,1) between '0' and '9' ...
Still the same mistake. This will match characters like '³' and '²'.
|
|
|
|
|
Hall of Fame
       
Group: General Forum Members
Last Login: Monday, January 23, 2012 11:01 AM
Points: 3,539,
Visits: 5,118
|
|
I see what you mean, but my point wasn't the selection/omission criteria in the where clause. In fact, I learned something from the earlier posts and made a mental note to be aware of potential differences in collation.
The point is the slow performance of a user-defined function compared to an inline table-valued function. At 100,000 rows I was seeing a difference of seconds versus minutes. Take that for what its worth. 
__________________________________________________
That guy's about as useless as a nvarchar(1). -- Original Against stupidity the gods themselves contend in vain. -- Friedrich Schiller Whenever possible, I prefer to eat the rude. -- Hannibal Lecter
|
|
|
|