Stripping out all non-numerical characters
Posted Tuesday, December 18, 2012 4:22 AM

SELECT accountid,New_Column = REPLACE(telephone1,SUBSTRING(telephone1,PATINDEX('[^0-9]',telephone1),1),'')
INTO #YourNewResults
FROM #TelephoneTable

I have the script above, which I thought would remove all non-numerical characters from the field, but this is not the case.
Some fields contain numbers plus other characters and some contain characters only. Regardless of the combination, I want all non-numeric characters removed.

I have seen that there are a lot of functions for this. Is a function the best way? If so, where would the function be applied in the script?
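
One way to see why the script above leaves the data unchanged is to evaluate its pieces separately. The following is a minimal sketch using a made-up sample value (an assumption, not data from the thread). Without % wildcards, PATINDEX has to match the entire string, so it returns 0 for anything longer than one character; even with wildcards, REPLACE only removes the one character found first, not every non-digit.

```sql
-- Hypothetical sample value; not taken from the original table.
DECLARE @telephone1 VARCHAR(50);
SET @telephone1 = 'AB12-34';

SELECT
    -- No % wildcards: the pattern must match the whole string, so this is 0.
    PositionNoWildcards = PATINDEX('[^0-9]', @telephone1),
    -- SUBSTRING from position 0 for length 1 is an empty string, and REPLACE
    -- with an empty search string returns the input unchanged.
    OriginalResult = REPLACE(@telephone1,
                             SUBSTRING(@telephone1, PATINDEX('[^0-9]', @telephone1), 1),
                             ''),
    -- With wildcards, the first non-digit is found at position 1 ('A')...
    PositionWithWildcards = PATINDEX('%[^0-9]%', @telephone1),
    -- ...but REPLACE removes only that one character value ('A'),
    -- so 'B' and '-' are still left in the result.
    WildcardResult = REPLACE(@telephone1,
                             SUBSTRING(@telephone1, PATINDEX('%[^0-9]%', @telephone1), 1),
                             '');
```

So the expression would have to be repeated once per distinct unwanted character, which is why the replies in this thread split the string into characters and keep only the digits instead.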
Post #1397685
Posted Tuesday, December 18, 2012 6:48 AM
Hi,

The best way in my opinion would be to use Regex and some CLR functions to replace the characters. Read this article:

http://www.simple-talk.com/sql/t-sql-programming/clr-assembly-regex-functions-for-sql-server-by-example/

Once you've got the functions installed then it's as simple as:

SELECT dbo.RegExReplace('ABC123ABC','[A-Z]','')





MCSA: SQL Server 2012
Follow me on Twitter: @WazzTheBadger
LinkedIn Profile: Simon Osborne
Post #1397768
Posted Tuesday, December 18, 2012 7:08 AM


@Sachin,

I agree that the CLR Regex method should be the fastest although I've not seen any performance testing on it.

The questions I have for you are:

1. Can you use CLR or do you need a 100% T-SQL solution?
2. Is the maximum length of the data <= 8K bytes?


I no longer believe that the CLR Regex method should be the fastest. In the last 2 weeks, I've seen CLR Regex get its doors blown off by more than one example of some good ol'-fashioned T-SQL. CLR is still the fastest for splitting a string and it might be the fastest for cleaning a string, but CLR Regex probably won't be. Regex itself seems to have been overpromised and underdelivered. It appears that you pay in performance for what you thought you gained in flexibility.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1397780
Posted Tuesday, December 18, 2012 7:16 AM
1. I would prefer T-SQL, as CLR is way beyond my understanding.
2. At the moment I'm not concerned about performance.
Post #1397783
Posted Tuesday, December 18, 2012 7:35 AM


Here's one way to do it in T-SQL:

DECLARE @String VARCHAR(8000) = 'ABC12D34E56';

-- Seeds: 10 rows; cross joined four times gives up to 10,000 rows.
WITH Seeds(Seed)
    AS (SELECT *
        FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
-- Numbers: 1 through 8000, one per possible character position.
Numbers(Number)
    AS (SELECT TOP (8000)
               ROW_NUMBER() OVER (ORDER BY S1.Seed)
        FROM Seeds AS S1
        CROSS JOIN Seeds AS S2
        CROSS JOIN Seeds AS S3
        CROSS JOIN Seeds AS S4)
-- Keep only the digit characters and concatenate them in their original order.
SELECT (
        SELECT SUBSTRING(@String, Number, 1)
        FROM Numbers
        WHERE Number <= LEN(@String)
          AND SUBSTRING(@String, Number, 1) LIKE '[0-9]'
        ORDER BY Number
        FOR XML PATH(''),
            TYPE).value('.[1]', 'VARCHAR(8000)');

Alternatively, if you need to apply this to a table instead of a variable:

IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
    DROP TABLE #T;

CREATE TABLE #T
    (ID INT IDENTITY
        PRIMARY KEY,
     Col1 VARCHAR(8000));

INSERT INTO #T
    (Col1)
VALUES ('123A'),
       ('B1C2D3');

WITH Seeds(Seed)
    AS (SELECT *
        FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS V (C)),
Numbers(Number)
    AS (SELECT TOP (8000)
               ROW_NUMBER() OVER (ORDER BY S1.Seed)
        FROM Seeds AS S1
        CROSS JOIN Seeds AS S2
        CROSS JOIN Seeds AS S3
        CROSS JOIN Seeds AS S4)
SELECT *
FROM #T AS T
CROSS APPLY (SELECT (
                SELECT SUBSTRING(Col1, Number, 1)
                FROM Numbers
                WHERE Number <= LEN(Col1)
                  AND SUBSTRING(Col1, Number, 1) LIKE '[0-9]'
                ORDER BY Number
                FOR XML PATH(''),
                    TYPE).value('.[1]', 'VARCHAR(8000)') AS Stripped) AS Parser;

In either case, it uses a "Runtime Numbers Table" to parse the string into individual characters, strips out anything other than the digits 0-9, and then uses FOR XML to put it all back together. (FOR XML PATH with a zero-length element name, indicated by (''), and no column name for the query will concatenate strings together nicely. It's a documented trick that comes in very handy.)
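
To address the original question of where a function would go in the script, the same technique can be wrapped in an inline table-valued function and applied with CROSS APPLY. This is only a sketch under stated assumptions: the function name dbo.DigitsOnly and the sample rows are made up for illustration, the #TelephoneTable layout is guessed from the first post, and the VALUES constructor requires SQL Server 2008 or later.

```sql
-- dbo.DigitsOnly is a hypothetical name. GO is the batch separator used by SSMS and sqlcmd.
IF OBJECT_ID(N'dbo.DigitsOnly', N'IF') IS NOT NULL
    DROP FUNCTION dbo.DigitsOnly;
GO

CREATE FUNCTION dbo.DigitsOnly (@String VARCHAR(8000))
RETURNS TABLE
AS
RETURN
    WITH Seeds(Seed) AS
        (SELECT C
         FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) AS V (C)),
    Numbers(Number) AS
        (SELECT TOP (8000)
                ROW_NUMBER() OVER (ORDER BY S1.Seed)
         FROM Seeds AS S1
         CROSS JOIN Seeds AS S2
         CROSS JOIN Seeds AS S3
         CROSS JOIN Seeds AS S4)
    -- Always returns exactly one row; Stripped is NULL when no digits are found.
    SELECT (SELECT SUBSTRING(@String, Number, 1)
            FROM Numbers
            WHERE Number <= LEN(@String)
              AND SUBSTRING(@String, Number, 1) LIKE '[0-9]'
            ORDER BY Number
            FOR XML PATH(''), TYPE).value('.[1]', 'VARCHAR(8000)') AS Stripped;
GO

-- Hypothetical sample data standing in for the original poster's table.
IF OBJECT_ID(N'tempdb..#TelephoneTable') IS NOT NULL
    DROP TABLE #TelephoneTable;

CREATE TABLE #TelephoneTable
    (accountid INT PRIMARY KEY,
     telephone1 VARCHAR(50));

INSERT INTO #TelephoneTable (accountid, telephone1)
VALUES (1, '(020) 7946 0018'),
       (2, 'ext. 42'),
       (3, 'unknown');

-- The function is applied once per row; ISNULL turns "no digits" into an empty string.
SELECT T.accountid,
       New_Column = ISNULL(D.Stripped, '')
FROM #TelephoneTable AS T
CROSS APPLY dbo.DigitsOnly(T.telephone1) AS D;
```

With these sample rows, the three New_Column values come back as '02079460018', '42' and an empty string. An INTO #YourNewResults clause can be added before the FROM if the results need to be stored, as in the original script.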

Does that help?


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1397793
Posted Monday, December 24, 2012 6:39 PM


This can also be done using the PatternSplitCM function described in the fourth article in my signature line (Splitting Strings Based on Patterns).

IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
    DROP TABLE #T;

CREATE TABLE #T
    (ID INT IDENTITY
        PRIMARY KEY,
     Col1 VARCHAR(8000));

INSERT INTO #T
    (Col1)
VALUES ('123A'),
       ('B1C2D3');

;WITH CTE AS (
    SELECT *
    FROM #T a
    CROSS APPLY PatternSplitCM(a.Col1, '[0-9]')
    WHERE [Matched] = 1)
SELECT ID, Col1=MAX(Col1), Col2=(
        SELECT '' + Item
        FROM CTE b
        WHERE a.ID = b.ID
        ORDER BY ItemNumber
        FOR XML PATH(''))
FROM CTE a
GROUP BY ID;

IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
    DROP TABLE #T;


In fact the inspiration for that article (described therein) was a forum-posted question that was quite similar to this one.

One caveat though. Since you're working in SQL 2005, you'll need to replace the numbers table with a Ben-Gan style Tally table something like this one (from the PatternSplitQU FUNCTION also in that article):

WITH Nbrs_3(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1)
    ,Nbrs_2(n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)
    ,Nbrs_1(n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)
    ,Nbrs_0(n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)
    ,Tally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs_0)
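
As a rough illustration of that caveat, here is how the digit-stripping query posted earlier in the thread might look with this style of Tally table, avoiding the multi-row VALUES syntax that SQL 2005 lacks. It is a sketch under assumptions, not code from the article: the sample table is the same #T used above, and the combination of the two posts is mine.

```sql
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
    DROP TABLE #T;

CREATE TABLE #T
    (ID INT IDENTITY PRIMARY KEY,
     Col1 VARCHAR(8000));

-- SQL Server 2005 has no multi-row VALUES, so the rows are inserted with UNION ALL.
INSERT INTO #T (Col1)
SELECT '123A' UNION ALL
SELECT 'B1C2D3';

WITH Nbrs_3(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1) -- 4 rows
    ,Nbrs_2(n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2)                      -- 16 rows
    ,Nbrs_1(n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2)                      -- 256 rows
    ,Nbrs_0(n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2)                      -- 65,536 rows
    ,Tally(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY n) FROM Nbrs_0)
SELECT T.ID, T.Col1, Parser.Stripped
FROM #T AS T
CROSS APPLY (SELECT (SELECT SUBSTRING(T.Col1, Tally.n, 1)
                     FROM Tally
                     WHERE Tally.n <= LEN(T.Col1)               -- only positions inside the string
                       AND SUBSTRING(T.Col1, Tally.n, 1) LIKE '[0-9]'
                     ORDER BY Tally.n                           -- keep the digits in original order
                     FOR XML PATH(''), TYPE).value('.[1]', 'VARCHAR(8000)') AS Stripped) AS Parser;

IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
    DROP TABLE #T;
```

Both sample rows should come back with Stripped = '123'. The Tally CTE here can produce up to 65,536 numbers, which is more than a VARCHAR(8000) column needs; the WHERE clause limits it to the positions that actually exist in each string.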





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1400011
Posted Monday, December 24, 2012 6:46 PM




Gus - I think the OP may have a problem with your Tally table (Seeds CTE) for the same reason I noted above.



Post #1400012
Posted Tuesday, December 25, 2012 8:58 PM


Jeff Moden (12/18/2012)
@Sachin,

I agree that the CLR Regex method should be the fastest although I've not seen any performance testing on it.

The questions I have for you is...

1. Can you use CLR or do you need a 100% T-SQL solution?
2. Is the maximum length of the data <= 8K bytes?


Actually, I have to take that back. CLR would be the fastest method, but only if you DON'T use Regex. In the many performance tests I've seen recently, either some dedicated CLR or some dedicated T-SQL will usually smoke Regex.


Post #1400131
Posted Tuesday, December 25, 2012 9:01 PM


Sachin 80451 (12/18/2012)
1. I would prefer t-SQL as CLR is way beyond my understanding
2. At the moment I'm not concerned about performance


I wholeheartedly agree with #1 above. I never agree with #2 above, because people always come back later about the performance problem that a given solution is having. The only thing you should be more concerned about than performance is accuracy, and the two should be virtually tied as the most important. If you think not, spend an hour or two looking at the thousands of performance issues people are asking about on this forum alone.


Post #1400132
Posted Tuesday, December 25, 2012 9:08 PM


Jeff Moden (12/25/2012)


Actually, I have to take that back. CLR would be the fastest method, but only if you DON'T use Regex. In the many performance tests I've seen recently, either some dedicated CLR or some dedicated T-SQL will usually smoke Regex.


I've got your back, Jeff.

Here's one example: http://www.sqlservercentral.com/Forums/Topic1390297-3122-5.aspx



Post #1400133