Replace non numeric characters in string

  • Hi,

    I know the following code will replace single defined character (*)

    REPLACE(Phone, '*', '')

    but I need to replace (remove) any non-numeric character found within a string. Can anyone help please?

    Thanks in advance,

    Neal

  • USE [Common]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER function [dbo].[AlphaRemove]

    (@String_in varchar(max))

    returns varchar(max)

    as

    begin

    /*

    *****Takes a string variable and turns it into a set of

    *****numbers separated by spaces.

    *****Despite the name, it also removes punctuation, not

    *****just letters.

    *****Input string must be simple ASCII, not Unicode.

    *****(No accented letters, etc.)

    */

    declare @sub char(1)

    --Letters

    while patindex('%[a-z]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[a-z]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    --Punctuation

    while patindex('%[!-)]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[!-/]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    while patindex('%[+-/]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[!-/]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    while patindex('%[:-=]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[:-@]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    while patindex('%[?-@]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[:-@]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    set @string_in = replace(@string_in, '[', '')

    while patindex('%[\-`]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[\-`]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    while patindex('%[{-~]%', @string_in) > 0

    begin

    set @sub = substring(@string_in, patindex('%[{-~]%', @string_in), 1)

    set @string_in = replace(@string_in, @sub, '')

    end

    while charindex(' ', @string_in, 0) > 0

    set @string_in = replace(@string_in, ' ', ' ')

    return @string_in

    end

    It's pretty RBAR, but it does work. One of these days, I'll rewrite it to be set-based. (Originally wrote this thing years ago. Was clueless on SQL.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Change the final While loop to get rid of any spaces (right now, it gets rid of double-spaces), and it should do what you need.

    If you have extended characters, you'll need to add the ranges for those to the thing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I haven't tested this enough to be sure, but it should work:

    create function dbo.NumberClean

    (@String_in nvarchar(100))

    returns bigint

    as

    begin

    declare @NumberStr nvarchar(100), @Number_out bigint

    ;with Chars (Seq, Chr) as

    (select number, substring(@string_in, number, 1)

    from dbo.Numbers

    where number between 0 and len(@string_in))

    select @numberstr = coalesce(@numberstr + chr, chr)

    from chars

    where chr like '[0-9]'

    select @number_out = @numberstr

    return @number_out

    end

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • You can also use regular expression functions.

    regular expressions function

    CREATE FUNCTION dbo.regexReplace

    (

    @source varchar(5000),

    @regexp varchar(1000),

    @replace varchar(1000),

    @globalReplace bit = 0,

    @ignoreCase bit = 0

    )

    RETURNS varchar(1000) AS

    BEGIN

    DECLARE @hr integer

    DECLARE @objRegExp integer

    DECLARE @result varchar(5000)

    EXECUTE @hr = sp_OACreate 'VBScript.RegExp', @objRegExp OUTPUT

    IF @hr <> 0 BEGIN

    EXEC @hr = sp_OADestroy @objRegExp

    RETURN NULL

    END

    EXECUTE @hr = sp_OASetProperty @objRegExp, 'Pattern', @regexp

    IF @hr <> 0 BEGIN

    EXEC @hr = sp_OADestroy @objRegExp

    RETURN NULL

    END

    EXECUTE @hr = sp_OASetProperty @objRegExp, 'Global', @globalReplace

    IF @hr <> 0 BEGIN

    EXEC @hr = sp_OADestroy @objRegExp

    RETURN NULL

    END

    EXECUTE @hr = sp_OASetProperty @objRegExp, 'IgnoreCase', @ignoreCase

    IF @hr <> 0 BEGIN

    EXEC @hr = sp_OADestroy @objRegExp

    RETURN NULL

    END

    EXECUTE @hr = sp_OAMethod @objRegExp, 'Replace', @result OUTPUT, @source, @replace

    IF @hr <> 0 BEGIN

    EXEC @hr = sp_OADestroy @objRegExp

    RETURN NULL

    END

    EXECUTE @hr = sp_OADestroy @objRegExp

    IF @hr <> 0 BEGIN

    RETURN NULL

    END

    RETURN @result

    END

    GO

    the code to get the result:

    select dbo.regexReplace(MyColumn, '[^0-9]', '', 1, 1 )

    from MyTable

    Link with a list of regular expression functions:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=27205

  • How about this?

    declare @pos smallint

    declare @string varchar(100)

    set @string = '1109A><":{$%^&*4DSE2@!~$%^&567KJHGT'

    while isnumeric(@string+'e0') = 0

    begin

    set @pos = (select patindex('%[^0-9]%',@string))

    set @string = (select replace(@string,substring(@string,@pos,1),''))

    end

    select @string

    there is a reson to add 'e0' to the string. check this out,

    select IsNumeric('1002e0')

    in this case IsNumeric returns 1. So in the above example @String never get replaced with ''. Adding 'e0' again would resolve that.

    read more about this:

    http://www.tek-tips.com/faqs.cfm?fid=6423

  • GSquared (3/17/2008)


    I haven't tested this enough to be sure, but it should work:

    Absolutely the right idea with the numbers table, Gus 🙂 but you don't need the CTE or a derived table.

    CREATE FUNCTION dbo.fnDigitsOnly (@pString VARCHAR(8000))

    -- Tally table can be found at http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    RETURNS VARCHAR(8000) AS

    BEGIN

    DECLARE @CleanString VARCHAR(8000)

    SELECT @CleanString = ISNULL(@CleanString,'')+SUBSTRING(@pString,N,1)

    FROM dbo.Tally WITH (NOLOCK)

    WHERE N<=LEN(@pString)

    AND SUBSTRING(@pString,N,1) LIKE ('[0-9]')

    RETURN @CleanString

    END

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks to all concerned.

    Sorry I'm late getting back, but I'll try them out now.

    Very much appreciated!

  • Thanks Jeff. That's definitely better. (As mentioned, mine was thrown together and tested against one case.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Sorry Gus... I actually didn't mean it that way (better)... I just meant it's different and you don't need the CTE if you don't want it. They both run pretty darned fast compared to the While loop methods.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/18/2008)


    Sorry Gus... I actually didn't mean it that way (better)... I just meant it's different and you don't need the CTE if you don't want it. They both run pretty darned fast compared to the While loop methods.

    Whether that's what you meant or not, when I said, "better", I meant "better". Here's why:

    alter FUNCTION dbo.NumberClean1 (@pString VARCHAR(8000))

    -- Tally table can be found at http://www.sqlservercentral.com/scripts/Advanced+SQL/62486/

    RETURNS VARCHAR(8000) AS

    BEGIN

    DECLARE @CleanString VARCHAR(8000)

    SELECT @CleanString = ISNULL(@CleanString,'')+SUBSTRING(@pString,Number,1)

    FROM dbo.Numbers WITH (NOLOCK)

    WHERE Number<=LEN(@pString)

    AND SUBSTRING(@pString,Number,1) LIKE ('[0-9]')

    RETURN @CleanString

    END;

    go

    create function dbo.NumberClean2

    (@String_in nvarchar(100))

    returns bigint

    as

    begin

    declare @NumberStr nvarchar(100), @Number_out bigint

    ;with Chars (Seq, Chr) as

    (select number, substring(@string_in, number, 1)

    from dbo.Numbers

    where number between 0 and len(@string_in))

    select @numberstr = coalesce(@numberstr + chr, chr)

    from chars

    where chr like '[0-9]'

    select @number_out = @numberstr

    return @number_out

    end

    Creates both functions. I did this on my test box in my ProofOfConcept database.

    Here's the test set-up:

    create table NumberClean (

    Number bigint,

    Clean varchar(100))

    go

    set nocount on

    insert into dbo.numberclean(number, clean)

    select 100, '100 ' union all

    select 5, 'number 5' union all

    select 15, '643' union all

    select 1234567890123456, '1234567890123456 pounds'

    go 1000000

    The idea is the Number column is what it should return, based on the string in the Clean column. Each row was repeated 1-million times, giving 4-million rows of data. I deliberately included one where the "expected result" (15) was not at all what should be returned (643), since I strong believe that any test that doesn't check for wrong results from bad input is an incomplete test.

    The test:

    set statistics io on

    set statistics time on

    select count(*)

    from dbo.numberclean

    where dbo.numberclean1(clean) != number

    Deliberately forcing a RBAR situation, to stress-test the function at its worst. Again, this is intentional. The second test was identical, just changed the 1 to a 2 at the end of the function name.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • (Had to split this post in half to avoid an error on the boards. Not sure why.)

    Results:

    ===================================================

    Proc 1 Run 1

    ===================================================

    SQL Server parse and compile time:

    CPU time = 1266 ms, elapsed time = 1445 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1 row(s) affected)

    Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 620422 ms, elapsed time = 1278261 ms.

    ===================================================

    Proc 1 Run 2

    ===================================================

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1 row(s) affected)

    Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 619547 ms, elapsed time = 1261285 ms.

    ===================================================

    Proc 2 Run 1

    ===================================================

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 6 ms.

    (1 row(s) affected)

    Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 748844 ms, elapsed time = 1552001 ms.

    ===================================================

    Proc 2 Run 2

    ===================================================

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 6 ms.

    (1 row(s) affected)

    Table 'NumberClean'. Scan count 1, logical reads 15626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 757657 ms, elapsed time = 1548412 ms.

    ===================================================

    As you can see, the IO stats are the same for all four runs, but the second function (my version with the CTE) takes a little more than 10% longer in CPU time. Yours is, indeed, slightly better.

    The difference is minor, but definite.

    So, when I said it was "better", don't appologize, just realize I mean it. 🙂

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • For my next test, to get rid of some of the RBAR (though not eliminating it completely), I added an ID column to the table, and then ran:

    ;with

    CTE1 (ID, Position, Val) as

    (select id, numbers.number, substring(reverse(rtrim(clean)), numbers.number, 1)

    from dbo.numbercleanb

    inner join dbo.numbers

    on numbers.number between 1 and len(clean)),

    CTE2 (ID, Pos, Val) as

    (select id, row_number() over (partition by id order by position), val

    from cte1

    where val like '[0-9]'),

    CTE3 (ID, Number) as

    (select id, sum(power(cast(10 as bigint), cast(pos-1 as bigint)) * cast(val as bigint))

    from cte2

    group by id)

    select count(*)

    from cte3

    inner join dbo.numbercleanb

    on cte3.id = numbercleanb.id

    and cte3.number != numbercleanb.number

    And got:

    ================================

    Script 3 Run 1

    ================================

    SQL Server parse and compile time:

    CPU time = 32 ms, elapsed time = 32 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1 row(s) affected)

    Table 'Numbers'. Scan count 4000014, logical reads 12500078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'NumberCleanB'. Scan count 3, logical reads 12018874, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 174547 ms, elapsed time = 136324 ms.

    ================================

    Script 3 Run 2

    ================================

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    (1 row(s) affected)

    Table 'Numbers'. Scan count 4000014, logical reads 12500078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'NumberCleanB'. Scan count 3, logical reads 12018874, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 175969 ms, elapsed time = 138154 ms.

    =====================================

    As you can see, a LOT more IO, but it also finishes in a little over 2 minutes instead of a little over 21 minutes (which was the better of the two inline functions; the other inline function took over 25 minutes). The CPU time is approximately 1/4th of the better of the two inline functions.

    Assuming the Numbers table is in memory, even that many scans and reads shouldn't hurt too much.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • don't MAKE me whip out a CLR UDF for regex....:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... bring it on... you and I have both been bragging about how your Regex stuff is the only CLR that we've seen so far that could actually beat T-SQL solutions. I think this is a great place to prove it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 81 total)

You must be logged in to reply to this topic. Login to reply