Urgent! - How to extract number from a string

  • Hi,

    I try to extract number from a string, is there a SQL Server function to do that? If not, how to get such function?

    Thanks a lot

  • I'm sure you will find as many methods to do this as there are developers/DBAs, but this should work for you.  Feel free to put this into a UDF for ease of use.

     

    DECLARE @string varchar(100)

    SET @string = 'sk123;fja4567afj;as890'

    WHILE PATINDEX('%[^0-9]%',@string) <> 0

        SET @string = STUFF(@string,PATINDEX('%[^0-9]%',@string),1,'')

    SELECT @string

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Actually in my case I need to select street number from a address string, which means if the string is '1234 dummy789 road', I only want to get '1234', not '1234789'.

    Thanks

  • OK, so can you define in English exactly what you want?  Are you looking for all numeric characters prior to the first blank character ' '?  Or all numeric characters prior to the first non-numeric?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • all numeric characters prior to the first blank character

  • DECLARE @string varchar(100)

    SET @string = '12345 Test Address Dr.'

    SELECT LEFT(@String,CHARINDEX(' ', @string))

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Here it is with the PATINDEX used also:

    DECLARE @string varchar(100)

    SET @string = '12345 Test Address Dr.'

    SELECT LEFT(@String,CHARINDEX(' ', @string)),

        LEFT(@string,PATINDEX('%[^0-9]%',@string))

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Not sure if this process could ever by 100% accurate, there will always be some clever dick who sticks random characters and numbers into a house name for example, there are also going to be addresses where the house number is prefixed with "Number" or "No." or a flat with the number 12A etc.

    The following should find the first occurrence of a number in a string, although there is never going to be a guarantee that this is the house number.

     

    DECLARE @string varchar(100),

     @start int,

     @end int,

     @len int

    SET @string = 'the hollies 12345 Test Address Dr.'

    set @string = replace(@string, ' ' , '')

    set @len = len(@string)

    set @start =  PATINDEX('%[0-9]%',@string)

    set @end =  PATINDEX('%[^0-9]%',substring(@string, @start, @len))-1

    print substring(@string, @start, @end)

  • Fishbarnriots has a good point. (what is with that nickname?)

    To get close to 100%, I'd actually go through a cleaning process. See if you can wipe out stuff before the house number. You may find some other cases and I'd actually recommend extracting this into a temp or working table (or new column) and running a few through a few passes to get it down to just the house number.

    If you are using this for some other process regularly, I'd actually see if you could add a new column to store this.

  • I was just reading thru this thread and I have a similar issue. I'm trying to extract the number string from a StarRating of Hotels column. For example the column currently has

    StarRating:

    0

    1

    1.5

    NA

    2

    2 STAR

    3.00

    4-STAR

    4

    NULL

    What I would like to see as the results in this column is:

    0

    1

    1.5

    0

    2

    2

    0

    3.00

    4

    4

    0

    And if possible replace the blank,NA and NULL with 0.

    Could someone please assist.

  • I've been beating the heck out of this drum lately - but an actual regular expression will help you find all of the various issues you're looking at here.

    If you're using 2000, someone has been kind enough to create REGEX support in XP's which can then be added to your server. If you're on 2005 - build a CLR function (I've posted 4 versions in the last 2-3 days), which will definitively pick up these various issues patterns you're looking for.

    Check out the scripts in here:

    http://www.sqlservercentral.com/Forums/Topic419472-65-1.aspx

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt is correct... he's beat the snare right off that drum... and guess what? He's right! Matt and I have done a huge amout of testing and using a RegEx "xp" in SQL Server 2000 is definitely a great way to go...

    On the off chance that you have a DBA that refuses to allow a "non-MS XP", there's a fairly easy way to do it... the basis of the method is covered in the same thread that Matt sited. But, for everyone's convenience, here it is...

    First, you need a Tally table... it's nothing more than a table with a single column of well indexed sequential numbers... you can make it as a Temp Table or, better yet, add it to your database as a permanent table. It's got a lot of uses so I recommend the latter. Here's how to make one...

    --===== Create and populate the Tally table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    Next, you'll need a function... here's a pretty useful generic function that you can "program" to return your desired result...

    CREATE FUNCTION dbo.fnLeaveOnly

    /***********************************************************************************

    Purpose:

    This function accepts a string, the "LIKE" pattern that all characters must match

    to be "kept", and a value to return if a NULL is the result of the function.

    -- Jeff Moden

    ***********************************************************************************/

    --===== Define the I/O Parameters

    (

    @String VARCHAR(8000), --String to be cleaned

    @CharPattern VARCHAR(100), --Pattern a character must meet to keep

    @NullValue VARCHAR(100) --Return this if a NULL is the result

    )

    RETURNS VARCHAR(8000) --The "cleaned" string

    AS

    BEGIN

    --===== Declare the return variable

    DECLARE @Return VARCHAR(8000)

    --===== Clean the string leaving only what's in the character pattern

    SELECT @Return = ISNULL(@Return,'')+SUBSTRING(@String,N,1)

    FROM dbo.Tally

    WHERE N <= LEN(@String)

    AND SUBSTRING(@String,N,1) LIKE @CharPattern

    --===== Return the "Cleaned" string substituting the null value if result is null

    RETURN ISNULL(@Return,@NullValue)

    END

    Now, let's setup a test with all the values you stated and see what happens... do notice the "pattern" used...

    DECLARE @yourtable TABLE (StarRating VARCHAR(100))

    INSERT INTO @yourtable (StarRating)

    SELECT '0' UNION ALL

    SELECT '1' UNION ALL

    SELECT '1.5' UNION ALL

    SELECT 'NA' UNION ALL

    SELECT '2' UNION ALL

    SELECT '2 STAR' UNION ALL

    SELECT ' ' UNION ALL

    SELECT '3.00' UNION ALL

    SELECT '4-STAR' UNION ALL

    SELECT '4' UNION ALL

    SELECT NULL UNION ALL

    SELECT '~`!@#$%^&*()-_=+\|[{]};:''", /?a1_b2C3$'

    SELECT StarRating AS Original,

    dbo.fnLeaveOnly(StarRating,'[0-9.]',0)

    FROM @yourtable

    Result:

    (12 row(s) affected)

    Original Cleaned

    ----------------------------------------- ----------

    0 0

    1 1

    1.5 1.5

    NA 0

    2 2

    2 STAR 2

    0

    3.00 3.00

    4-STAR 4

    4 4

    NULL 0

    ~`!@#$%^&*()-_=+\|[{]};:'", /?a1_b2C3$ .123

    (12 row(s) affected)

    Hope that helps...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you.

  • Thats beautiful work guys!

    What if the record has the string '4-Stars out of 5'. Woud this return 45?

    I was thinking about creating a function which loops through the string one character at a time and does some math with the results.

    1. You would need a try catch block to error trap and then exit out.

    2. You would have to store the previous result to use when an error is found so you could roll back to it.

    3. You may have to check for sequences such as 10E which might get interpreted as Scientific Notation.

  • Heh... Yes... it would return "45" on "4 out of 5 stars"... it wasn't designed to make "descisions"... current function was only designed to return characters that meet a pattern.

    I agree... you could add functionality to return numbers as separate numbers in a table variable if they are separated by any non-numeric characters. You could even make it find things like "10R3" and "3D5" and do other ISNUMERIC extractions. And, you can do it all without an explicit loop as this function does or, you can use a loop.

    Just remember, the more functionality you add, the slower the function will be... better to write a more specific function to meet your expected needs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply