Extract phone numbers from a string

  • Hello,

    I am looking to extract phone numbers with the following specifications, from a string:
    - 10 consecutive numbers, that can be separated by either space or '-'
    -  (3213) followed by minimum 6 numbers (that can also be separated by either space or '-')

    The issue is that the space or '-' can be anywhere in between those numbers.
    So it can be like: 'text text 123 45 - 6789101112 text'  and in this case I would need to identify '123 45 - 67891' as the phone number.
    Or it can be like: 'text text 1 2 3 4 5 - 6789101112 text'  and in this case I would need to identify '1 2 3 4 5 - 67891' as the phone number.

    So basically, I need to identify, in a string, a sequence of n numbers, that can be separated by specific characters, and the position and the number of the specific characters is unknown.

    Thank you so much for any help provided!


  • SELECT *,
        CASE WHEN string2 LIKE '%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%'
             THEN SUBSTRING(string2, PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', string2), 10)
             ELSE NULL END AS phone#
    FROM ( VALUES('text text 123 45 - 6789101112 text'),('text text 1 2 3 4 5 - 6789101112 text') ) AS test(string)
    CROSS APPLY (
         SELECT REPLACE(REPLACE(string, ' ', ''), '-', '') AS string2
    ) AS calc1

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you so much for your answer!

    The problem is that when I return it I need to return with all the special characters in it. Like  '123 45 - 67891' or '1 2 3 4 5 - 67891'.

    Thank you again!

  • ralu_k_17 - Friday, April 27, 2018 10:48 AM

    Thank you so much for your answer!

    The problem is that when I return it I need to return with all the special characters in it. Like  '123 45 - 67891' or '1 2 3 4 5 - 67891'.

    Thank you again!

    Will the surrounding text have any numbers in it?

  • It is possible, yes, unfortunately!
    But if it's not 10 consecutive numbers (with/ without the space or '-') then it should not be returned.

    Also, there might be multiple phone numbers (I forgot to mention that) and they should all be returned.

  • ralu_k_17 - Friday, April 27, 2018 11:09 AM

    It is possible, yes, unfortunately!
    But if it's not 10 consecutive numbers (with/ without the space or '-') then it should not be returned.

    Also, there might be multiple phone numbers (I forgot to mention that) and they should all be returned.

    Then it being possible depends rather heavily on there being a consistent way to identify the start of a phone number, as well as the end of one.  Such a condition must be 100% consistent across all rows, or you just don't have a solvable problem unless it's a one-time event and doing more than just a query is both sufficient and practical....

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Yes, I understand!
    The way to identify the beginning and end, if I would go char by char in a string, would be: at the first occurrence of a number, a count of 10 consecutive numbers should start. The problem is with the characters space and '-'
    When identifying if it is a "phone number" the 10 consecutive numbers, can have any occurrence of a space or '-', and it should be ignored in a way, in order to know where the phone number stops.
    So, if I would have the string:  'text text 123 45 - 6789101112 text' 
    At '1' I would start counting and ignoring all the spaces and '-'. When I find 10 numbers, I stop.

    The issue is though, when I have to return it I have to return it in its "original" form, so with the space and '-' characters.
    So: '123 45 - 67891' 
    Or in case of: 'text text 1 2 3 4 5 - 6789101112 text' the returned text should be: '1 2 3 4 5 - 67891' -> 10 numbers, that allow space and '-', regardless of its position.
     Thank you!

  • I've been trying something like this:


    DECLARE @STR VARCHAR(MAX)
    DECLARE @validchars VARCHAR(MAX)

    SET @STR = 'text text 12 4 5 - 6789101112 text'
    SET @validchars = '[0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -][0-9 -]'
    DECLARE @idx INT
    SET @idx = PATINDEX('%'+ @validchars +'%',@str)
    IF @idx > 0 AND (@idx = LEN(@str)-9 OR PATINDEX(SUBSTRING(@str,@idx+10,1),'[0-9]')=0)
      SET @STR=SUBSTRING(@str ,PATINDEX('%'+ @validchars +'%',@str), 10)
    ELSE SET @STR = ''
    SELECT @STR

    But I cannot figure out how to return the 10 numbers out of this, not just the first 10 characters.
    So, what the code above is returning is:
    ' 12 4 5 - '
    And what I need is:
    '12 4 5 - 678910'

  • ralu_k_17 - Friday, April 27, 2018 11:55 AM

    Yes, I understand!
    The way to identify the beginning and end, if I would go char by char in a string, would be: at the first occurrence of a number, a count of 10 consecutive numbers should start. The problem is with the characters space and '-'
    When identifying if it is a "phone number" the 10 consecutive numbers, can have any occurrence of a space or '-', and it should be ignored in a way, in order to know where the phone number stops.
    So, if I would have the string:  'text text 123 45 - 6789101112 text' 
    At '1' I would start counting and ignoring all the spaces and '-'. When I find 10 numbers, I stop.

    The issue is though, when I have to return it I have to return it in its "original" form, so with the space and '-' characters.
    So: '123 45 - 67891' 
    Or in case of: 'text text 1 2 3 4 5 - 6789101112 text' the returned text should be: '1 2 3 4 5 - 67891' -> 10 numbers, that allow space and '-', regardless of its position.
     Thank you!

    Then tell me how to determine that within "text text 1 2 3 4 5 - 6789101112 text" there aren't at least 5 different phone numbers, just depending on where you start?   Conditions that are positionally dependent are not easily queried and might require a scalar function to process just to avoid a performance problem.   How many rows of these nightmares do you have?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • There will definitely be some letters in between the phone numbers.
    So, from this:
     'text text 123 45 - 6789101112 text text text 1 2 3 4 5 - 6789101112 text' 
    The first phone number would be:
     '123 45 - 67891' 
    and the second one would be:
     '1 2 3 4 5 - 67891' 

    From the first batch of numbers I will take the first 10 numbers (with space and '-' if it exists) and from the second batch of numbers the same. The batches will definitely be separated by letters.
    Also, I have to do this across a column in a table of datatype (varchar(max)) with absolutely no data constraints on it. So it will be plain text where anything could be added.

  • ralu_k_17 - Friday, April 27, 2018 12:16 PM

    There will definitely be some letters in between the phone numbers.
    So, from this:
     'text text 123 45 - 6789101112 text text text 1 2 3 4 5 - 6789101112 text' 
    The first phone number would be:
     '123 45 - 67891' 
    and the second one would be:
     '1 2 3 4 5 - 67891' 

    From the first batch of numbers I will take the first 10 numbers (with space and '-' if it exists) and from the second batch of numbers the same. The batches will definitely be separated by letters.
    Also, I have to do this across a column in a table of datatype (varchar(max)) with absolutely no data constraints on it. So it will be plain text where anything could be added.

    How many rows?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • It's a few thousand. But the query can run for a few minutes if needed. It doesn't need to work really fast.

  • ralu_k_17 - Friday, April 27, 2018 12:49 PM

    It's a few thousand. But the query can run for a few minutes if needed. It doesn't need to work really fast.

    If you can dump the data out and parse it somewhere else that's pretty simple in power shell and i'm sure you can do it pretty easily in any language with full regex support including making a CLR function if that's your thing or in SSIS.

    $string = "aaaaaa12345 67-- 8 90aaaaaa555----55 67-- 8 90dfkjsghkj888dfkjsh7777777787" ;
    $matches = ([regex]'[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9]').Matches($string);
    Write-Output $matches

  • Thank you so much for everyone's interest.

    In the end, I found a few more possible patterns and I used the following regex to manage to identify all of them:

    dbo.RegExReplace(
            dbo.RegExReplace(
            dbo.RegExReplace(
            dbo.RegExReplace(
            dbo.RegExReplace(
            dbo.RegExReplace(
                dbo.RegExReplace(
                    dbo.RegExReplace(
                        dbo.RegExReplace(
                            dbo.RegExReplace(@str,'\+4[0-9][ -]{0,1}[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9]\d{0-7}', '')
                        , '\(3213\)[ ]{0,1}[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9]*', '')
                    ,'\(342\)[ ]{0,1}[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9][ -]*[0-9]*', '')
                ,'00[0-9][0-9][ (0-9) ]{0,5}[0-9 -/]{5,14}', '')            
            ,'[0-9 -/[^\.]]{14}', '')
            ,'[0-9 -/[^\.]]{13}', '')
            ,'mobil[ ]{0,1}[0-9 -/]{9}', '')
            ,'mobil[ ]{0,1}[0-9 -/]{8}', '')
            ,'[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9]*', '')
            ,'[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9][ -/]*[0-9]', '')

    Sorry for the ugly code, but it worked in my case.

    Thanks again, everyone!

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply