Using PATINDEX to find all numeric values

  • cdun2

    SSCertifiable

    Points: 7893

    Hello,

    I forget. How can PATINDEX be used to find column values where the data is all numeric? I've tried, the following:

    select distinct acct from tbl_CYProcessedSales

    where PatIndex('%[0-9]%',Acct) > 0

    order by acct

    Acct is varchar(8). What am I doing wrong?

    Thank you for your help!

    CSDunn

  • Adam Haines

    SSC-Insane

    Points: 23197

    I would use ISNUMERIC

    select distinct acct

    from tbl_CYProcessedSales

    where ISNUMERIC(Acct) > 0

    order by acct

  • Jeff Moden

    SSC Guru

    Points: 994239

    Oh... be careful, now... IsNumeric cannot be equated to IsAllDigits. IsNumeric will allow currency symbols, $, comma, decimal point, spaces, tabs, and a host of other individual characters. I will also allow for combinations of charcters...

    SELECT ISNUMERIC('2d2'), ISNUMERIC('2e2')

    In fact, I wrote a pretty hefty article about this on another web site...

    ISNUMERIC is not “ALL DIGITS”Submitted by Jeff Moden, 03 Jun 2006

    All rights reserved.

    Introduction:

    There are many cases where you need to ensure that the string data you are working with includes only numeric digits. Most Developers will use the built in ISNUMERIC function to make such a check. Here’s why that’s a bad idea and what to do about it.

    What is ISNUMERIC?

    “Books OnLine” summarizes the description of the ISNUMERIC function as:

    “Determines whether an expression is a valid numeric type.”

    and that’s a 100% accurate description that leaves much to be desired. Just what is a “valid numeric type”? Reading further in BOL (short for “Books OnLine), we find additional information:

    “ISNUMERIC returns 1 when the input expression evaluates to a valid integer, floating point number, money or decimal type; otherwise it returns 0. A return value of 1 guarantees that expression can be converted to one of these numeric types.”

    Again, read the wording… “when the input expression evaluates to a valid integer”, etc, etc. And, that’s the catch. There are many different things that you may not expect that will evaluate to one of the data types listed in the description of ISNUMERIC and a lot of them are NOT the digits 0-9. ISNUMERIC will return a “1” for all of them.

    Let’s consider the most obvious… what will ISNUMERIC(‘-10’) return? What will ISNUMERIC(‘1,000’) return? And how about the not-so-obvious… what will ISNUMERIC('0d1234') or ISNUMERIC('13e20') return? There are many different combinations of letters, numbers, and symbols that can actually be converted to numeric data types and ISNUMERIC will return a “1” for all of them. It’s not a flaw… that’s the way it’s supposed to work!

    What IS Actually Considered “Numeric” by ISNUMERIC?

    This code will show all of the single characters that ISNUMERIC thinks of as “Numeric”…

    --===== Return all characters that ISNUMERIC thinks is numeric

    -- (uses values 0-255 from the undocumented spt_Values table

    -- instead of a loop from 0-255)

    SELECT [Ascii Code] = STR(Number),

    [Ascii Character] = CHAR(Number),

    [ISNUMERIC Returns] = ISNUMERIC(CHAR(Number))

    FROM Master.dbo.spt_Values

    WHERE Name IS NULL

    AND ISNUMERIC(CHAR(Number)) = 1

    That code produces the following list of characters…

    Ascii Code Ascii Character ISNUMERIC Returns

    ---------- --------------- -----------------

    9 1

    10

    1

    11

    1

    12 1

    13

    1

    36 $ 1

    43 + 1

    44 , 1

    45 - 1

    46 . 1

    48 0 1

    49 1 1

    50 2 1

    51 3 1

    52 4 1

    53 5 1

    54 6 1

    55 7 1

    56 8 1

    57 9 1

    128 € 1

    160 1

    163 £ 1

    164 ¤ 1

    165 ¥ 1

    What are these characters?

    Ascii 9 is a TAB character and is included because a column of numbers is frequently delimited by a TAB.

    Ascii 10 is a Line Feed character and is included because the last column of numbers is frequently terminated by a Line Feed character.

    Ascii 11 is a Vertical Tab character and is included because the last column of numbers is frequently terminated by a Vertical Tab character.

    Ascii 12 is a Form Feed character and is included because the last column numbers of the last row is sometimes terminated by a Form Feed character.

    Ascii 13 is a Carriage Return character and is included because the last column of numbers is frequently terminated by a Carriage Return character.

    Ascii 36 (Dollar sign), 128 (Euro sign), 163 (British Pound sign), and 164 (Yen sign) are included because they are frequently used as enumerators to identify the type of number or, in this case, the currency type the number is meant to represent.

    Ascii 43 (Plus sign), 44 (Comma), 45 (Minus sign), and 46 (Decimal place) are included because they are frequently included in numeric columns to mark where on the number line the number appears and for simple formatting.

    Ascii 160 is a special "hard space" and is included because it is frequently used to left pad numeric columns so the column of numbers appears to be right justified.

    Ascii 32 is a "soft space" and is not included because a single space does not usually represent a column of numbers. Ascii 32 is, however, a valid numeric character when used to create right justified numbers as is Ascii 160 but a single Ascii 32 character is NOT numeric. In fact, a string of Ascii 32 spaces is not considered to be numeric but a string of spaces with even a single digit in it is considered to be numeric.

    Ascii 164 is a special character and is included because it is frequently used by accountants and some software to indicate a total or subtotal of some type. It is also used by some to indicate they don't know what the enumerator is.

    Ascii 48-59 are included because they represent the digits 0 through 9

    Set of Characters Treated as “Numeric” by ISNUMERIC

    Do notice that "e" and "d" (everybody forgets about this) are not included as numeric in the results because a single "e" or "d is NOT considered to be numeric. HOWEVER, these letters are for two different forms of scientific notation. So, if you have anything that looks like the following, ISNUMERIC will identify them as “Numeric”…

    SELECT ISNUMERIC('0d2345')

    SELECT ISNUMERIC('12e34')

    The “Rational” Solution

    Hopefully, I’ve proven that ISNUMERIC is NOT the way to determine if a value or a column of values IS ALL DIGITS. So, what to do? We could write something really complex that loops through each character to see if it’s a digit… or … we can use a very simple rational expression to do the dirty work for us. The formula is…

    NOT LIKE '%[^0-9]%'

    … and it can be used directly (preferred method for performance reasons)…

    SELECT *

    FROM sometable

    WHERE somecolumn NOT LIKE '%[^0-9]%'

    … or, if you don’t mind the performance hit, you can create your own “IsAllDigits” function…

    CREATE FUNCTION dbo.IsAllDigits

    /********************************************************************

    Purpose:

    This function will return a 1 if the string parameter contains only

    numeric digits and will return a 0 in all other cases.

    --Jeff Moden

    ********************************************************************/

    --===== Declare the I/O parameters

    (@MyString VARCHAR(8000))

    RETURNS INT

    AS

    BEGIN

    RETURN (SELECT CASE

    WHEN @MyString NOT LIKE '%[^0-9]%'

    THEN 1

    ELSE 0

    END)

    END

    If you have any questions on this article, please don’t hesitate to post them. And thanks for taking the time to read it and, perhaps, get your favorable vote ?.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Adam Haines

    SSC-Insane

    Points: 23197

    Good article Jeff.

    I guess the method of choice comes down to knowing your data. If his column is what I think it is, an integer column converted to a varchar; he should be alright, but you never know.

  • Julie Breutzmann

    Hall of Fame

    Points: 3587

    In fact, I wrote a pretty hefty article about this on another web site...

    ISNUMERIC is not “ALL DIGITS”Submitted by Jeff Moden, 03 Jun 2006

    All rights reserved.

    Thanks Jeff. Great article and very understandable, even for those of us who are NOT DBA's.

    Julie

  • Jeff Moden

    SSC Guru

    Points: 994239

    Thanks for the feedback, Julie... hmm... maybe I should publish it here, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Sean Peters-452786

    SSC Enthusiast

    Points: 170

    Of course this isnt a perfect solution:

    What you would really like to isolate are values that won't cast as float like '11,22,33' or 1234e456', etc.

    Your new function returns a value of "0" for values like '1,000,000' or '1 e -23' which will actuall cast as float just fine. Which are valid numeric values.

    Now if somebody can find an efficient way of doing that short of opening a cursor and cast each value that would be usefull . . .

  • Jeff Moden

    SSC Guru

    Points: 994239

    Heh... of course it's a perfect solution... but only for what it was designed for... it's an IsAllDigits function and wasn't meant to be anything else. If you want to be able to detect anything that will convert to Float or Money, then IsNumeric works just fine. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Sean Peters-452786

    SSC Enthusiast

    Points: 170

    No IsNumeric('11,22,33') = 1 but cast ('11,22,33' as float) will raise an error. So isnumeric is not the perfect way to select data that will cast as float, money, or numeric, etc. from a char datatype and all its cousins.

    You do have a point about working as designed. I guess I just don't see exactly how it could be useful if it's not a better mouse trap. Say hypothetically I needed a way to eliminate all char type values from a field before I casted them as float. This query:

    Select Cast(columnA as float) From Table1 Where Isnumeric(ColumnA)

    falls short because of the issue I raised above. I can still raise an error converting certain char values that pass the isnumeric sniff test but will not actually cast as float. If I use your function I get the opposite:

    Select Cast(columnA float) From Table1 Where IsAllDigits(ColumnA)

    Here I run the risk of omitting values that could safely cast as float. So in the end, IsAllDigits is fine as far as working as designed. It just may be as useful as it seems.

    Nice write up though, don't get me wrong . . . =)

  • thegreatsamarth

    Mr or Mrs. 500

    Points: 578

    cdun2 (2/14/2008)


    Hello,

    I forget. How can PATINDEX be used to find column values where the data is all numeric? I've tried, the following:

    select distinct acct from tbl_CYProcessedSales

    where PatIndex('%[0-9]%',Acct) > 0

    order by acct

    Acct is varchar(8). What am I doing wrong?

    Thank you for your help!

    CSDunn

    As mentioned, u want to find all the columns where data is all NUMERIC.... so will it include special symbols???

    Anywayz, If u have to use PATINDEX, refer below query:

    This Query exclude the data having special symbols.

    SELECT DISTINCT acct

    FROM tbl_CYProcessedSales

    WHERE patindex('%[~`!@#$%^&*()_=+\|{};",<>/?a-z]%',acct)=0

    ORDER BY acct

    I hope this is what you want...

    --Samarth 🙂

  • Jeff Moden

    SSC Guru

    Points: 994239

    Sean Peters (7/31/2008)


    No IsNumeric('11,22,33') = 1

    Ummm... nope... it isn't...

    SELECT ISNUMERIC('11,22,33')

    -----------

    0

    (1 row(s) affected)

    I will agree, though, that ISNUMERIC isn't the panacea that some make it out to be. If it looks like the MONEY datatype (has comma's in the correct spots or a currency symbol), it won't convert to float (or anything except MONEY) even if ISNUMERIC says it's a convertable number.

    And, thanks for the nice compliment about the write up. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • mweaver

    Old Hand

    Points: 336

    I think this handles all values that will convert to float (except those that use 'D' for scientific notation):

    create function dbo.fn_to_number

    (

    @num varchar(64)

    )

    returns float

    begin

    return case

    -- not empty

    when len(@num) > 0

    -- no illegal characters

    and patindex('%[^0-9.e-]%', @num) = 0

    -- at most one decimal

    and charindex('.',@num, charindex('.',@num) + 1) = 0

    -- if there is a decimal, it must not be first or last (e.g. ".2" and "2." are both invalid)

    and (charindex('.', @num) = 0 or charindex('.',@num) between 2 and len(@num) - 1)

    -- at most one "e"

    and charindex('e', @num, charindex('e', @num) + 1) = 0

    -- if there is an "e", it must not be first or last (e.g. "e2" and "2e" are both invalid)

    and (charindex('e', @num) = 0 or charindex('e',@num) between 2 and len(@num) - 1)

    -- if both decimal and "e" exist, decimal must be first

    and (charindex('e', @num) = 0 or charindex('.', @num) <= charindex('e', @num))

    -- at most two "-" (e.g. "-2.01e-4")

    and charindex('-',@num, charindex('-',@num, charindex('-',@num) + 1) + 2) = 0

    -- the first "-" must either: not exist, be first, or immediately follow an "e"

    and (charindex('-',@num) <=1 or substring(@num,charindex('-',@num) - 1 ,1) = 'e')

    -- the second "-" must either not exist or immediately follow an "e"

    and (charindex('-', @num, charindex('-',@num) + 1) = 0 or substring(@num,charindex('-',@num, charindex('-',@num) + 1) - 1, 1) = 'e')

    then

    cast(@num as float)

    else

    null

    end

    end

    go

  • Jeff Moden

    SSC Guru

    Points: 994239

    Jeff Moden (8/1/2008)


    Sean Peters (7/31/2008)


    No IsNumeric('11,22,33') = 1

    Ummm... nope... it isn't...

    SELECT ISNUMERIC('11,22,33')

    -----------

    0

    (1 row(s) affected)

    I will agree, though, that ISNUMERIC isn't the panacea that some make it out to be. If it looks like the MONEY datatype (has comma's in the correct spots or a currency symbol), it won't convert to float (or anything except MONEY) even if ISNUMERIC says it's a convertable number.

    And, thanks for the nice compliment about the write up. 🙂

    I ran into my own old post and thought I'd update it even though it's years late. I was still stuck on SQL Server 2000 at the time and it worked as posted. The same exact code does, in fact, produce a 1 as of SQL Server 2005.

    I guess this also shows that you have to really do have to regression test to a somewhat fuller extent that you might imagine. This is why I don't have the time for cumulative updates and I'll generally skip a service pack here and there unless they have something really useful in them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply