Identify letter vs. number

  • I have a character column which will have varying data, and potentially varying lengths. It needs to be standardized to 4 characters, adding in zeroes either to the front or middle, depending if the value starts with a letter(s) or not.

    A012

    A12

    12

    1

    AB12

    AB2

    etc.

    Then I would be looking at evaluating them to update them all to

    A012

    A012

    0012

    0001

    AB12

    AB02

    I know how I want to script the actual update, but first I need to know how many combinations there are in the data, so I know what to accomodate for.

    Is there a simple way to poll this information to tell me how many combinations there are? Eg. something that would come back and tell me if each character is a letter or a number and return me the combo like:

    letter-num-num-num

    letter-num-num

    num-num

    etc.

    I know there's a hard way to do this, but I've got to prep the information to give to some one else to actually gather the answer for me, so I want to make it as easy as possible.

  • You can do it a few ways. Personally I'm a fan of using .NET regular expressions to do just that, but that involves allowing CLR integration, which some are leery to allow.

    Failing that - you can use something like patindex to help figure it out. You'll have to run 8 separate searches, or columns.

    Patindex allows you to do something like

    select count(*),

    sum(case when patindex('%[a-zA-Z][0-9][0-9][0-9]%',myfield)=1 then 1 else 0 end) as A000,

    sum(case when patindex('%[a-zA-Z][a-zA-Z][0-9][0-9]%',myfield)=1 then 1 else 0 end) as AA00,

    etc...

    from

    table1

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • You'll have to run 8 separate searches, or columns

    Or... we can cheat like hell 😛

    --===== This just creates demonstrable test data and is NOT part of the solution

    SET NOCOUNT ON

    DECLARE @TestData TABLE(String VARCHAR(10))

    INSERT INTO @TestData (String)

    SELECT 'A012' UNION ALL

    SELECT 'A12' UNION ALL

    SELECT '12' UNION ALL

    SELECT '1' UNION ALL

    SELECT 'AB12' UNION ALL

    SELECT 'AB2'

    --===== This is the solution for the known constraints on the data as posted

    SELECT Original = String,

    Modified = CASE

    WHEN d.LastLetter = 0 THEN RIGHT('0000'+d.String,4)

    WHEN d.LastLetter = 1 THEN LEFT(d.String,d.LastLetter)

    + RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),3)

    WHEN d.LastLetter = 2 THEN LEFT(d.String,d.LastLetter)

    + RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),2)

    END

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT String,

    LastLetter = PATINDEX('%[a-z][0-9]%',String),

    FirstDigit = PATINDEX('%[0-9]%',String)

    FROM @TestData

    )d

    RESULTS:

    =======

    Original Modified

    ---------- -------------

    A012 A012

    A12 A012

    12 0012

    1 0001

    AB12 AB12

    AB2 AB02

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...or, perhaps, something a bit more "auto-magic"... 😀

    --===== This just creates demonstrable test data and is NOT part of the solution

    SET NOCOUNT ON

    DECLARE @TestData TABLE(String VARCHAR(10))

    INSERT INTO @TestData (String)

    SELECT 'A012' UNION ALL

    SELECT 'A12' UNION ALL

    SELECT '12' UNION ALL

    SELECT '1' UNION ALL

    SELECT 'AB12' UNION ALL

    SELECT 'AB2'

    --===== This is the solution for the known constraints on the data as posted

    DECLARE @MaxWidth TINYINT

    SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)

    SELECT Original = String,

    Modified = LEFT(d.String,d.LastLetter)

    + RIGHT('0000'+SUBSTRING(d.String,d.FirstDigit,4),@MaxWidth-d.LastLetter)

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT String,

    LastLetter = PATINDEX('%[a-z][0-9]%',String),

    FirstDigit = PATINDEX('%[0-9]%',String)

    FROM @TestData

    )d

    RESULTS:

    ========

    Original Modified

    ---------- ------------------

    A012 A012

    A12 A012

    12 0012

    1 0001

    AB12 AB12

    AB2 AB02

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • :)Or in a simple query:

    SELECTstring Original

    ,CASEWHEN ISNUMERIC(string) = 0

    THEN LEFT(string, PATINDEX('%[0-9]%',string)-1)

    ELSE '' END-- Characters first

    + REPLICATE('0', 4-LEN(string))-- Stuff with zeros

    + SUBSTRING(string, PATINDEX('%[0-9]%',string), 4)-- Add numeric remainder

    ASModified

    FROM @TestData

    Best Regards,

    Chris Büttner

  • Drat... both of ours come up with a little problem when we have data like this...

    --===== This just creates demonstrable test data and is NOT part of the solution

    SET NOCOUNT ON

    DECLARE @TestData TABLE(String VARCHAR(10))

    INSERT INTO @TestData (String)

    SELECT 'A012' UNION ALL

    SELECT 'A12' UNION ALL

    SELECT '12' UNION ALL

    SELECT '1' UNION ALL

    SELECT 'AB12' UNION ALL

    SELECT 'AB2' UNION ALL

    SELECT 'A' UNION ALL

    SELECT ''

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This takes care of the single letter and blank situations...

    --===== This is the solution for the known constraints on the data as posted

    DECLARE @MaxWidth TINYINT

    SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)

    SELECT Original = String,

    Modified = LEFT(d.String,d.LastLetter)

    + RIGHT(

    REPLICATE('0',@MaxWidth)

    + SUBSTRING(d.String,d.LastLetter+1,@MaxWidth)

    , @MaxWidth-d.LastLetter)

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT String,

    LastLetter = PATINDEX('%[a-z][0-9]%',String+'0')

    FROM @TestData

    )d

    If ya really gotta have it as a single query, replace @MaxWidth with 4 and replace d.LastLetter with the formula for LastLetter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting, definitely some syntax in there I have to learn..

    Jeff, your "auto magic" one came the closest but I got some extra zeros I haven't quite figured yet -

    Original Modified

    -------------------

    A14 A00014

    B2 B0002

    OBT 0000OBT

    YH3 YH0003

    XX 0000XX

    So - too many zeroes, the whole string must be 4. And if there was no digit found it puts the zeros at the beginning of the string rather than after the letters.

    Your query accomodating for blanks didn't give me any modification, similarly the earlier non-auto magic query didn't produce a modification.

    And the other query offered, the simple one -

    SELECTString Original

    ,CASEWHEN ISNUMERIC(Page) = 0

    THEN LEFT(Page, PATINDEX('%[0-9]%',String)-1)

    ELSE '' END-- Characters first

    + REPLICATE('0', 4-LEN(Page))-- Stuff with zeros

    + SUBSTRING(String, PATINDEX('%[0-9]%',String), 4)-- Add numeric remainder

    ASModified

    FROM @TestData

    This one gives me some results for short page lengths but none of the more complex ones are returned and I got an error "invalid length parameter passed to the substring function."

    The query would only be evaluating records where the field has content (so no worries about blanks) and where the content is less than 4 characters. If it has 4 or more it will be ignored in the update.

  • The following query should also take care of values without digits:

    SELECTstring Original

    ,CASEWHEN PATINDEX('%[0-9]%',string) > 0

    THEN LEFT(string, PATINDEX('%[0-9]%',string)-1)

    ELSE RTRIM(string) END-- Characters first

    + REPLICATE('0', 4-LEN(string))-- Stuff with zeros

    + CASEWHEN PATINDEX('%[0-9]%',string) > 0

    THEN SUBSTRING(string, PATINDEX('%[0-9]%',string), 4)

    ELSE '' END-- Numeric Trailer

    ASModified

    FROM @TestData

    Best Regards,

    Chris Büttner

  • Jeff, your "auto magic" one came the closest but I got some extra zeros I haven't quite figured yet -

    Original Modified

    -------------------

    A14 A00014

    B2 B0002

    OBT 0000OBT

    YH3 YH0003

    XX 0000XX

    So - too many zeroes, the whole string must be 4. And if there was no digit found it puts the zeros at the beginning of the string rather than after the letters.

    Yes... in my first attempt, that's what happens... did you try the second attempt? For your convenience, here it is again...

    --===== This just creates demonstrable test data and is NOT part of the solution

    SET NOCOUNT ON

    DECLARE @TestData TABLE(String VARCHAR(10))

    INSERT INTO @TestData (String)

    SELECT 'A012' UNION ALL

    SELECT 'A12' UNION ALL

    SELECT '12' UNION ALL

    SELECT '1' UNION ALL

    SELECT 'AB12' UNION ALL

    SELECT 'AB2' UNION ALL

    SELECT 'A' UNION ALL

    SELECT '' UNION ALL

    SELECT 'A14' UNION ALL

    SELECT 'B2' UNION ALL

    SELECT 'OBT' UNION ALL

    SELECT 'YH3' UNION ALL

    SELECT 'XX'

    --===== This is the solution for the known constraints on the data as posted

    DECLARE @MaxWidth TINYINT

    SET @MaxWidth = (SELECT MAX(LEN(String)) FROM @TestData)

    SELECT Original = String,

    Modified = LEFT(d.String,d.LastLetter)

    + RIGHT(

    REPLICATE('0',@MaxWidth)

    + SUBSTRING(d.String,d.LastLetter+1,@MaxWidth)

    , @MaxWidth-d.LastLetter)

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT String,

    LastLetter = PATINDEX('%[a-z][0-9]%',String+'0')

    FROM @TestData

    )d

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes I did try that one, and it didn't appear to make a modification, but I ran it again, here's a snippet of the results:

    Original Modified

    -----------------------------------

    B10 B0

    B101 B01

    B11 B1

    B12 B2

    Here's a test sample of what my data looks like, this should get you the same results as I see.

    DECLARE @TestData TABLE(page char(32))

    insert @testdata (page)

    select 'G18' union all

    select 'G2' union all

    select 'D99' union all

    select 'A08' union all

    select '16' union all

    select 'ENT' union all

    select 'M183' union all

    select 'PAGE1'

    DECLARE @MaxWidth TINYINT

    SET @MaxWidth = (SELECT MAX(LEN(Page)) FROM Arch1)

    SELECT Original = Page,

    Modified = LEFT(d.Page,d.LastLetter)

    + RIGHT(

    REPLICATE('0',@MaxWidth)

    + SUBSTRING(d.Page,d.LastLetter+1,@MaxWidth)

    , @MaxWidth-d.LastLetter)

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT Page,

    LastLetter = PATINDEX('%[a-z][0-9]%',Page+'0')

    FROM @testdata

    )d

    group by page, d.lastletter

    order by page

  • Mindy, the problem is that you have the data stored in a CHAR(32)... trailing blanks come into play there. Either change the column to a VARCHAR(32) or modify my code as follows...

    DECLARE @TestData TABLE(page char(32))

    insert @testdata (page)

    select 'G18' union all

    select 'G2' union all

    select 'D99' union all

    select 'A08' union all

    select '16' union all

    select 'ENT' union all

    select 'M183' union all

    select 'PAGE1'

    --===== This is the solution for the known constraints on the data as posted

    DECLARE @MaxWidth TINYINT

    SET @MaxWidth = (SELECT MAX(LEN(page)) FROM @TestData)

    SELECT Original = page,

    Modified = LEFT(d.page,d.LastLetter)

    + RIGHT(

    REPLICATE('0',@MaxWidth)

    + SUBSTRING(d.page,d.LastLetter+1,@MaxWidth)

    , @MaxWidth-d.LastLetter)

    FROM (--==== Derived table "d" finds the interface between letters/digits

    SELECT Page = RTRIM(page),

    LastLetter = PATINDEX('%[a-z][0-9]%',RTRIM(page)+'0')

    FROM @TestData

    )d

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Aaah, well I don't have control of the schema but that makes sense. Thanks! 🙂

  • Like I said... use the code I just posted... it takes the CHAR(32) into account...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's still putting in an additional zero, but I'll figure that out. The syntax is a little over my head so this is a good project for me.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply