upper and lower case pattern detection?

  • Continuing my crusade to tidy up our contacts database, I wanted to write something which would detect a name like this :

    DeBeers

    AND keep it like this, i.e. not 'proper case' it to Debeers

    SO... I thought I needed to search for a string that was basically uppercase, lowercase, uppercase, lowercase...

    The field I am searching on has a CI collation - so as I understand it, case insensitive.

    I thought I could use something like

    IF

    CHARINDEX([A-Z],fieldtocheck,1)>0

    AND

    CHARINDEX([a-z],fieldtocheck,2)>0

    AND

    CHARINDEX([A-Z],fieldtocheck,3)>0

    AND

    CHARINDEX([a-z],fieldtocheck,4)>0

    THEN

    SET @string = @string

    BUT... 1. I'm not sure if that would work (my knowledge of actual programming / creating functions is pretty basic) and 2. how I deal with the fact that the collation is case insensitive - is there a convert to CS or something?

    Comments welcome!

  • Even in a case-insensitive collation, you can convert a string to it's component ASCII code numbers, and check for patterns in those. Since upper-case letters are a different range of numbers than lower-case, it will work.

    For example:

    declare @String varchar(100)

    select @String = 'DeBeers'

    select ascii(substring(@string, number, 1))

    from dbo.numbers

    where number between 1 and len(@string)

    (This assumes you have a Numbers table. If not, it's a good idea to create one.)

    You might also be able to do this with a CLR Regex. I'm not skilled at those (yet), but there are others here who are (Matt Miller, for example).

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hmm, I kinda see what your getting at, however, I've tinkered and came up with something that works, all be it not overly elegant!

    declare @string varchar(50)

    select @string = 'Mike'

    IF

    NOT

    (ASCII(substring(@string,1,1)) between 65 AND 90

    AND

    ASCII(substring(@string,2,1)) between 97 AND 122

    AND

    ASCII(substring(@string,3,1)) between 65 AND 90

    AND

    ASCII(substring(@string,4,1)) between 97 AND 122)

    print 'Doing function'

    else print @string

    Gives me 'Doing function'

    and changing @string = 'DeBeers' gives me DeBeers

    So, I think I can modify my name correction function and see if it works any better 🙂

  • Here's another option...

    declare @t table (name varchar(20))

    insert @t

    select 'DeBeers'

    union all select 'Smith'

    union all select 'debeers'

    union all select 'McTavish'

    union all select 'Mchammer'

    union all select 'lo'

    select * from @t where name like '[ABCDEFGHIJKLMNOPQRSTUVWXYZ][abcdefghijklmnopqrstuvwxyz][ABCDEFGHIJKLMNOPQRSTUVWXYZ][abcdefghijklmnopqrstuvwxyz]%' collate Latin1_General_CS_AS

    /*

    name

    --------------------

    DeBeers

    McTavish

    */Note that [ABCDEFGHIJKLMNOPQRSTUVWXYZ] <> [A-Z] since [A-Z] = [AbBcCdD...zZ], confusingly!

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Not being 100% what you were looking for, I am going to assume two things. First, that you are looking for any name that has multiple upper case characters and second that you are wanting to end up with the name with single/first character upper case.

    Both can be done with simple adjustments to Ryan's code above:

    declare @t table (name varchar(20))

    insert @t

    select 'DeBeers'

    union all select 'Smith'

    union all select 'debeers'

    union all select 'McTavish'

    union all select 'Mchammer'

    union all select 'lo'

    union all select 'MacTierney' -- Example of a valid name that would not match first code sample

    union all select 'MacTiernEy' -- Example of a mistyped name that would not match first code sample

    select left(name,1) + lower(right(name,len(name)-1)) as Corrected_Name from @t

    where name like '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'

    collate Latin1_General_CS_AS

    GL!

  • This function may help you out. It will resolve all capitals in string that are not in the beginning and will capitalize

    the first letter of every word passed in.

    CREATE FUNCTION udf_TitleCase (@InputString varchar(4000) )

    RETURNS VARCHAR(4000)

    AS

    BEGIN

    DECLARE @index INT

    DECLARE @Char CHAR(1)

    DECLARE @OutputString VARCHAR(255)

    SET @OutputString = LOWER(@InputString)

    SET @index = 2

    SET @OutputString =

    STUFF(@OutputString, 1, 1,UPPER(SUBSTRING(@InputString,1,1)))

    WHILE @index <= LEN(@InputString)

    BEGIN

    SET @Char = SUBSTRING(@InputString, @index, 1)

    IF @Char IN (' ', ';', ':', '!', '?', ',', '.', '_', '-', '/', '&','''', '(')

    IF @index + 1 <= LEN(@InputString)

    BEGIN

    IF @Char != '''' OR

    UPPER(SUBSTRING(@InputString, @index + 1, 1)) != 'S'

    SET @OutputString =

    STUFF(@OutputString, @index + 1, 1,UPPER(SUBSTRING(@InputString, @index + 1, 1)))

    END

    SET @index = @index + 1

    END

    RETURN ISNULL(@OutputString,'')

    END

  • RyanRandall (5/16/2008)


    *Note that [ABCDEFGHIJKLMNOPQRSTUVWXYZ] <> [A-Z] since [A-Z] = [AbBcCdD...zZ], confusingly!

    Very nice job... You just didn't use the correct Collation, Ryan... 🙂

    declare @t table (name varchar(20))

    insert @t

    select 'DeBeers'

    union all select 'Smith'

    union all select 'debeers'

    union all select 'McTavish'

    union all select 'Mchammer'

    union all select 'lo'

    select * from @t where name like '[A-Z][a-z][A-Z][a-z]%' collate Latin1_General_BIN

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Note that [ABCDEFGHIJKLMNOPQRSTUVWXYZ] <> [A-Z] since [A-Z] = [AbBcCdD...zZ], confusingly!

    I didn't know that, but it does make a lot of sense with something I was just doing - so thanks for that!

    Ill give your methods a try at some point, for now the one I posted seems to be working reasonably well, I just need to look at my contacts now and see how well this whole name correction exercise is going!

  • peitech (5/19/2008)


    Note that [ABCDEFGHIJKLMNOPQRSTUVWXYZ] <> [A-Z] since [A-Z] = [AbBcCdD...zZ], confusingly!

    I didn't know that, but it does make a lot of sense with something I was just doing - so thanks for that!

    Ill give your methods a try at some point, for now the one I posted seems to be working reasonably well, I just need to look at my contacts now and see how well this whole name correction exercise is going!

    Please look at my post immediately above yours... especially the code... what has been said is not always true. It depends on the collation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi peitech,

    One more 🙂

    DECLARE @t TABLE (name varchar(20))

    INSERT @t

    SELECT 'DeBeers'

    UNION ALL

    SELECT 'debeers'

    UNION ALL

    SELECT 'Test'

    UNION ALL

    SELECT 'Debeers'

    SELECT * FROM @t WHERE NAME COLLATE Latin1_General_CS_AS like 'DeBeers'

    ----

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply