Tame Those Strings! Part 5 - Proper Casing Strings

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/sjones/tamethosestringspart5propercasingstrings.asp

  • Thank you for sharing your experience with this problem. I'm curious why you had to replace the ' ' (space) character with an '@' symbol. Would your example fail without this conversion?

    quote:


    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/sjones/tamethosestringspart5propercasingstrings.asp


  • I had a similar problem and initially came up with a similar solution. However in practice our database contains names like MacDoanld, McDonald, Mace. What i've ended up with is a huge CASE statement of exceptions IF name = nnn then xxx after the exceptions have been handled a solution similar to yours is used.

    Ryan.

  • Yikes, that could be a really large case statement. I'd look at some other method of handling this.

    rgibbons: I used the @ symbol to give me something that wasn't used in the string as a placeholder. Otherwise I'd be continuously searching the same items.

    Sorry for the delayed reponse.

    Steve Jones

    steve@dkranch.net

  • We had a similar problem, but it also extended to addresses. Addresses were coming in from a legacy system in all caps, and needed to be converted to proper case. We also had the "MacDonald" problem, and "Henry VIII" as well. So far, our solution is to use logic that works for most, and leave it to the user to change the result via online edit if necessary.

  • I'd always leave it to users to change things, but try to standardize whereever possible for storage. especially for loads of data from an external source.

    Steve Jones

    steve@dkranch.net

  • I just had to search this out so that I could work through this issue. Thanks for doing much of the leg work for me.

    OBTW - Great Article / Series!

    David

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Thanks, glad it helped.

    Steve Jones

    steve@dkranch.net

  • I see no one has mentioned our double-barrelled friends yet!

  • The string series has been very helpful.  I wanted to reread the 'substring' article, but the link doesn't work, although all the others do.  If I go back to the previous article, the link http://www.sqlservercentral.com/columnists/sjones/tamestrings1.asp works fine.

  • Thanks for the link catch. It's fixed.

  • First off, I'd like to say that this was an "OK" example.  This is definately NOT how I would approach this issue of Proper Case.  Since most of the time I'd like to convert a Name to Proper Case either during Insertion of a Record, or maybe I'd like to Batch Update an entire Table, this example may end up being hard to use from a programmer's perspective.

    Instead, I created my own algorithm using SQL's User Defined Functions.  And yes, this even handles Irish names.  I hope you are amazed at the simplicity of this routine, as I even thought it would be much harder .  This was all created in 0.5 hours, so I might have missed something.  Please let me know if there are other special cases I've missed.


    CREATE FUNCTION dbo.udf_ProperCase
    (

         @Value VarChar(255)
    )
    RETURNS VarChar(255)
    AS
    BEGIN
    -- Variables
    Declare @index Int
     
    -- Convert the Case of the Words
    Set @index = 0
    Set @index = CharIndex(' ', @Value, 1)
    While (@Index <> 0)
        BEGIN
        Set @index = @index + 1
        -- Handle special cases (this can be modified for different languages)
        If (Lower(SubString(@Value, @index, 2)) = 'mc') -- Irish Names Like McDonald
            Set @Value = SubString(@Value, 1, @index - 1)
                + Upper(SubString(@Value, @index, 1))
                + SubString(@Value, @index + 1, 1)
                + Upper(SubString(@Value, @index + 2, 1))
                + SubString(@Value, @index + 3, Len(@Value) - @index - 2)
     

        Else If (Lower(SubString(@Value, @index, 3)) = 'mac') -- Irish Names Like MacDonald
            Set @Value = SubString(@Value, 1, @index - 1)
                + Upper(SubString(@Value, @index, 1))
                + SubString(@Value, @index + 1, 2)
                + Upper(SubString(@Value, @index + 3, 1))
                + SubString(@Value, @index + 4, Len(@Value) - @index - 3)
        Else
            Set @Value = SubString(@Value, 1, @index - 1) + Upper(SubString(@Value, @index, 1)) + SubString(@Value, @index + 1, Len(@Value) - @index)
     
        -- Determine the Next Index
        Set @index = CharIndex(' ', @Value, @index + 1)
        END
     
    -- Convert any special cases
    Set @index = 0
    Set @index = CharIndex('''', @Value, 1) -- Irish Names Like O'Brien
    While (@Index <> 0)
        BEGIN
        Set @Value = SubString(@Value, 1, @index - 2)
            + Upper(SubString(@Value, @index - 1, 1))
            + SubString(@Value, @index, 1)
            + Upper(SubString(@Value, @index + 1, 1))
            + SubString(@Value, @index + 2, Len(@Value) - @index)
     
        -- Determine the Next Index
        Set @index = CharIndex('''', @Value, @index + 1)
        END
     
    -- Convert the First Word (because it will not happen above)
    -- Also, Just in case there is only one word
    Set @Value = Upper(SubString(@Value, 1, 1)) + SubString(@Value, 2, Len(@Value) - 1)
    RETURN @Value
    END


    Using this is much simplier:

    Select FirstName + ' ' + LastName As UserName, dbo.udf_ProperCase(FirstName + ' ' + LastName) As ProperCased From Users
     
    Update Users
    Set FirstName = dbo.udf_ProperCase(FirstName),
    LastName = dbo.udf_ProperCase(LastName)
     

    Enjoy!!!

  • Many thanks to Steve for providing the basics and for tymberwyld mentioning the "Special Cases".
    It took somewhat longer than half an hour but for about four hours of refinement and testing,
    the udf below does what I need, including special cases, and does it in one loop.
    Disclaimer: Author has only programmed in SQL Server for 4 months, so there may be efficiencies
    of which he is unaware.
    Hope this is of benefit:
    SET QUOTED_IDENTIFIER OFF 
    GO
    SET ANSI_NULLS ON 
    GO
    /* =============================================================================
    ** Convert the entire input field to lowercase then set the first letter of each
    ** 'word' to uppercase.
    ** Useful in making names and addresses appear in everyday capitalization.
    ** Re-capitalization, e.g., for "USMC", is the responsibility of the User.
    ** BTW, Don't use this with Turkish words - much offense could be caused!
    ** Author: Stephen L. Anslow (Yorba Linda, CA)
    ** ========================================================================== */
    CREATE FUNCTION dbo.udf_GetProperCase
    (
       /* --------------------------------------------------------------------------
       ** @InWords  : Required, defined to accept a large input field. Limited to
       **             3999 because the Delimiter is prepended to make the WHILE loop
       **             handle the first word too, in case it happens to be one of the
       **             "special Cases" the function recognizes.
       ** @Delimiter: Optional, defaults to 'blank' but may be any nchar(1) value.
       ** ----------------------------------------------------------------------- */
       @InWords   nvarchar(3999)
     , @Delimiter nchar(1)=NULL
    )
    RETURNS nvarchar(3999)
    AS
    BEGIN
       /* --------------------------------------------------------------------------
       ** Bail if it's a NULL field.
       ** ----------------------------------------------------------------------- */
       IF @InWords IS NULL
          RETURN @InWords
       /* --------------------------------------------------------------------------
       ** Local variables to hold the in-process string and 'where we are in the
       ** transposition process' string pointers and lengths.
       ** ----------------------------------------------------------------------- */
       DECLARE @INTERIM   nvarchar(3999)    -- Holds the data as it is being changed.
       DECLARE @DELIMPOS  int               -- 'Next' delimiter in the string.
       DECLARE @WORDSTART int               -- 1st char of the next 'word'.
       DECLARE @STRINGLEN int, @ORIGLEN int -- Length of the input string and the
                                            -- trimmed string.
       DECLARE @DELIM     nchar(1)          -- The actual Delimiter to use.
       /* --------------------------------------------------------------------------
       ** Determine the delimiter to use.
       ** ----------------------------------------------------------------------- */
       SELECT @DELIM      = ISNULL(@Delimiter,' ')
       /* --------------------------------------------------------------------------
       ** Save the length of the ORIGINAL string trimming ONLY the right 'end' in
       ** case the delimiter is a blank and there are leading blanks - we need to
       ** reinstate them after transposing is complete.
       ** Bail if the input is all blank...
       ** ----------------------------------------------------------------------- */
       SELECT @ORIGLEN    = LEN(RTRIM(@InWords))
       IF @ORIGLEN = 0
          RETURN @InWords
       /* --------------------------------------------------------------------------
       ** Get the fully trimmed string and its length.
       ** ----------------------------------------------------------------------- */
       SELECT @INTERIM    = LTRIM(RTRIM(LOWER(@InWords)))
       SELECT @STRINGLEN  = LEN(@INTERIM)
       /* --------------------------------------------------------------------------
       ** Prepended one delimiter and adjust the saved length that's used in the
       ** WHILE stop condition.
       ** ----------------------------------------------------------------------- */
       SELECT @INTERIM    = @DELIM + @INTERIM
       SELECT @STRINGLEN  = @STRINGLEN + 1
       /* --------------------------------------------------------------------------
       ** Seed the WHILE loop's begin-location.
       ** ----------------------------------------------------------------------- */
       SELECT @DELIMPOS   = 1
       /* --------------------------------------------------------------------------
       ** WHILE only runs if a delimiter is located AND there's data after the
       ** delimiter.
       ** ----------------------------------------------------------------------- */
       WHILE (     @DELIMPOS > 0
               AND @DELIMPOS + 1 <= @STRINGLEN )
          BEGIN
             /* --------------------------------------------------------------------
             ** Locate the 1st letter.  Even if it's a blank, we don't care.
             ** ----------------------------------------------------------------- */
             SELECT @WORDSTART = @DELIMPOS + 1
             /* --------------------------------------------------------------------
             ** Replace the 1st letter with its UPPER equivalent.
             ** ----------------------------------------------------------------- */
             SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART,1)))
             /* --------------------------------------------------------------------
             ** Is this a 'Mc', 'X.x', 'C/o', 'MacD', 'PO Box' or O'x situation?
             ** << Add other "Special Cases" here... >>
             ** ----------------------------------------------------------------- */
             IF      @STRINGLEN - @WORDSTART >= 1 AND SUBSTRING(@INTERIM,@WORDSTART,2) = 'Mc'
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+2,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+2,1)))
             ELSE IF @STRINGLEN - @WORDSTART >= 2 AND SUBSTRING(@INTERIM,@WORDSTART+1,1) = '.'
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+2,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+2,1)))
             ELSE IF @STRINGLEN - @WORDSTART >= 2 AND SUBSTRING(@INTERIM,@WORDSTART+1,1) = '/'
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+2,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+2,1)))
             ELSE IF @STRINGLEN - @WORDSTART >= 1 AND SUBSTRING(@INTERIM,@WORDSTART+1,1) = ''''
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+2,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+2,1)))
             ELSE IF @STRINGLEN - @WORDSTART >= 3 AND SUBSTRING(@INTERIM,@WORDSTART,4) = 'Macd'
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+3,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+3,1)))
             ELSE IF @STRINGLEN - @WORDSTART >= 5 AND SUBSTRING(@INTERIM,@WORDSTART,6) = 'Po box'
                   SELECT @INTERIM   = STUFF(@INTERIM,@WORDSTART+1,1,UPPER(SUBSTRING(@INTERIM,@WORDSTART+1,1)))
             /* --------------------------------------------------------------------
             ** Locate the next delimiter.
             ** ----------------------------------------------------------------- */
             SELECT @DELIMPOS  = CHARINDEX(@DELIM,SUBSTRING(@INTERIM,@WORDSTART,(@STRINGLEN-@WORDSTART+1)))
             /* --------------------------------------------------------------------
             ** If the Delimiter was found, it was in a substring, so set the true
             ** location in the entire string.
             ** ----------------------------------------------------------------- */
             IF @DELIMPOS > 0
                SELECT @DELIMPOS = @DELIMPOS + @WORDSTART - 1
          END
       /* --------------------------------------------------------------------------
       ** Remove the prepended delimiter.
       ** ----------------------------------------------------------------------- */
       SELECT @INTERIM    = RIGHT(@INTERIM,@STRINGLEN-1)
       SELECT @STRINGLEN  = @STRINGLEN - 1
       /* --------------------------------------------------------------------------
       ** If there were leading blanks, reinstate them.
       ** ----------------------------------------------------------------------- */
       IF @ORIGLEN <> @STRINGLEN
          SELECT @INTERIM = REPLICATE(' ',(@ORIGLEN - @STRINGLEN)) + @INTERIM
       /* --------------------------------------------------------------------------
       ** Done!
       ** ----------------------------------------------------------------------- */
       RETURN @INTERIM
    END
    GO
    SET QUOTED_IDENTIFIER OFF 
    GO
    SET ANSI_NULLS ON 
    GO
    
  • http://haveworld.blogspot.com/2004/08/sql-tip-proper-case.html

    take a look and let me know what you think.

    URL -- http://www.haveworld.blogspot.com

  • Steve, A great article. Two questions. If you are replacing every SPACE with a @ then what is the difference between searching for spaces or for @'s? I'm sure I am missing something here and want to be sure. Also, what about O'Reilley?

    Thanks for your work.

    David Abineri

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply