ProperCase Function

  • I'm having an issue converting a scalar function into Table-Valued function and was hoping some of the minds on this forum would be able to help.

    Here is the original Function:

    CREATE FUNCTION [Utility].[f_ProperCase]

    (

    @strIn VARCHAR(255)

    )

    RETURNS VARCHAR(255)

    AS

    BEGIN

    IF @strIn IS NULL

    RETURN NULL

    DECLARE

    @strOut VARCHAR(255),

    @i INT,

    @Up BIT,

    @C VARCHAR(2)

    SELECT

    @strOut = '',

    @i = 0,

    @Up = 1

    WHILE @i <= DATALENGTH(@strIn)

    BEGIN

    SET @C = SUBSTRING(@strIn,@i,1)

    IF @C IN (' ','-','''')

    BEGIN

    SET @strOut = @strOut + @C

    SET @Up = 1

    END

    ELSE

    BEGIN

    IF @up = 1

    SET @C = UPPER(@c)

    ELSE

    SET @C = LOWER(@c)

    SET @strOut = @strOut + @C

    SET @Up = 0

    END

    SET @i = @i + 1

    END

    RETURN @strOut

    END

    Here is my attempt at a Table Valued Function to perform the same task:

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @StrIn VARCHAR(255)

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    SELECTProperCaseString = (

    SELECTCASE

    WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))

    END AS [text()]

    FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN

    FOR XML PATH('')

    );

    GO

    Part of the issue is that the XML concatenation is converting the blanks spaces into "& # x 2 0;" . I think I've found a way to correct this and get the output i would like (Shown in the below code):

    DECLARE @StringTest TABLE

    (

    StrIn VARCHAR(255)

    )

    INSERT INTO @StringTest(StrIn)

    VALUES ('thIS iS A StRing'),

    ('ANOTHEr STRING'),

    ('again another string')

    --Converts special characters

    SELECTCA1.ProperCase AS ProperCase

    FROM@StringTest AS ST

    CROSS APPLY (

    SELECTCASE

    WHEN GN.N = 1 THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))

    WHEN SUBSTRING(ST.StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))

    WHEN SUBSTRING(ST.StrIn,GN.N,1) = ' ' THEN CHAR(32)

    ELSE LOWER(SUBSTRING(ST.StrIn,GN.N,1))

    END AS [text()]

    FROMdbo.GetNums(1,ISNULL(LEN(ST.StrIn),1)) AS GN

    FOR XML PATH('')

    ) AS CA1 (ProperCase)

    --Shows special characters i.e. space as a space

    SELECTCA1.ProperCase.value('/MyString[1]','varchar(8000)') AS ProperCase

    FROM@StringTest AS ST

    CROSS APPLY (

    SELECTCASE

    WHEN GN.N = 1 THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))

    WHEN SUBSTRING(ST.StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(ST.StrIn,GN.N,1))

    WHEN SUBSTRING(ST.StrIn,GN.N,1) = ' ' THEN CHAR(32)

    ELSE LOWER(SUBSTRING(ST.StrIn,GN.N,1))

    END AS [text()]

    FROMdbo.GetNums(1,ISNULL(LEN(ST.StrIn),1)) AS GN

    FOR XML PATH(''), root('MyString'), type

    ) AS CA1 (ProperCase)

    The main part of the issue though, and the long winded point of this post, is that i cannot seem to reflect this in a function

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @StrIn VARCHAR(255)

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    SELECTProperCaseString = (

    SELECT SQ1.[text()].value('/MyString[1]','varchar(8000)')

    FROM

    (SELECTCASE

    WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    WHEN SUBSTRING(@StrIn,GN.N,1) = ' ' THEN ' '

    ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))

    END AS [text()]

    FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN

    FOR XML PATH(''), root('MyString'), type

    ) AS SQ1

    );

    GO

    I get the follow error when I try to create the function:

    Msg 8155, Level 16, State 2, Procedure ProperCase, Line 24

    No column name was specified for column 1 of 'SQ1'.

    Msg 207, Level 16, State 1, Procedure ProperCase, Line 13

    Invalid column name 'text()'.

    Any suggestions/Improvements welcome!

  • Think I may have figured it out:

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @StrIn VARCHAR(255)

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    SELECTProperCaseString = (

    (SELECTCASE

    WHEN GN.N = 1 THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    WHEN SUBSTRING(@StrIn,GN.N-1,1) IN (' ','-','''') THEN UPPER(SUBSTRING(@StrIn,GN.N,1))

    ELSE LOWER(SUBSTRING(@StrIn,GN.N,1))

    END AS [text()]

    FROMdbo.GetNums(1,ISNULL(LEN(@StrIn),1)) AS GN

    FOR XML PATH(''), TYPE

    ).value('.','varchar(8000)')

    );

    Now to see if it actually is an improvement!

    And if anyone has anything to add please do.

  • You have definitely peaked my interest with this one. I've been playing with adapting Jeff Moden's DelimitedSplit8K function covered in the article at http://www.sqlservercentral.com/articles/Tally+Table/72993, but I don't have your solution yet. I'm running into the case where multiple trailing spaces is killing it. Then there's the matter of allowing for more than one delimiter at once. :w00t: I don't know if I'm running down a rabbit hole with no hope here, but it is definitely cool stuff. 😀

    I'm curious as to the performance when using the XML approach versus the tally table approach, so please post your performance testing results.

  • Hi,

    I'm not sure what your GetNums function looks like. I used the DelimitedSplit8K which you can find in the following article which explains how it works. http://www.sqlservercentral.com/articles/Tally+Table/72993

    Here's my option for you that seems to be doing what you need.

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @StrIn VARCHAR(255)

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    SELECT STUFF((SELECT ' ' + UPPER( LEFT( split.Item, 1)) + LOWER(SUBSTRING( split.Item, 2, 255)) word

    FROM dbo.DelimitedSplit8K( @StrIn, ' ') split

    FOR XML PATH(''),TYPE).value('.', 'varchar(255)'), 1, 1, '') AS ProperCased

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Can you post GetNums? There have been a few different proper case functions around here over the years. They are always a challenge because sometime you want every word capitalized and sometimes you want Title Case.

    Parade Of The Horribles

    Parade of the Horribles

    Subtle difference but makes a huge difference.

    What about things like Washington D.C.?

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • dbo.GetNums is as per below, straight from one of Jeff's discussions/articles I believe:

    CREATE FUNCTION [dbo].[GetNums]

    (

    @low AS BIGINT,

    @high AS BIGINT

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    WITH

    L0 AS (SELECT c FROM (VALUES(1),(1)) AS D(c)),

    L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

    L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

    L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

    L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

    L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

    Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum

    FROM L5)

    SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n

    FROM Nums

    ORDER BY rownum;

    For our purposes capitalising each Letter of a word is acceptable.

  • Trying to piece together some test data, some Uppercase letters with some spaces inserted:

    IF OBJECT_ID('tempdb..#ProperTest','U') IS NOT NULL

    DROP TABLE #ProperTest;

    SELECTSTUFF(STUFF(CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),N%16+1,1,' '),N%3+1,1,' ') AS String

    INTO #ProperTest

    FROMdbo.GetNums(1,300000);

    Run the Code

    PRINT 'Old Method Scalar Function'

    SELECTPT.String,

    Utility.f_ProperCase(PT.String) AS Proper

    FROM#ProperTest AS PT;

    PRINT 'Dohsan'

    SELECTPT.String,

    CA1.ProperCaseString

    FROM#ProperTest AS PT

    CROSS

    APPLYdbo.ProperCase(PT.String) AS CA1;

    PRINT 'Luis'

    SELECTPT.String,

    CA1.ProperCased

    FROM#ProperTest AS PT

    CROSS

    APPLYdbo.ProperCase1(PT.String) AS CA1;

    DECLARE @TestVar VARCHAR(16);

    DECLARE @StartTime DATETIME;

    PRINT 'Old Method Scalar Function'

    SELECT @StartTime = GETDATE();

    SELECT@TestVar = Utility.f_ProperCase(PT.String)

    FROM#ProperTest AS PT;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Dohsan'

    SELECT @StartTime = GETDATE();

    SELECT@TestVar = CA1.ProperCaseString

    FROM#ProperTest AS PT

    CROSS

    APPLYdbo.ProperCase(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Luis'

    SELECT @StartTime = GETDATE();

    SELECT@TestVar = CA1.ProperCased

    FROM#ProperTest AS PT

    CROSS

    APPLYdbo.ProperCase1(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    Results:

    Old Method Scalar Function

    47843 MilliSeconds

    Dohsan

    18906 MilliSeconds

    Luis

    4710 MilliSeconds

  • That's nice Dohsan,

    Just make sure my function gets the correct results as it won't check for characters different than spaces when trying to capitalize the first letter of each word.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Yes, I'll have to play around with it a little more when I have some time tomorrow.

    I must say i hadn't thought about using the splitter to separate the words out, so +1 for you and ed wagner for having that idea!

  • Luis, your function is much simpler than mine was working out to be. I was working on creating a derivative on the DelimitedSplit8K function and was most of the way there, but the presence of multiple trailing spaces was giving me fits during testing. Also, the multiple delimiters were definitely impacting performance.

    I like the way you used the unaltered and well-tested function and manipulated the output of it. Mine would have required testing to make sure all bases were covered.

  • Ed, you gave me an idea an here's what I came out with after playing with the Jeff's splitter. Dohsan would be able to change the word delimiters as he considers correct without much problem.

    It's a simple query but can be easily converted in a iTVF.

    DECLARE @pString VARCHAR(255) = 'this is a 3d-printer from wahsington d.c.';

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover NVARCHAR(4000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each word)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN(' ', '.', '-', '''')

    )

    SELECT (SELECT CASE WHEN s.N1 IS NULL

    THEN LOWER(SUBSTRING(@pString,t.N,1))

    ELSE UPPER(SUBSTRING(@pString,t.N,1)) END

    FROM cteTally t

    LEFT JOIN cteStart s ON t.N = s.N1

    ORDER BY N

    FOR XML PATH(''),TYPE).value('.', 'varchar(255)') ProperCaseString

    ;

    EDIT: I reduced the datatypes for GREAT performance improvement.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's one similar to Luis, but handles detecting the start of a word differently and works on nvarchar(4000) input.

    create function [dbo].[propercase](@text nvarchar(4000))

    returns table with schemabinding

    as

    return (

    with seed1 (a)

    as

    (

    select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all

    select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

    ),

    numbers (n) as

    (

    select top (datalength(@text)) row_number() over (order by (select null))

    from seed1 s1, seed1 s2, seed1 s3

    )

    select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text]

    from (

    select

    case

    when n = 1 then upper(substring(@text, n, 1))

    when substring(@text, n - 1, 2) like N'[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))

    else lower(substring(@text, n, 1))

    end

    from numbers

    for xml path (''), type

    ) a (b)

    )

    On the small test strings it performs about 50% slower than Luis original one, but that is just because of the NVARCHAR(4000) compatibility.

    Below is a varchar(255) version that is comparable to Luis, but with the extra word start checks.

    create function [dbo].[propercase](@text varchar(255))

    returns table with schemabinding

    as

    return (

    with seed1 (a)

    as

    (

    select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1

    ),

    numbers (n) as

    (

    select top (datalength(@text)) row_number() over (order by (select null))

    from seed1 s1, seed1 s2, seed1 s3

    )

    select a.b.value('(./text())[1]', 'varchar(255)') as [text]

    from (

    select

    case

    when n = 1 then upper(substring(@text, n, 1))

    when substring(@text, n - 1, 2) like '[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))

    else lower(substring(@text, n, 1))

    end

    from numbers

    for xml path (''), type

    ) a (b)

    )

    EDIT: modified to cope with Accénted characters...Damn you Luis! I wanted to sleep:-D

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I just love when people continue to post to get better solutions.

    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (1/16/2014)


    I just love when people continue to post to get better solutions.

    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.

    Thanks Luis, I agree there are compromises, as mine will handle O'Brien as a name, where others posted so far don't. I agree though that accented characters should probably be handled - even in English - we are a very multicultural society these days and it's not uncommon to get almost any name possible. It's just a bit late now for me to be bothered :w00t:

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Perhaps something like this might help?

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @MyStr VARCHAR(8000) = NULL

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    SELECT ProperCaseStr=

    (

    SELECT CASE WHEN [Matched] = 1 THEN STUFF(Item, 1, 1, UPPER(LEFT(Item, 1))) ELSE Item END

    FROM dbo.PatternSplitCM(@MyStr, '[a-zA-Z]') b

    ORDER BY ItemNumber

    FOR XML PATH(''), TYPE

    ).value('.', 'varchar(8000)')

    ;

    My ProperCase iTVF uses PatternSplitCM which can be found in the 4th article in my signature links.

    WITH SampleData (MyStr) AS

    (

    SELECT 'The cat in the hat'

    UNION ALL SELECT 'the mouse is in the parlour. but the dog is out to lunch'

    )

    SELECT MyStr, ProperCaseStr

    FROM SampleData

    CROSS APPLY dbo.ProperCase(MyStr);

    You'd need to verify of course that it handles all of the cases you want.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Viewing 15 posts - 1 through 15 (of 30 total)

    You must be logged in to reply to this topic. Login to reply