SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


ProperCase Function


ProperCase Function

Author
Message
Luis Cazares
Luis Cazares
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16230 Visits: 19059
Ed, you gave me an idea an here's what I came out with after playing with the Jeff's splitter. Dohsan would be able to change the word delimiters as he considers correct without much problem.

It's a simple query but can be easily converted in a iTVF.


DECLARE @pString VARCHAR(255) = 'this is a 3d-printer from wahsington d.c.';

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each word)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN(' ', '.', '-', '''')
)
SELECT (SELECT CASE WHEN s.N1 IS NULL
THEN LOWER(SUBSTRING(@pString,t.N,1))
ELSE UPPER(SUBSTRING(@pString,t.N,1)) END
FROM cteTally t
LEFT JOIN cteStart s ON t.N = s.N1
ORDER BY N
FOR XML PATH(''),TYPE).value('.', 'varchar(255)') ProperCaseString
;



EDIT: I reduced the datatypes for GREAT performance improvement.


Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?


How to post data/code on a forum to get the best help: Option 1 / Option 2
mister.magoo
mister.magoo
SSCarpal Tunnel
SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)

Group: General Forum Members
Points: 4082 Visits: 7865
Here's one similar to Luis, but handles detecting the start of a word differently and works on nvarchar(4000) input.

create function [dbo].[propercase](@text nvarchar(4000))
returns table with schemabinding
as
return (
with seed1 (a)
as
(
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
numbers (n) as
(
select top (datalength(@text)) row_number() over (order by (select null))
from seed1 s1, seed1 s2, seed1 s3
)
select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text]
from (
select
case
when n = 1 then upper(substring(@text, n, 1))
when substring(@text, n - 1, 2) like N'[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))
else lower(substring(@text, n, 1))
end
from numbers
for xml path (''), type
) a (b)
)



On the small test strings it performs about 50% slower than Luis original one, but that is just because of the NVARCHAR(4000) compatibility.

Below is a varchar(255) version that is comparable to Luis, but with the extra word start checks.


create function [dbo].[propercase](@text varchar(255))
returns table with schemabinding
as
return (
with seed1 (a)
as
(
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
numbers (n) as
(
select top (datalength(@text)) row_number() over (order by (select null))
from seed1 s1, seed1 s2, seed1 s3
)
select a.b.value('(./text())[1]', 'varchar(255)') as [text]
from (
select
case
when n = 1 then upper(substring(@text, n, 1))
when substring(@text, n - 1, 2) like '[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))
else lower(substring(@text, n, 1))
end
from numbers
for xml path (''), type
) a (b)
)



EDIT: modified to cope with Accénted characters...Damn you Luis! I wanted to sleep:-D

MM


select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Luis Cazares
    Luis Cazares
    SSCoach
    SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

    Group: General Forum Members
    Points: 16230 Visits: 19059
    I just love when people continue to post to get better solutions.
    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.


    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?


    How to post data/code on a forum to get the best help: Option 1 / Option 2
    mister.magoo
    mister.magoo
    SSCarpal Tunnel
    SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)

    Group: General Forum Members
    Points: 4082 Visits: 7865
    Luis Cazares (1/16/2014)
    I just love when people continue to post to get better solutions.
    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.


    Thanks Luis, I agree there are compromises, as mine will handle O'Brien as a name, where others posted so far don't. I agree though that accented characters should probably be handled - even in English - we are a very multicultural society these days and it's not uncommon to get almost any name possible. It's just a bit late now for me to be bothered w00t

    MM


    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • dwain.c
    dwain.c
    SSCertifiable
    SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)SSCertifiable (7.2K reputation)

    Group: General Forum Members
    Points: 7205 Visits: 6431
    Perhaps something like this might help?


    CREATE FUNCTION [dbo].[ProperCase]
    (
    @MyStr VARCHAR(8000) = NULL
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN

    SELECT ProperCaseStr=
    (
    SELECT CASE WHEN [Matched] = 1 THEN STUFF(Item, 1, 1, UPPER(LEFT(Item, 1))) ELSE Item END
    FROM dbo.PatternSplitCM(@MyStr, '[a-zA-Z]') b
    ORDER BY ItemNumber
    FOR XML PATH(''), TYPE
    ).value('.', 'varchar(8000)')
    ;




    My ProperCase iTVF uses PatternSplitCM which can be found in the 4th article in my signature links.


    WITH SampleData (MyStr) AS
    (
    SELECT 'The cat in the hat'
    UNION ALL SELECT 'the mouse is in the parlour. but the dog is out to lunch'
    )
    SELECT MyStr, ProperCaseStr
    FROM SampleData
    CROSS APPLY dbo.ProperCase(MyStr);




    You'd need to verify of course that it handles all of the cases you want.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
    Splitting strings based on patterns can be fast!
    My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
    Dohsan
    Dohsan
    Mr or Mrs. 500
    Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)

    Group: General Forum Members
    Points: 578 Visits: 3434
    Some excellent work


    DECLARE @TestVar VARCHAR(16);
    DECLARE @StartTime DATETIME;

    PRINT 'Old Method Scalar Function'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = Utility.f_ProperCase(PT.String)
    FROM #ProperTest AS PT;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Dohsan'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseString
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Luis'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCased
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase1(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Dwain'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseStr
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase2(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Magoo varchar255'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.[text]
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase3(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';


    PRINT 'Luis V2'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseString
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase4(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';



    Here as the results:

    300k

    Old Method Scalar Function
    47306 MilliSeconds
    Dohsan
    19113 MilliSeconds
    Luis
    5946 MilliSeconds
    Dwain
    54486 MilliSeconds
    Magoo varchar255
    5440 MilliSeconds
    Luis V2
    6433 MilliSeconds

    1million

    Old Method Scalar Function
    160710 MilliSeconds
    Dohsan
    63750 MilliSeconds
    Luis
    15450 MilliSeconds
    Dwain
    175366 MilliSeconds
    Magoo varchar255
    15513 MilliSeconds
    Luis V2
    17843 MilliSeconds



    How do they handle some special cases?


    DECLARE @SingleTestString VARCHAR(255) = 'ándre, ándre, luís and o''brien had a low-budget 3d printer in washington d.c.',
    @TestVar2 VARCHAR(255);

    PRINT 'Original String'
    PRINT @SingleTestString;

    PRINT 'Old Method Scalar Function'
    SELECT @TestVar2 = Utility.f_ProperCase(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Dohsan'
    SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Luis'
    SELECT @TestVar2 = ProperCased FROM dbo.ProperCase1(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Dwain'
    SELECT @TestVar2 = ProperCaseStr FROM dbo.ProperCase2(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Magoo varchar255'
    SELECT @TestVar2 = [text] FROM dbo.ProperCase3(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Luis V2'
    SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase4(@SingleTestString);
    PRINT @TestVar2



    Results:

    Original String
    ándre, ándre, luís and o'brien had a low-budget 3d printer in washington d.c.
    Old Method Scalar Function
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.
    Dohsan
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.
    Luis
    Ándre, Ándre, Luís And O'brien Had A Low-budget 3d Printer In Washington D.c.
    Dwain
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.
    Magoo varchar255
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.
    Luis V2
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.C.

    Dohsan
    Dohsan
    Mr or Mrs. 500
    Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)Mr or Mrs. 500 (578 reputation)

    Group: General Forum Members
    Points: 578 Visits: 3434
    Original Test strings were small (16 chars), the original function was only for 255 although a few of the functions presented have been able to handle far more. So upping it to nearer the 255 limit to see how they cope (so all functions can be run)


    IF OBJECT_ID('tempdb..#ProperTest','U') IS NOT NULL
    DROP TABLE #ProperTest;

    SELECT REPLICATE(STUFF(STUFF(CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),N%16+1,1,' '),N%3+1,1,' '),14) AS String
    INTO #ProperTest
    FROM dbo.GetNums(1,300000);



    Results:



    Old Method Scalar Function
    617393 MilliSeconds
    Dohsan
    162690 MilliSeconds
    Luis
    36680 MilliSeconds
    Dwain
    424590 MilliSeconds
    Magoo varchar255
    44063 MilliSeconds
    Luis V2
    49436 MilliSeconds

    Ed Wagner
    Ed Wagner
    SSCoach
    SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

    Group: General Forum Members
    Points: 16429 Visits: 10050
    I ran some similar comparisons, but I had run them up to handle varchar(8000) and ran against a million-row test table with the strings filled to capacity. They were running pretty close to each other (with MM being the best at ~28 seconds), but I haven't tested everything with special characters or multiple trailing spaces. Dropping down to a max length of 255 definitely improved performance (dropping the best to ~22 seconds), but I hadn't completed testing last night before I needed to get some sleep. :-P

    All in all, I think the work done on this thread has been excellent.


    Tally Tables - Performance Personified
    String Splitting with True Performance
    Best practices on how to ask questions
    mister.magoo
    mister.magoo
    SSCarpal Tunnel
    SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)

    Group: General Forum Members
    Points: 4082 Visits: 7865
    That is some excellent work on the testing there Dohsan - well done!:-)

    MM


    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Jeff Moden
    Jeff Moden
    SSC Guru
    SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)

    Group: General Forum Members
    Points: 84695 Visits: 41069
    BWAAA-HAAA!!!! Dare I say IT? DARE I? Oh, what the heck...

    "Ahhhhhhh MAGOO! You've done it again." :-P (Sorry ol' friend... I've been holding it back for more than a month and couldn't resist).

    --Jeff Moden

    RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

    Helpful Links:
    How to post code problems
    How to post performance problems
    Forum FAQs
    Go


    Permissions

    You can't post new topics.
    You can't post topic replies.
    You can't post new polls.
    You can't post replies to polls.
    You can't edit your own topics.
    You can't delete your own topics.
    You can't edit other topics.
    You can't delete other topics.
    You can't edit your own posts.
    You can't edit other posts.
    You can't delete your own posts.
    You can't delete other posts.
    You can't post events.
    You can't edit your own events.
    You can't edit other events.
    You can't delete your own events.
    You can't delete other events.
    You can't send private messages.
    You can't send emails.
    You can read topics.
    You can't vote in polls.
    You can't upload attachments.
    You can download attachments.
    You can't post HTML code.
    You can't edit HTML code.
    You can't post IFCode.
    You can't post JavaScript.
    You can post emoticons.
    You can't post or upload images.

    Select a forum

































































































































































    SQLServerCentral


    Search