## ProperCase Function

 Author Message Luis Cazares SSCrazy Eights Group: General Forum Members Points: 8488 Visits: 18099 Ed, you gave me an idea an here's what I came out with after playing with the Jeff's splitter. Dohsan would be able to change the word delimiters as he considers correct without much problem.It's a simple query but can be easily converted in a iTVF.`DECLARE @pString VARCHAR(255) = 'this is a 3d-printer from wahsington d.c.';--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000... -- enough to cover NVARCHAR(4000) WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ),cteStart(N1) AS (--==== This returns N+1 (starting position of each word) SELECT 1 UNION ALL SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN(' ', '.', '-', '''') )SELECT (SELECT CASE WHEN s.N1 IS NULL THEN LOWER(SUBSTRING(@pString,t.N,1)) ELSE UPPER(SUBSTRING(@pString,t.N,1)) ENDFROM cteTally tLEFT JOIN cteStart s ON t.N = s.N1ORDER BY NFOR XML PATH(''),TYPE).value('.', 'varchar(255)') ProperCaseString;`EDIT: I reduced the datatypes for GREAT performance improvement. Luis C.General Disclaimer:Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?How to post data/code on a forum to get the best help: Option 1 / Option 2 mister.magoo SSCrazy Group: General Forum Members Points: 2266 Visits: 7824 Here's one similar to Luis, but handles detecting the start of a word differently and works on nvarchar(4000) input.`create function [dbo].[propercase](@text nvarchar(4000))returns table with schemabindingasreturn ( with seed1 (a) as ( select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 ), numbers (n) as ( select top (datalength(@text)) row_number() over (order by (select null)) from seed1 s1, seed1 s2, seed1 s3 ) select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text] from ( select case when n = 1 then upper(substring(@text, n, 1)) when substring(@text, n - 1, 2) like N'[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1)) else lower(substring(@text, n, 1)) end from numbers for xml path (''), type ) a (b))`On the small test strings it performs about 50% slower than Luis original one, but that is just because of the NVARCHAR(4000) compatibility.Below is a varchar(255) version that is comparable to Luis, but with the extra word start checks.`create function [dbo].[propercase](@text varchar(255))returns table with schemabindingasreturn ( with seed1 (a) as ( select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 ), numbers (n) as ( select top (datalength(@text)) row_number() over (order by (select null)) from seed1 s1, seed1 s2, seed1 s3 ) select a.b.value('(./text())[1]', 'varchar(255)') as [text] from ( select case when n = 1 then upper(substring(@text, n, 1)) when substring(@text, n - 1, 2) like '[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1)) else lower(substring(@text, n, 1)) end from numbers for xml path (''), type ) a (b))`EDIT: modified to cope with Accénted characters...Damn you Luis! I wanted to sleep:-D MM`select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);`Forum Etiquette: How to post Reporting Services problemsForum Etiquette: How to post data/code on a forum to get the best help - by Jeff ModenHow to Post Performance Problems - by Gail Shaw Luis Cazares SSCrazy Eights Group: General Forum Members Points: 8488 Visits: 18099 I just love when people continue to post to get better solutions. Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance. Luis C.General Disclaimer:Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?How to post data/code on a forum to get the best help: Option 1 / Option 2 mister.magoo SSCrazy Group: General Forum Members Points: 2266 Visits: 7824 Luis Cazares (1/16/2014)I just love when people continue to post to get better solutions. Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.Thanks Luis, I agree there are compromises, as mine will handle O'Brien as a name, where others posted so far don't. I agree though that accented characters should probably be handled - even in English - we are a very multicultural society these days and it's not uncommon to get almost any name possible. It's just a bit late now for me to be bothered MM`select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);`Forum Etiquette: How to post Reporting Services problemsForum Etiquette: How to post data/code on a forum to get the best help - by Jeff ModenHow to Post Performance Problems - by Gail Shaw dwain.c SSCarpal Tunnel Group: General Forum Members Points: 4249 Visits: 6431 Perhaps something like this might help?`CREATE FUNCTION [dbo].[ProperCase] ( @MyStr VARCHAR(8000) = NULL )RETURNS TABLE WITH SCHEMABINDING AS RETURNSELECT ProperCaseStr= ( SELECT CASE WHEN [Matched] = 1 THEN STUFF(Item, 1, 1, UPPER(LEFT(Item, 1))) ELSE Item END FROM dbo.PatternSplitCM(@MyStr, '[a-zA-Z]') b ORDER BY ItemNumber FOR XML PATH(''), TYPE ).value('.', 'varchar(8000)');`My ProperCase iTVF uses PatternSplitCM which can be found in the 4th article in my signature links.`WITH SampleData (MyStr) AS( SELECT 'The cat in the hat' UNION ALL SELECT 'the mouse is in the parlour. but the dog is out to lunch')SELECT MyStr, ProperCaseStrFROM SampleDataCROSS APPLY dbo.ProperCase(MyStr);`You'd need to verify of course that it handles all of the cases you want. My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!My thought question: Have you ever been told that your query runs too fast?My advice:INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.Need to UNPIVOT? Why not CROSS APPLY VALUES instead?Since random numbers are too important to be left to chance, let's generate some!Learn to understand recursive CTEs by example.Splitting strings based on patterns can be fast!My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables Dohsan Mr or Mrs. 500 Group: General Forum Members Points: 510 Visits: 3384 Some excellent work`DECLARE @TestVar VARCHAR(16);DECLARE @StartTime DATETIME; PRINT 'Old Method Scalar Function' SELECT @StartTime = GETDATE();SELECT @TestVar = Utility.f_ProperCase(PT.String)FROM #ProperTest AS PT; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds'; PRINT 'Dohsan' SELECT @StartTime = GETDATE();SELECT @TestVar = CA1.ProperCaseStringFROM #ProperTest AS PTCROSSAPPLY dbo.ProperCase(PT.String) AS CA1; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds'; PRINT 'Luis' SELECT @StartTime = GETDATE();SELECT @TestVar = CA1.ProperCasedFROM #ProperTest AS PTCROSSAPPLY dbo.ProperCase1(PT.String) AS CA1; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds'; PRINT 'Dwain' SELECT @StartTime = GETDATE();SELECT @TestVar = CA1.ProperCaseStrFROM #ProperTest AS PTCROSSAPPLY dbo.ProperCase2(PT.String) AS CA1; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds'; PRINT 'Magoo varchar255' SELECT @StartTime = GETDATE();SELECT @TestVar = CA1.[text]FROM #ProperTest AS PTCROSSAPPLY dbo.ProperCase3(PT.String) AS CA1; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds'; PRINT 'Luis V2' SELECT @StartTime = GETDATE();SELECT @TestVar = CA1.ProperCaseStringFROM #ProperTest AS PTCROSSAPPLY dbo.ProperCase4(PT.String) AS CA1; PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';`Here as the results:`300kOld Method Scalar Function47306 MilliSecondsDohsan19113 MilliSecondsLuis5946 MilliSecondsDwain54486 MilliSecondsMagoo varchar2555440 MilliSecondsLuis V26433 MilliSeconds1millionOld Method Scalar Function160710 MilliSecondsDohsan63750 MilliSecondsLuis15450 MilliSecondsDwain175366 MilliSecondsMagoo varchar25515513 MilliSecondsLuis V217843 MilliSeconds`How do they handle some special cases?`DECLARE @SingleTestString VARCHAR(255) = 'ándre, ándre, luís and o''brien had a low-budget 3d printer in washington d.c.', @TestVar2 VARCHAR(255); PRINT 'Original String' PRINT @SingleTestString; PRINT 'Old Method Scalar Function' SELECT @TestVar2 = Utility.f_ProperCase(@SingleTestString); PRINT @TestVar2 PRINT 'Dohsan' SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase(@SingleTestString); PRINT @TestVar2 PRINT 'Luis' SELECT @TestVar2 = ProperCased FROM dbo.ProperCase1(@SingleTestString); PRINT @TestVar2 PRINT 'Dwain' SELECT @TestVar2 = ProperCaseStr FROM dbo.ProperCase2(@SingleTestString); PRINT @TestVar2 PRINT 'Magoo varchar255' SELECT @TestVar2 = [text] FROM dbo.ProperCase3(@SingleTestString); PRINT @TestVar2 PRINT 'Luis V2' SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase4(@SingleTestString); PRINT @TestVar2`Results:`Original Stringándre, ándre, luís and o'brien had a low-budget 3d printer in washington d.c.Old Method Scalar FunctionÁndre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.DohsanÁndre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.LuisÁndre, Ándre, Luís And O'brien Had A Low-budget 3d Printer In Washington D.c.DwainÁndre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.Magoo varchar255Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.Luis V2Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.C.` Dohsan Mr or Mrs. 500 Group: General Forum Members Points: 510 Visits: 3384 Original Test strings were small (16 chars), the original function was only for 255 although a few of the functions presented have been able to handle far more. So upping it to nearer the 255 limit to see how they cope (so all functions can be run)`IF OBJECT_ID('tempdb..#ProperTest','U') IS NOT NULLDROP TABLE #ProperTest;SELECT REPLICATE(STUFF(STUFF(CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65),N%16+1,1,' '),N%3+1,1,' '),14) AS StringINTO #ProperTestFROM dbo.GetNums(1,300000);`Results:`Old Method Scalar Function617393 MilliSecondsDohsan162690 MilliSecondsLuis36680 MilliSecondsDwain424590 MilliSecondsMagoo varchar25544063 MilliSecondsLuis V249436 MilliSeconds` Ed Wagner SSChampion Group: General Forum Members Points: 10220 Visits: 9551 I ran some similar comparisons, but I had run them up to handle varchar(8000) and ran against a million-row test table with the strings filled to capacity. They were running pretty close to each other (with MM being the best at ~28 seconds), but I haven't tested everything with special characters or multiple trailing spaces. Dropping down to a max length of 255 definitely improved performance (dropping the best to ~22 seconds), but I hadn't completed testing last night before I needed to get some sleep. :-PAll in all, I think the work done on this thread has been excellent. Tally Tables - Performance PersonifiedString Splitting with True PerformanceBest practices on how to ask questions mister.magoo SSCrazy Group: General Forum Members Points: 2266 Visits: 7824 That is some excellent work on the testing there Dohsan - well done!:-) MM`select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);`Forum Etiquette: How to post Reporting Services problemsForum Etiquette: How to post data/code on a forum to get the best help - by Jeff ModenHow to Post Performance Problems - by Gail Shaw Jeff Moden SSC-Forever Group: General Forum Members Points: 44996 Visits: 39880 BWAAA-HAAA!!!! Dare I say IT? DARE I? Oh, what the heck..."Ahhhhhhh MAGOO! You've done it again." :-P (Sorry ol' friend... I've been holding it back for more than a month and couldn't resist). --Jeff ModenRBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.Although change is inevitable, change for the better is not.Just because you can do something in PowerShell, doesnt mean you should. Helpful Links:How to post code problemsHow to post performance problemsForum FAQs