Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««1234»»»

ProperCase Function Expand / Collapse
Author
Message
Posted Thursday, January 16, 2014 2:43 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 10:43 AM
Points: 3,477, Visits: 7,518
Ed, you gave me an idea an here's what I came out with after playing with the Jeff's splitter. Dohsan would be able to change the word delimiters as he considers correct without much problem.

It's a simple query but can be easily converted in a iTVF.

DECLARE @pString VARCHAR(255) = 'this   is a 3d-printer from wahsington d.c.';

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover NVARCHAR(4000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each word)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN(' ', '.', '-', '''')
)
SELECT (SELECT CASE WHEN s.N1 IS NULL
THEN LOWER(SUBSTRING(@pString,t.N,1))
ELSE UPPER(SUBSTRING(@pString,t.N,1)) END
FROM cteTally t
LEFT JOIN cteStart s ON t.N = s.N1
ORDER BY N
FOR XML PATH(''),TYPE).value('.', 'varchar(255)') ProperCaseString
;

EDIT: I reduced the datatypes for GREAT performance improvement.



Luis C.
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1531822
Posted Thursday, January 16, 2014 4:59 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: 2 days ago @ 3:40 PM
Points: 1,787, Visits: 5,692
Here's one similar to Luis, but handles detecting the start of a word differently and works on nvarchar(4000) input.

create function [dbo].[propercase](@text nvarchar(4000))
returns table with schemabinding
as
return (
with seed1 (a)
as
(
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
numbers (n) as
(
select top (datalength(@text)) row_number() over (order by (select null))
from seed1 s1, seed1 s2, seed1 s3
)
select a.b.value('(./text())[1]', 'nvarchar(4000)') as [text]
from (
select
case
when n = 1 then upper(substring(@text, n, 1))
when substring(@text, n - 1, 2) like N'[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))
else lower(substring(@text, n, 1))
end
from numbers
for xml path (''), type
) a (b)
)

On the small test strings it performs about 50% slower than Luis original one, but that is just because of the NVARCHAR(4000) compatibility.

Below is a varchar(255) version that is comparable to Luis, but with the extra word start checks.

create function [dbo].[propercase](@text varchar(255))
returns table with schemabinding
as
return (
with seed1 (a)
as
(
select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1
),
numbers (n) as
(
select top (datalength(@text)) row_number() over (order by (select null))
from seed1 s1, seed1 s2, seed1 s3
)
select a.b.value('(./text())[1]', 'varchar(255)') as [text]
from (
select
case
when n = 1 then upper(substring(@text, n, 1))
when substring(@text, n - 1, 2) like '[^a-z][a-z]' collate Latin1_General_CI_AI then upper(substring(@text, n, 1))
else lower(substring(@text, n, 1))
end
from numbers
for xml path (''), type
) a (b)
)

EDIT: modified to cope with Accénted characters...Damn you Luis! I wanted to sleep


MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1531866
    Posted Thursday, January 16, 2014 5:30 PM


    Hall of Fame

    Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

    Group: General Forum Members
    Last Login: Today @ 10:43 AM
    Points: 3,477, Visits: 7,518
    I just love when people continue to post to get better solutions.
    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.



    Luis C.
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    Forum Etiquette: How to post data/code on a forum to get the best help
    Post #1531873
    Posted Thursday, January 16, 2014 5:52 PM


    SSCommitted

    SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

    Group: General Forum Members
    Last Login: 2 days ago @ 3:40 PM
    Points: 1,787, Visits: 5,692
    Luis Cazares (1/16/2014)
    I just love when people continue to post to get better solutions.
    Mr. Magoo, it's really nice how you handle the start of a word. However, it might generate problems with accentuated words (which are uncommon in English but might occur with words "imported" from other languages). It seems that changing the collation to Latin1_General_CI_AI can solve the problem but might reduce the performance.


    Thanks Luis, I agree there are compromises, as mine will handle O'Brien as a name, where others posted so far don't. I agree though that accented characters should probably be handled - even in English - we are a very multicultural society these days and it's not uncommon to get almost any name possible. It's just a bit late now for me to be bothered


    MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1531879
    Posted Thursday, January 16, 2014 11:44 PM


    Hall of Fame

    Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

    Group: General Forum Members
    Last Login: Today @ 4:14 AM
    Points: 3,618, Visits: 5,254
    Perhaps something like this might help?

    CREATE FUNCTION [dbo].[ProperCase] 
    (
    @MyStr VARCHAR(8000) = NULL
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN

    SELECT ProperCaseStr=
    (
    SELECT CASE WHEN [Matched] = 1 THEN STUFF(Item, 1, 1, UPPER(LEFT(Item, 1))) ELSE Item END
    FROM dbo.PatternSplitCM(@MyStr, '[a-zA-Z]') b
    ORDER BY ItemNumber
    FOR XML PATH(''), TYPE
    ).value('.', 'varchar(8000)')
    ;


    My ProperCase iTVF uses PatternSplitCM which can be found in the 4th article in my signature links.

    WITH SampleData (MyStr) AS
    (
    SELECT 'The cat in the hat'
    UNION ALL SELECT 'the mouse is in the parlour. but the dog is out to lunch'
    )
    SELECT MyStr, ProperCaseStr
    FROM SampleData
    CROSS APPLY dbo.ProperCase(MyStr);


    You'd need to verify of course that it handles all of the cases you want.



    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
    Splitting strings based on patterns can be fast!
    Post #1531931
    Posted Friday, January 17, 2014 3:35 AM


    Old Hand

    Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

    Group: General Forum Members
    Last Login: Today @ 8:22 AM
    Points: 354, Visits: 2,563
    Some excellent work

    DECLARE @TestVar VARCHAR(16);
    DECLARE @StartTime DATETIME;

    PRINT 'Old Method Scalar Function'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = Utility.f_ProperCase(PT.String)
    FROM #ProperTest AS PT;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Dohsan'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseString
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Luis'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCased
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase1(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Dwain'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseStr
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase2(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    PRINT 'Magoo varchar255'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.[text]
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase3(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';


    PRINT 'Luis V2'
    SELECT @StartTime = GETDATE();

    SELECT @TestVar = CA1.ProperCaseString
    FROM #ProperTest AS PT
    CROSS
    APPLY dbo.ProperCase4(PT.String) AS CA1;

    PRINT CAST(DATEDIFF(ms,@StartTime,GETDATE()) AS VARCHAR) + ' MilliSeconds';

    Here as the results:
    300k

    Old Method Scalar Function
    47306 MilliSeconds
    Dohsan
    19113 MilliSeconds
    Luis
    5946 MilliSeconds
    Dwain
    54486 MilliSeconds
    Magoo varchar255
    5440 MilliSeconds
    Luis V2
    6433 MilliSeconds

    1million

    Old Method Scalar Function
    160710 MilliSeconds
    Dohsan
    63750 MilliSeconds
    Luis
    15450 MilliSeconds
    Dwain
    175366 MilliSeconds
    Magoo varchar255
    15513 MilliSeconds
    Luis V2
    17843 MilliSeconds



    How do they handle some special cases?

    DECLARE @SingleTestString VARCHAR(255) = 'ándre, ándre, luís and o''brien had a low-budget 3d printer in washington d.c.',
    @TestVar2 VARCHAR(255);

    PRINT 'Original String'
    PRINT @SingleTestString;

    PRINT 'Old Method Scalar Function'
    SELECT @TestVar2 = Utility.f_ProperCase(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Dohsan'
    SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Luis'
    SELECT @TestVar2 = ProperCased FROM dbo.ProperCase1(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Dwain'
    SELECT @TestVar2 = ProperCaseStr FROM dbo.ProperCase2(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Magoo varchar255'
    SELECT @TestVar2 = [text] FROM dbo.ProperCase3(@SingleTestString);
    PRINT @TestVar2
    PRINT 'Luis V2'
    SELECT @TestVar2 = ProperCaseString FROM dbo.ProperCase4(@SingleTestString);
    PRINT @TestVar2

    Results:
    Original String
    ándre, ándre, luís and o'brien had a low-budget 3d printer in washington d.c.
    Old Method Scalar Function
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.
    Dohsan
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.c.
    Luis
    Ándre, Ándre, Luís And O'brien Had A Low-budget 3d Printer In Washington D.c.
    Dwain
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.
    Magoo varchar255
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3D Printer In Washington D.C.
    Luis V2
    Ándre, Ándre, Luís And O'Brien Had A Low-Budget 3d Printer In Washington D.C.
    Post #1531983
    Posted Friday, January 17, 2014 3:55 AM


    Old Hand

    Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

    Group: General Forum Members
    Last Login: Today @ 8:22 AM
    Points: 354, Visits: 2,563
    Original Test strings were small (16 chars), the original function was only for 255 although a few of the functions presented have been able to handle far more. So upping it to nearer the 255 limit to see how they cope (so all functions can be run)

    IF OBJECT_ID('tempdb..#ProperTest','U') IS NOT NULL
    DROP TABLE #ProperTest;

    SELECT REPLICATE(STUFF(STUFF(CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),N%16+1,1,' '),N%3+1,1,' '),14) AS String
    INTO #ProperTest
    FROM dbo.GetNums(1,300000);

    Results:


    Old Method Scalar Function
    617393 MilliSeconds
    Dohsan
    162690 MilliSeconds
    Luis
    36680 MilliSeconds
    Dwain
    424590 MilliSeconds
    Magoo varchar255
    44063 MilliSeconds
    Luis V2
    49436 MilliSeconds
    Post #1531989
    Posted Friday, January 17, 2014 5:43 AM
    SSCarpal Tunnel

    SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

    Group: General Forum Members
    Last Login: Today @ 10:50 AM
    Points: 4,139, Visits: 3,169
    I ran some similar comparisons, but I had run them up to handle varchar(8000) and ran against a million-row test table with the strings filled to capacity. They were running pretty close to each other (with MM being the best at ~28 seconds), but I haven't tested everything with special characters or multiple trailing spaces. Dropping down to a max length of 255 definitely improved performance (dropping the best to ~22 seconds), but I hadn't completed testing last night before I needed to get some sleep.

    All in all, I think the work done on this thread has been excellent.



    Tally Tables - Performance Personified
    String Splitting with True Performance
    Best practices on how to ask questions
    Post #1532010
    Posted Friday, January 17, 2014 8:06 AM


    SSCommitted

    SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

    Group: General Forum Members
    Last Login: 2 days ago @ 3:40 PM
    Points: 1,787, Visits: 5,692
    That is some excellent work on the testing there Dohsan - well done!

    MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1532074
    Posted Friday, January 17, 2014 3:52 PM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Today @ 9:09 AM
    Points: 36,941, Visits: 31,443
    BWAAA-HAAA!!!! Dare I say IT? DARE I? Oh, what the heck...

    "Ahhhhhhh MAGOO! You've done it again." (Sorry ol' friend... I've been holding it back for more than a month and couldn't resist).


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1532320
    « Prev Topic | Next Topic »

    Add to briefcase ««1234»»»

    Permissions Expand / Collapse