This is still a work-in-progress and hasn't been tested thoroughly, though it seems to work pretty well. I've been planning to write an article on the proper-casing problem and this would be the core function.
The function is an inline table-valued function so it can be cross applied to any database column that needs proper-casing applied. I've added what I think are some unique features that aren't documented at this stage. I'll try to explain the parameters briefly, then you guys can play around with it, make suggestions, and/or shoot it full of holes!
Since different situations require different proper-casing rules there are several options. Also, specific exceptions can be entered as a pipe-delimited list of [str1~str1replacement|str2~str2replacement|...]. So specifically here in the OPs example he has an address with "P.O. Box". When "proper-cased" the function would normally return "P.o. Box" so a strVariation of 'P.O.~P.O.' will replace the value with the one chosen. This can also be used if, for example, you want iPod to stay iPod and not Ipod. Just set strVariation to whatever variations are likely to occur such as 'iPod~iPod|IPOD~iPod|ipod~iPod'
The parameters:
-- @ID [for future use; leave it as NULL for now)
-- @InputString NVARCHAR(4000) [the string to be parsed]
-- @strVariations VARCHAR(8000) [put in exeptions here; examples below]
-- @strType VARCHAR(50) [proper case "rules" to follow.
-- strTypes: NAME, ADDRESS, SENTENCE, PARAGRAPH, BULLETLIST, LISTITEM
-- and LOWER (lower-case everything) or UPPER (upper-case everything)
SELECT * FROM [dbo].[itvfTextToProperCase]
(
@ID INT
,@InputString NVARCHAR(4000)
,@strVariations VARCHAR(8000)
,@strType VARCHAR(50)
)
Note that these examples seem to work even with embedded double-dashes. For
safety, I'd replace any double-dashes with the em dash (—) before
using it as input or risk losing part of the string.
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'Associate/relief Manager','','NAME')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'9529 Fredericksburg Rd Apt—9529','RD~Rd','ADDRESS')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'PO Box 2153','','ADDRESS')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'P.O. Box 2153','P.O.~P.O.','ADDRESS')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'I love my ipod',''iPod~iPod|IPOD~iPod|ipod~iPod' ','SENTENCE')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'i love my ipod.','iPod~iPod|IPOD~iPod|ipod~iPod','SENTENCE')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'I love my ipod.','iPod~iPod|IPOD~iPod|ipod~iPod','PARAGRAPH')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'i love my ipod','iPod~iPod|IPOD~iPod|ipod~iPod','UPPER')
SELECT * FROM [dbo].[itvfTextToProperCase](NULL,'I LOVE MY IPOD','iPod~iPod|IPOD~iPod|ipod~iPod','LOWER')
The function:
CREATE FUNCTION [dbo].[itvfTextToProperCase]
(
@ID INT
,@InputString NVARCHAR(4000)
,@strVariations VARCHAR(8000)
,@strType VARCHAR(50)
)
RETURNS TABLE
AS
RETURN
(
WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@InputString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@InputString,t.N,1) = '|' OR t.N = 0)
)
,InputString (ID,FormattedOutput)
AS
(SELECT
ISNULL(@ID,0)
,(SELECT
(ISNULL(REPLACE(REPLACE(
(CASE
WHEN @strType = 'NAME' THEN
CASE
WHEN N = 1 OR NULLIF(SUBSTRING(@InputString,N-1,1),'') IS NULL
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-1,2)) NOT LIKE '%[!AEIOULMYaeioulmy][AEIOUCaeiouc]%'
AND SUBSTRING(@InputString,N-2,1) = ' '
AND SUBSTRING(@InputString,N+1,1) = ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-2,2)) IN ('mc') --optional addition: ,'de' or any 2-letter prefix
AND SUBSTRING(@InputString,N-3,1) = ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-3,3)) IN ('mac','. ','? ','! ') --optional addition: ,'von' or any 3-letter prefix
AND SUBSTRING(@InputString,N-4,1) = ' '
AND SUBSTRING(@InputString,N+1,1) <> ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N-1,1) IN ('''','')
AND (UPPER(SUBSTRING(@InputString,N,1))) IN ('S')
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N-1,1) IN ('''','','-','/')
AND (UPPER(SUBSTRING(@InputString,N,1))) NOT IN ('S')
THEN UPPER(SUBSTRING(@InputString,N,1))
ELSE
LOWER(SUBSTRING(@InputString,N,1))
END
WHEN @strType = 'ADDRESS' THEN
CASE
WHEN N = 1 OR NULLIF(SUBSTRING(@InputString,N-1,1),'') IS NULL
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-1,2)) NOT LIKE '%[!AEIOULMYaeioulmy][AEIOUCaeiouc]%'
AND SUBSTRING(@InputString,N-2,1) = ' '
AND SUBSTRING(@InputString,N+1,1) = ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-2,2)) IN ('mc') --optional addition: ,'de' or any 2-letter prefix
AND SUBSTRING(@InputString,N-3,1) = ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN LOWER(SUBSTRING(@InputString,N-3,3)) IN ('mac','. ','? ','! ') --optional addition: ,'von' or any 3-letter prefix
AND SUBSTRING(@InputString,N-4,1) = ' '
AND SUBSTRING(@InputString,N+1,1) <> ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN UPPER(SUBSTRING(@InputString,N-1,2)) IN ('AB','BC','MB','NB','NL','NS','NT','NV','ON','PE','QC','SK','YT','AK','AL','AR','AZ','CA','CO','CT','DC','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH','NJ','NM','NV','NY','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VA','VT','WA','WI','WV','WY')
AND SUBSTRING(@InputString,N-2,1) = ' '
AND SUBSTRING(@InputString,N+1,1) = ' '
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN UPPER(SUBSTRING(@InputString,N-1,2)) IN ('PO')
AND SUBSTRING(@InputString,N-2,1) = ' '
AND LOWER(SUBSTRING(@InputString,N+1,4)) = ' box'
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N-1,1) IN ('''','')
AND (UPPER(SUBSTRING(@InputString,N,1))) IN ('S')
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N-1,1) IN ('''','','-','/')
AND (UPPER(SUBSTRING(@InputString,N,1))) NOT IN ('S')
THEN UPPER(SUBSTRING(@InputString,N,1))
ELSE
LOWER(SUBSTRING(@InputString,N,1))
END
WHEN @strType = 'SENTENCE'
AND
(N = 1
OR SUBSTRING(@InputString,N-2,2) IN ('. ','? ','! ')
OR SUBSTRING(@InputString,N-3,3) IN ('. ','? ','! '))
THEN
CASE
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
ELSE
UPPER(SUBSTRING(@InputString,N,1))
END
WHEN @strType = 'PARAGRAPH'
AND
(N = 1
OR SUBSTRING(@InputString,N-2,2) IN ('. ','? ','! ')
OR SUBSTRING(@InputString,N-3,3) IN ('
','. ','? ','! ')
OR SUBSTRING(@InputString,N-4,4) IN ('
')
OR SUBSTRING(@InputString,N-5,5) IN (' ')
OR SUBSTRING(@InputString,N-6,6) IN (' ')
OR SUBSTRING(@InputString,N-8,8) IN ('
')
OR SUBSTRING(@InputString,N-10,10) IN ('
')
OR SUBSTRING(@InputString,N-12,12) IN ('
'))
THEN
CASE
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
ELSE
UPPER(SUBSTRING(@InputString,N,1))
END
WHEN @strType IN ('BULLETLIST','LISTITEM')
AND
(N = 1
OR SUBSTRING(@InputString,N-2,2) IN ('• ')
OR SUBSTRING(@InputString,N-4,4) IN ('<li>')
OR SUBSTRING(@InputString,N-5,5) IN ('<li> ')
OR SUBSTRING(@InputString,N-7,7) IN ('• ')
OR SUBSTRING(@InputString,N-8,8) IN ('• '))
THEN
CASE
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
ELSE
UPPER(SUBSTRING(@InputString,N,1))
END
WHEN @strType = 'LOWER'
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN @strType = 'UPPER'
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN NULLIF(@strType,'') IS NULL THEN
CASE
WHEN N = 1 OR NULLIF(SUBSTRING(@InputString,N-1,1),'') IS NULL
THEN UPPER(SUBSTRING(@InputString,N,1))
WHEN SUBSTRING(@InputString,N,1) IN (' ',';',':','!','?',',','.','/','&','''','','(',CHAR(9))
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN (SUBSTRING(@InputString,N-1,1) IN ('''','')) AND (UPPER(SUBSTRING(@InputString,N,1))) IN ('S')
THEN LOWER(SUBSTRING(@InputString,N,1))
WHEN (SUBSTRING(@InputString,N-1,1) IN ('''','','-','/')) AND (UPPER(SUBSTRING(@InputString,N,1))) NOT IN ('S')
THEN UPPER(SUBSTRING(@InputString,N,1))
ELSE
LOWER(SUBSTRING(@InputString,N,1))
END
ELSE
LOWER(SUBSTRING(@InputString,N,1))
END)
,' ',' '),CHAR(9),' '),''))
FROM
cteTally
WHERE
N > 0
AND N <= LEN(@InputString)
FOR XML PATH(''))
)
,Exceptions (itemnumber,item1,item2)
AS
(
SELECT
ItemNumber
,(CASE
WHEN NULLIF(@ID,'') IS NULL THEN 0
ELSE Item1
END) AS Item1
,Item2
FROM
(
SELECT
ItemNumber
,Item1
,Item2 = REPLACE(Item2,Item1+'~','')
FROM
(
SELECT
ItemNumber = ROW_NUMBER() OVER (ORDER BY st.N1)
,Item1 = SUBSTRING(@strVariations,st.N1,ISNULL(NULLIF(CHARINDEX('~',@strVariations,st.N1),0)-st.N1,8000))
,Item2 = SUBSTRING(@strVariations,st.N1,ISNULL(NULLIF(CHARINDEX('|',@strVariations,st.N1),0)-st.N1,8000))
FROM cteStart st
) e1
) e
)
SELECT
(CASE
WHEN @strType = 'ADDRESS'
THEN REPLACE(REPLACE(FormattedOutput,' POB ',' PO Box '),' P.O. Box ',' PO Box ')
ELSE
FormattedOutput
END)
AS FormattedOutput
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY show DESC) AS ShowOrder
,REPLACE(
REPLACE(
REPLACE(
REPLACE(
FormattedOutput
,' ',' '+CHAR(7))
,CHAR(7)+' ','')
,CHAR(7),'')
,' ,',',')
AS FormattedOutput
,Show
FROM
(
SELECT
FormattedOutput
,(CASE
WHEN CHARINDEX(r.Item2,r.FormattedOutput) > 0
THEN RowNum
WHEN Item2 IS NULL
THEN 1
ELSE 0
END) AS Show
FROM
(
SELECT DISTINCT
ROW_NUMBER() OVER (ORDER BY e.Item1) AS RowNum
,i.ID
,(CASE
WHEN e.ItemNumber IS NULL
THEN LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(REPLACE(i.FormattedOutput,' ',' '),'<','<'),'>','>'),'&','&')))
ELSE
LTRIM(RTRIM(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(i.FormattedOutput,' ',' '),'<','<'),'>','>'),'&','&'),e.Item2,e.Item2)))
END) AS FormattedOutput
,e.Item1
,(CASE
WHEN CHARINDEX(e.Item2,i.FormattedOutput) = 0
THEN NULL
ELSE e.Item2
END) AS Item2
FROM
InputString i
LEFT OUTER JOIN
Exceptions e
ON i.ID = e.Item1
CROSS APPLY
Exceptions e1
) r
) r1
) r2
WHERE
ShowOrder = 1
)