SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Avoid characters that viloate UTF-8 Encoding


Avoid characters that viloate UTF-8 Encoding

Author
Message
Khades
Khades
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 390
I'm storing information in an XML file that is UTF-8.

I have a stored procedure I use to query some tables and structure the data in XML format, which I then save into an XML file with UTF-8.

I noticed that there are some characters that make it into this XML file, that violate the UTF-8 encoding. How do I avoid that?
Khades
Khades
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 390
Looks like this might be in the wrong section, how do I move it to the XML secion?
Lowell
Lowell
SSC Guru
SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)

Group: General Forum Members
Points: 72917 Visits: 40959
ok, i just googled "Non UTF-8 Characters", and the only thing i can find is when something got malformed; otherwise, It looks like UTF covers teh whole range of possible characters.

http://stackoverflow.com/questions/1379416/insert-utf8-data-into-a-ms-sql-server-2008
http://magp.ie/2011/01/06/remove-non-utf8-characters-from-string-with-php/

the two links above are describing fixes for malformed strings;
can you be more specific about what it is you want to remove?

do you really mean high ascii characters, ie >127 like some of these?
('ÀAlbèert ËEîinstêeiìn ÌInstìitúutëe - MPG')
do you mean escaping special characters that xml requires to be htmlized? like > to &l t ;?

Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Steven Willis
Steven Willis
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2071 Visits: 1721
I didn't write this code and can't remember where I got it so apologies to the original author. In my case I only cared about non-ASCII characters outside the range 32-127 because the SQL XML parser handles things like '<' and '>' etc. If you need to convert other characters or tags then the case statement will need a little tweaking. Undesirable characters are replaced by '?' and then optionally all question marks are replaced with blanks.



CREATE FUNCTION dbo.svfCleanXML
(
@OriginalText VARCHAR(8000)
)
RETURNS VARCHAR(8000)
BEGIN

/* This function requires a Tally or Numbers table */
/* http://www.sqlservercentral.com/articles/T-SQL/62867/ */

DECLARE @CleanedText VARCHAR(8000)

SELECT
@CleanedText = ISNULL(@CleanedText,'')
+ CASE
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 32 AND 127
THEN SUBSTRING(@OriginalText,Tally.N,1)
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) <= 32
THEN CONVERT(VARCHAR,ASCII(SUBSTRING(@OriginalText,Tally.N,1)))
WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) >= 128
THEN CONVERT(VARCHAR,ASCII(SUBSTRING(@OriginalText,Tally.N,1)))
END
FROM
dbo.Tally
WHERE
Tally.N <= LEN(@OriginalText)

SET @CleanedText = REPLACE(@CleanedText,'?','') --optional

RETURN @CleanedText

/*

SELECT dbo.svfCleanXML('[xml_string_here]') AS cleanString
SELECT CONVERT(XML,dbo.svfCleanXML('[xml_string_here]')) AS cleanXML

*/

END



Lowell
Lowell
SSC Guru
SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)

Group: General Forum Members
Points: 72917 Visits: 40959
and here's a similar one i wrote long ago which converts things like © to the html entity ©

I think it was specifically due to XML:


CREATE FUNCTION HTMLEncode(@OriginalText VARCHAR(4000))
RETURNS VARCHAR(8000)
BEGIN
DECLARE @CleanedText VARCHAR(8000);
WITH E01(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1), -- 10 or 10E01 rows
E02(N) AS (SELECT 1 FROM E01 a, E01 b), -- 100 or 10E02 rows
E04(N) AS (SELECT 1 FROM E02 a, E02 b), -- 10,000 or 10E04 rows
-- E08(N) AS (SELECT 1 FROM E04 a, E04 b), --100,000,000 or 10E08 rows
-- E16(N) AS (SELECT 1 FROM E08 a, E08 b), --10E16 or more rows than you'll EVER need
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E04)
SELECT @CleanedText = ISNULL(@CleanedText,'') +
CASE
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =34) THEN '"' --" " " quotation mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =39) THEN ''' --' ' ' apostrophe
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =38) THEN '&' --& & & ampersand
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =60) THEN '<' --< < < less-than
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =62) THEN '>' --> > > greater-than
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =160) THEN ' ' --   non-breaking space
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =161) THEN '¡' --¡ ¡ ¡ inverted exclamation mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =162) THEN '¢' --¢ ¢ ¢ cent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =163) THEN '£' --£ £ £ pound
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =164) THEN '¤' --¤ ¤ ¤ currency
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =165) THEN '¥' --¥ ¥ ¥ yen
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =166) THEN '¦' --¦ ¦ ¦ broken vertical bar
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =167) THEN '§' --§ § § section
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =168) THEN '¨' --¨ ¨ ¨ spacing diaeresis
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =169) THEN '©' --© © © copyright
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =170) THEN 'ª' --ª ª ª feminine ordinal indicator
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =171) THEN '«' --« « « angle quotation mark (left)
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =172) THEN '¬' --¬ ¬ ¬ negation
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =173) THEN '­' --­ ­ ­ soft hyphen
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =174) THEN '®' --® ® ® registered trademark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =175) THEN '¯' --¯ ¯ ¯ spacing macron
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =176) THEN '°' --° ° ° degree
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =177) THEN '±' --± ± ± plus-or-minus
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =178) THEN '²' --² ² ² superscript 2
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =179) THEN '³' --³ ³ ³ superscript 3
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =180) THEN '´' --´ ´ ´ spacing acute
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =181) THEN 'µ' --µ µ µ micro
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =182) THEN '¶' --¶ ¶ ¶ paragraph
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =183) THEN '·' --· · · middle dot
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =184) THEN '¸' --¸ ¸ ¸ spacing cedilla
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =185) THEN '¹' --¹ ¹ ¹ superscript 1
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =186) THEN 'º' --º º º masculine ordinal indicator
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =187) THEN '»' --» » » angle quotation mark (right)
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =188) THEN '¼' --¼ ¼ ¼ fraction 1/4
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =189) THEN '½' --½ ½ ½ fraction 1/2
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =190) THEN '¾' --¾ ¾ ¾ fraction 3/4
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =191) THEN '¿' --¿ ¿ ¿ inverted question mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =215) THEN '×' --× × × multiplication
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =247) THEN '÷' --÷ ÷ ÷ division
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =192) THEN 'À' --À À À capital a, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =193) THEN 'Á' --Á Á Á capital a, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =194) THEN 'Â' --Â Â Â capital a, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =195) THEN 'Ã' --Ã Ã Ã capital a, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =196) THEN 'Ä' --Ä Ä Ä capital a, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =197) THEN 'Å' --Å Å Å capital a, ring
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =198) THEN 'Æ' --Æ Æ Æ capital ae
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =199) THEN 'Ç' --Ç Ç Ç capital c, cedilla
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =200) THEN 'È' --È È È capital e, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =201) THEN 'É' --É É É capital e, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =202) THEN 'Ê' --Ê Ê Ê capital e, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =203) THEN 'Ë' --Ë Ë Ë capital e, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =204) THEN 'Ì' --Ì Ì Ì capital i, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =205) THEN 'Í' --Í Í Í capital i, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =206) THEN 'Î' --Î Î Î capital i, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =207) THEN 'Ï' --Ï Ï Ï capital i, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =208) THEN 'Ð' --Ð Ð Ð capital eth, Icelandic
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =209) THEN 'Ñ' --Ñ Ñ Ñ capital n, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =210) THEN 'Ò' --Ò Ò Ò capital o, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =211) THEN 'Ó' --Ó Ó Ó capital o, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =212) THEN 'Ô' --Ô Ô Ô capital o, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =213) THEN 'Õ' --Õ Õ Õ capital o, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =214) THEN 'Ö' --Ö Ö Ö capital o, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =216) THEN 'Ø' --Ø Ø Ø capital o, slash
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =217) THEN 'Ù' --Ù Ù Ù capital u, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =218) THEN 'Ú' --Ú Ú Ú capital u, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =219) THEN 'Û' --Û Û Û capital u, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =220) THEN 'Ü' --Ü Ü Ü capital u, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =221) THEN 'Ý' --Ý Ý Ý capital y, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =222) THEN 'Þ' --Þ Þ Þ capital THORN, Icelandic
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =223) THEN 'ß' --ß ß ß small sharp s, German
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =224) THEN 'à' --à à à small a, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =225) THEN 'á' --á á á small a, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =226) THEN 'â' --â â â small a, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =227) THEN 'ã' --ã ã ã small a, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =228) THEN 'ä' --ä ä ä small a, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =229) THEN 'å' --å å å small a, ring
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =230) THEN 'æ' --æ æ æ small ae
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =231) THEN 'ç' --ç ç ç small c, cedilla
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =232) THEN 'è' --è è è small e, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =233) THEN 'é' --é é é small e, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =234) THEN 'ê' --ê ê ê small e, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =235) THEN 'ë' --ë ë ë small e, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =236) THEN 'ì' --ì ì ì small i, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =237) THEN 'í' --í í í small i, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =238) THEN 'î' --î î î small i, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =239) THEN 'ï' --ï ï ï small i, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =240) THEN 'ð' --ð ð ð small eth, Icelandic
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =241) THEN 'ñ' --ñ ñ ñ small n, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =242) THEN 'ò' --ò ò ò small o, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =243) THEN 'ó' --ó ó ó small o, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =244) THEN 'ô' --ô ô ô small o, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =245) THEN 'õ' --õ õ õ small o, tilde
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =246) THEN 'ö' --ö ö ö small o, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =248) THEN 'ø' --ø ø ø small o, slash
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =249) THEN 'ù' --ù ù ù small u, grave accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =250) THEN 'ú' --ú ú ú small u, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =251) THEN 'û' --û û û small u, circumflex accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =252) THEN 'ü' --ü ü ü small u, umlaut mark
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =253) THEN 'ý' --ý ý ý small y, acute accent
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =254) THEN 'þ' --þ þ þ small thorn, Icelandic
WHEN(ASCII(SUBSTRING(@OriginalText,Tally.N,1)) =255) THEN 'ÿ' --ÿ ÿ ÿ small y, umlaut mark
ELSE SUBSTRING(@OriginalText,Tally.N,1)
END
FROM Tally
WHERE Tally.N <= LEN(@OriginalText)
RETURN @CleanedText
END



Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Lowell
Lowell
SSC Guru
SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)

Group: General Forum Members
Points: 72917 Visits: 40959
well, the forum inteprets the escaped values as something it should render...whoops.
link to the function as raw text here:

HTMLEncode.txt

Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Khades
Khades
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 390
The character that was giving me problems is â (0xE2)
Steven Willis
Steven Willis
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2071 Visits: 1721
Khades (8/31/2012)
The character that was giving me problems is â (0xE2)

Here's a revised version of the function I posted above which in addition to checking for normal ASCII characters will remove any other extended characters either in their native form (â) or encoded form (&#x). There are some examples to run at the bottom of the code.



CREATE FUNCTION [dbo].[svfHTMLEncodeXML]
(
@OriginalText VARCHAR(8000)
)
RETURNS VARCHAR(8000)
BEGIN

DECLARE
@CleanedText VARCHAR(8000)
,@pos INT
,@chunk VARCHAR(10)

SELECT @CleanedText = COALESCE(@CleanedText + '','')
+ CAST(CleanText.CharVal AS VARCHAR(8000))
FROM
(
SELECT
Result.CharVal
FROM
(
SELECT
N AS RowNum
,(CASE
WHEN ASCII(SUBSTRING(@OriginalText,t.N,1)) BETWEEN 32 AND 127
THEN CAST(SUBSTRING(@OriginalText,t.N,1) AS VARCHAR(50))
WHEN ASCII(SUBSTRING(@OriginalText,t.N,1)) < 32
THEN NULL
WHEN ASCII(SUBSTRING(@OriginalText,t.N,1)) >= 127
THEN NULL
END)
AS CharVal
FROM
dbo.Tally AS t
WHERE
t.N <= LEN(@OriginalText)
) AS Result
WHERE
Result.CharVal IS NOT NULL
) CleanText


SET @pos = PATINDEX('%[&#x]%',@CleanedText)

WHILE @pos <> 0
BEGIN
SET @chunk = SUBSTRING(@CleanedText,@pos,6)
SET @CleanedText = REPLACE(@CleanedText,@chunk,'')
SET @pos = PATINDEX('%[&#x]%',@CleanedText)
SET @chunk = NULL
END

SET @pos = PATINDEX('%[0x]%',@CleanedText)

WHILE @pos <> 0
BEGIN
SET @chunk = SUBSTRING(@CleanedText,@pos,6)
SET @CleanedText = REPLACE(@CleanedText,@chunk,'')
SET @pos = PATINDEX('%[0x]%',@CleanedText)
SET @chunk = NULL
END


/* This optional section removes tabs and excess spaces */

SET @CleanedText = REPLACE(@CleanedText,CHAR(9),' ') -- convert tabs to spaces

SET @CleanedText =
REPLACE(
REPLACE(
REPLACE(
@CleanedText
,' ',' '+CHAR(7))
,CHAR(7)+' ','')
,CHAR(7),'')

SET @CleanedText = REPLACE(@CleanedText,'> <','><')


RETURN @CleanedText


/*
SELECT dbo.svfHTMLEncodeXML('O&xxxx;Brien') -- sub '#x0D' for 'xxxx' to test
SELECT dbo.svfHTMLEncodeXML('OÇBrien')
SELECT dbo.svfHTMLEncodeXML('O¢Brien')
SELECT dbo.svfHTMLEncodeXML('O&xxxx;Brien') -- sub '#x0F' for 'xxxx' to test
SELECT dbo.svfHTMLEncodeXML('OâBrien')
*/

/* --this is text with an invisible line break character in it
SELECT dbo.svfHTMLEncodeXML('O
Brien')
*/

/* -- notice the â (0xE2) in the 2nd position which the function will remove

SELECT CONVERT(XML,dbo.svfHTMLEncodeXML(
'<âRoot type="demographics">
<PersonalData>
<PersonName type="primary">
<GivenName>John</GivenName>
<MiddleName>A</MiddleName>
<FamilyName>Doe</FamilyName>
</PersonName>
</PersonalData>
</Root>'))

*/

END



Khades
Khades
Old Hand
Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)Old Hand (385 reputation)

Group: General Forum Members
Points: 385 Visits: 390
Thank you once again. I appreciate your help.

What does the dbo.Tally table hold? I notice we keep querying the N column.
Steven Willis
Steven Willis
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2071 Visits: 1721
Khades (9/4/2012)
Thank you once again. I appreciate your help.

What does the dbo.Tally table hold? I notice we keep querying the N column.

A "Tally" table (sometimes called a "Numbers" table) is just a table with a large column of sequentially ordered numbers which can be joined to other tables to avoid using cursors or while loops.

This article will giet you started on the subject: http://www.sqlservercentral.com/articles/T-SQL/62867/

BTW, in my last post above I did include some WHILE loops which could almost certainly be replaced by joining to a tally table. But it was late when I posted that and I wanted to go home. :-P

Besides, in my particular case where I'm using a function very similar to this, the number of loops are likely to be minimal as these stray characters only occur infrequently and I'm using the function against a single block of code at a time and not in a SELECT statement. If using this function in a SELECT statement it would be worth the effort to try and replace them with some sort of non-looping construct.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search