Remove non ASCII characters from column

  • I have a Stored Procedure which returns XML to a traditional .asp web site which has a page where users can download an XML file of their data. Viewing the XML file in Internet Explorer, invariably there are errors that say something like 'an invalid character has been encountered'.

    I am under the impression SQL Server generates XML using UTF-8 and the asp file sets this as the character set.

    I figure the easiest thing to do is simply to remove all non ASCII characters from the column that seems to be causing the problem - a 255 varchar field.

    How can I remove all non-ASCII characters from the field?

    Thanks for any help.

  • Here's a function that accepts a unicode string and spits it back at you without the invalid ASCII characters. Just call it for each column you need to clean up.

    select RemoveNonASCII(col_name) as col_name

    CREATE FUNCTION RemoveNonASCII

    (

    @nstring nvarchar(255)

    )

    RETURNS varchar(255)

    AS

    BEGIN

    DECLARE @Result varchar(255)

    SET @Result = ''

    DECLARE @nchar nvarchar(1)

    DECLARE @position int

    SET @position = 1

    WHILE @position <= LEN(@nstring)

    BEGIN

    SET @nchar = SUBSTRING(@nstring, @position, 1)

    --Unicode & ASCII are the same from 1 to 255.

    --Only Unicode goes beyond 255

    --0 to 31 are non-printable characters

    IF UNICODE(@nchar) between 32 and 255

    SET @Result = @Result + @nchar

    SET @position = @position + 1

    END

    RETURN @Result

    END

    GO

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply