Finding a Unicode Value Greater Than 256

  • I am using the script below to determine whether there are any Unicode values greater than 256. My problem is that I am unsure how to only report the Unicode greater than 256, i simply do not want to know about any others.

    You will see that I have commented out a FROM statement, this is the table/column that may have as much as 4000 characters within each row. My main aim is to cross reference this table/column to determine Unicode greater than 256.

    The reason I am trying to do this is because an clinical application is using an unknown code page and storing data within the back end database using a different code page, I am trying to prove to the supplier that this is in fact a problem.

    I would appreciate any guidance you may offer, Steve.

    USE [tempdb]

    GO

    DECLARE @Position INT

    ,@String NCHAR(4000)

    ;

    SET @Position = 1

    ;

    SELECT @String = N'h•s' --ae.[ClinicalComments] FROM [Electronic_Discharge].[AandE].[Activity_AEAttendance_vw] AS ae WHERE ae.[UniqueID] IN (412503)

    ;

    PRINT LEN(@String)

    WHILE @Position <= LEN(@String)

    BEGIN

    ;

    SELECT @Position AS [Position]

    ,CONVERT(CHAR(1), SUBSTRING(@String, @Position, 1)) AS [Character]

    ,UNICODE(SUBSTRING(@String, @Position, 1)) AS [Unicode]

    WHERE UNICODE(SUBSTRING(@String, @Position, 1)) > 256

    ;

    SELECT @Position = @Position + 1

    ;

    END

    ;

  • It's not fully clear to me what result you are looking for. Can you post CREATE TABLE + INSERT statements with some sample data and what result you want?

    I can see one thing in the script that is wrong. You do

    CONVERT(CHAR(1), SUBSTRING(@String, @Position, 1)) AS [Character]

    char(1) is, well, char so that is not a Unicode data type, so this means that any character that is not in the code page for your collation will be converted to a fallback. And if your collation is based on Latin-1 (which I would assume since your name is Steve :-), this means any character > 256. You need to use nchar(1).

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply