How to check if a field contains unicode data

  • Hi,

    I want to check if a field contains Unicode characters or not. This is how I have been doing it:

    SELECT *

    FROM Tablename

    WHERE CAST(Fieldname AS VARCHAR(MAX)) <> Fieldname

    It *seems* to work when I do some basic tests, such as these below:

    --Test 1:

    DECLARE @text NVARCHAR(100)

    SET @text = N'This is non-Unicode text, in Unicode'

    IF CAST(@text AS VARCHAR(MAX)) <> @text

    PRINT 'Contains Unicode characters'

    ELSE

    PRINT 'No Unicode characters'

    GO

    --Test 2:

    DECLARE @text NVARCHAR(100)

    SET @text = N'This is Unicode (字) text, in Unicode'

    IF CAST(@text AS VARCHAR(MAX)) <> @text

    PRINT 'Contains Unicode characters'

    ELSE

    PRINT 'No Unicode characters'

    GO

    Do you think this a safe method to use or not?

    Thanks in advance.

  • Seems safe to me. I can't help but wonder why you need to know, though.

  • Before I populate a VARCHAR field from a NVARCHAR field, I want to check if it is possible to do without raising an error.

  • xnl28-574517 (3/10/2010)


    Before I populate a VARCHAR field from a NVARCHAR field, I want to check if it is possible to do without raising an error.

    Converting Unicode to ASCII in SQL Server does not raise an error. Any characters that don't map, get replaced by question marks.

  • You're right, the data gets clobbered.

    What I meant to say was that I want to perform a check without clobbering the data.

  • The type precidence between varchar and nvarchar on your code will upconvert the varchar to an nvarchar during the comparison. It will still work though. It's like you are converting and then immediately unconverting and seeing if they match.

    create table #test (

    uni nvarchar(100)

    )

    insert into #test (uni) values (N'????'),( 'asdf'), (N'Foo'), ('Bar')

    select *

    from #test

    where uni <> convert(varchar(max), uni)

  • I know, I know, this post is so old and why am I replying. 
    Because I need to do this exact same thing, and the "I can't help but wonder why" comment inspired me to tell you why I need to, FWIW:

    Many of the tables I design were designed by former developers, using varchar(max).  Considering that we're now encouraged to avoid max lengths when possible, I'm re-considering all data types rather than continue the old design.
    While doing so, I'm also considering nvarchar.  Even though I like nvarchar better (after reading a lot about codepages and opinions online), I'm having a hard time convincing our code approvers for me to use it.
    I'm searching for any non-Ascii characters in my tables, sometimes, to see if I can prove to them that there may be a reason (other than codepages and applications of the future which may only accept Unicode) that the previous table was designed using nvarchar.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply