Can varchar columns use UTF-8?

  • Hi all.

    Does SQL Server 2005 support storing text in UTF-8 in VARCHAR columns?

    Thanks

    --Dan

  • You can do this, but it may cause problem when you manage the text using string functions. By utf-8, some characters need one byte, some need 2 or more bytes to represent. String functions work by bytes (e.g. Len returns bytes of the string instead of the number of char in it) for varchar. So you may get wired results if some characters in the text need more than 1 byte to represent when you use those functions.

     

     

  • Thanks. I should have worded that question differently. The full question was "does SQL Server 2005 fully support UTF-8?" By "fully support" I mean that all of its string functions work correctly with UTF-8.

    So what you're saying is that the answer is NO, SQL Server 2005 does not support UTF-8, because none of its string functions (or search functions, presumably) work correctly if you store your text in UTF-8.

    Oh well.

    Thanks

    --Dan

  • SQL server definitely support various unicode encoding including utf-8, but you need to use nvarchar instead of varchar.

    nvarchar is unicode string data type.

     

  • Thanks. Does nvarchar really support UTF-8, or is it just for UCS2? It was my understanding that nvarchar was a two-byte/character column type that stored all its text data in UCS2. Is there a way to tell SQL Server 2005 to use UTF-8 for nvarchar columns instead of UCS2? And do all the string functions support that?

    --Dan

  • I understand what you mean now. The encoding of unicode string in sql server storage is UCS2 or utf-16, not utf-8. I don't think you can change it.

    What I said in my previosu post is that SQL accepts unicode string in any encoding, including utf-8. SQL server just converts it into utf-16.

    I don't know why you care SQL server's encoding for unicode. You can always convert it back to utf-8 after you read the text into your application, right? Since utf-16 takes two bytes for each char, the space required in database is doubled. Is this your concern?

     

     

  • No I'm not concerned with the extra byte per character. I just wanted to make sure there was no way to set a column to be UTF-8 before I went to the trouble of making my app convert the contents of a nvarchar to UTF-8 when displaying. Good to know that it will automatically convert when putting it in, though.

    Thanks for your help.

    --Dan

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply