Can varchar columns use UTF-8?

  • Dan Delaney-339421

    Old Hand

    Points: 310

    Hi all.

    Does SQL Server 2005 support storing text in UTF-8 in VARCHAR columns?

    Thanks

    –Dan

  • peterhe

    SSChampion

    Points: 11362

    You can do this, but it may cause problem when you manage the text using string functions. By utf-8, some characters need one byte, some need 2 or more bytes to represent. String functions work by bytes (e.g. Len returns bytes of the string instead of the number of char in it) for varchar. So you may get wired results if some characters in the text need more than 1 byte to represent when you use those functions.

     

     

  • Dan Delaney-339421

    Old Hand

    Points: 310

    Thanks. I should have worded that question differently. The full question was “does SQL Server 2005 fully support UTF-8?” By “fully support” I mean that all of its string functions work correctly with UTF-8.

    So what you’re saying is that the answer is NO, SQL Server 2005 does not support UTF-8, because none of its string functions (or search functions, presumably) work correctly if you store your text in UTF-8.

    Oh well.

    Thanks

    –Dan

  • peterhe

    SSChampion

    Points: 11362

    SQL server definitely support various unicode encoding including utf-8, but you need to use nvarchar instead of varchar.

    nvarchar is unicode string data type.

     

  • Dan Delaney-339421

    Old Hand

    Points: 310

    Thanks. Does nvarchar really support UTF-8, or is it just for UCS2? It was my understanding that nvarchar was a two-byte/character column type that stored all its text data in UCS2. Is there a way to tell SQL Server 2005 to use UTF-8 for nvarchar columns instead of UCS2? And do all the string functions support that?

    –Dan

  • peterhe

    SSChampion

    Points: 11362

    I understand what you mean now. The encoding of unicode string in sql server storage is UCS2 or utf-16, not utf-8. I don’t think you can change it.

    What I said in my previosu post is that SQL accepts unicode string in any encoding, including utf-8. SQL server just converts it into utf-16.

    I don’t know why you care SQL server’s encoding for unicode. You can always convert it back to utf-8 after you read the text into your application, right? Since utf-16 takes two bytes for each char, the space required in database is doubled. Is this your concern?

     

     

  • Dan Delaney-339421

    Old Hand

    Points: 310

    No I’m not concerned with the extra byte per character. I just wanted to make sure there was no way to set a column to be UTF-8 before I went to the trouble of making my app convert the contents of a nvarchar to UTF-8 when displaying. Good to know that it will automatically convert when putting it in, though.

    Thanks for your help.

    –Dan

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply