Unicode



Mark D Powell SSCarpal Tunnel Points: 4775 · Answer 1

Interesting fact that different MS products use different versions of the Unicode standard. Looks like a right hand not knowing what the left hand is doing situation, and I can see how MS's failure to standardize on the most current version of the standard is going to cause pain to its customers.

-- Mark D Powell --

michael.kjorling SSC Enthusiast Points: 131 · Answer 2

klaas114 (6/12/2009)
I read the answer, but I thought a Unicode character could be between one and four bytes (a byte being 8 bits).

In UTF-8, that is exactly the case. UCS-2 characters are always 16 bits (and UCS-2 can thus only encode the BMP), UTF-16 can be 16 or 32 bits, and UTF-32 is always 32 bits. That's ignoring combining characters, so what shows up as a single character on the screen may be one or more characters to the computer (and the characters, in turn, can vary in data length).
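To make the combining-character point concrete, here is a minimal Python sketch (not from the original post; the variable names are just for illustration). It compares "é" stored as one precomposed code point against "e" followed by a combining acute accent. Both render as a single character on screen, but they differ in code point count and in encoded length.

```python
import unicodedata

composed = "\u00e9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

for label, text in (("composed", composed), ("decomposed", decomposed)):
    print(
        label,
        "code points:", len(text),
        "UTF-8 bytes:", len(text.encode("utf-8")),
        "UTF-16 bytes:", len(text.encode("utf-16-le")),
    )

# NFC normalization folds the two-code-point form into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The composed form is 1 code point (2 bytes in UTF-8, 2 in UTF-16), while the decomposed form is 2 code points (3 bytes in UTF-8, 4 in UTF-16).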

Mark D Powell (6/12/2009)
Interesting fact that different MS products use different versions of the Unicode standard. Looks like a right hand not knowing what the left hand is doing situation, and I can see how MS's failure to standardize on the most current version of the standard is going to cause pain to its customers.

I would be very surprised if they use different versions of Unicode, and I saw no indication that they do, but SSMS and VS do seem to default to different encodings of Unicode. Technical detail, yes, but since there have been two confusing Unicode-related QotDs just in the last few days...

Nadrek SSC-Insane Points: 20039 · Answer 3

Honestly, I saw this question and refused to answer on the basis that "Unicode" has been essentially meaningless in this context for over a decade. UTF-16LE has meaning. UTF-8 has meaning. UTF-32BE has meaning. "Unicode" doesn't give enough information in and of itself.

From Unicode.org FAQ

Q: Is Unicode a 16-bit encoding?

A: No. The first version of Unicode was a 16-bit encoding, from 1991 to 1995, but starting with Unicode 2.0 (July, 1996), it has not been a 16-bit encoding. The Unicode Standard encodes characters in the range U+0000..U+10FFFF, which is roughly a 21-bit code space. Depending on the encoding form you choose (UTF-8, UTF-16, or UTF-32), each character will then be represented either as a sequence of one to four 8-bit bytes, one or two 16-bit code units, or a single 32-bit code unit.
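The code unit counts in that FAQ answer can be checked with a short Python sketch (not part of the FAQ; `code_units` is a hypothetical helper name, and the sample characters are arbitrary picks from different parts of the code space). It counts 8-bit, 16-bit and 32-bit code units for each encoding form.

```python
def code_units(ch):
    """Return the number of code units needed for one character per encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit code units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }

# U+0041, U+00E9, U+20AC (all in the BMP) and U+1F600 (outside the BMP)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))

# The highest code point, U+10FFFF, needs 21 bits.
print((0x10FFFF).bit_length())  # 21
```

The explicit little-endian codec names are used so that Python does not prepend a byte order mark, which would inflate the counts. U+1F600 comes out as four UTF-8 bytes, two UTF-16 code units (a surrogate pair), and one UTF-32 code unit.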

Note that the current Unicode standard is 5.1.

Steve Jones - SSC Editor SSC Guru Points: 734459 · Answer 4

OK, I have finally updated the database and corrected the question to say "SQL Server" and "unicode characters", and I have awarded points back.

My apologies for this.

Bhavesh_Patel SSCrazy Points: 2259 · Answer 5

That was an easy one.

Bhavesh Patel

http://bhaveshgpatel.wordpress.com/