Unicode

  • Interesting fact that different MS products use different versions of the Unicode standard. It looks like a case of the right hand not knowing what the left hand is doing, and I can see how MS's failure to standardize on the most current version of the standard is going to cause pain for its customers.

    -- Mark D Powell --

  • klaas114 (6/12/2009)


    I read the answer, but I thought a Unicode character could be between one and four bytes (a byte being 8 bits).

    In UTF-8, that is exactly the case. UCS-2 characters are always 16 bits (so UCS-2 can only encode the BMP), UTF-16 characters take 16 or 32 bits, and UTF-32 is always 32 bits. That's ignoring combining characters, so what shows up as a single character on the screen may be one or more code points to the computer (and each code point, in turn, can vary in encoded length).

    Mark D Powell (6/12/2009)


    Interesting fact that different MS products use different versions of the Unicode standard. It looks like a case of the right hand not knowing what the left hand is doing, and I can see how MS's failure to standardize on the most current version of the standard is going to cause pain for its customers.

    I would be very surprised if they used different versions of Unicode, and I saw no indication that they do, but SSMS and VS do seem to default to different encodings of Unicode. A technical detail, yes, but there have been two confusing Unicode-related QotDs in just the last few days...
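To make the byte counts in the UTF-8/UCS-2/UTF-16/UTF-32 reply concrete, here is a minimal Python 3.9+ sketch. `encoded_lengths` and `fits_in_ucs2` are hypothetical helper names made up for illustration; Python has no UCS-2 codec, so a simple "is it in the BMP?" check stands in for it. It only shows how the encoding forms behave in general and says nothing about how SSMS, VS or SQL Server store text.

```python
import unicodedata

def encoded_lengths(ch: str) -> tuple[int, int, int]:
    """Byte length of ch in UTF-8, UTF-16 and UTF-32.

    The -le codecs are used so Python does not prepend a byte order mark,
    which would otherwise add 2 (UTF-16) or 4 (UTF-32) bytes.
    """
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

def fits_in_ucs2(ch: str) -> bool:
    """Stand-in for UCS-2: no surrogate pairs, so only the BMP (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF

# U+0041, U+00E9, U+20AC and U+1F600 hit 1, 2, 3 and 4 bytes in UTF-8.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    u8, u16, u32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8={u8} UTF-16={u16} UTF-32={u32} UCS-2 ok={fits_in_ucs2(ch)}")

# Combining characters: "e" + COMBINING ACUTE ACCENT renders as one glyph
# but is two code points; NFC normalization composes it into U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```

The emoji is the interesting row: four bytes in every encoding form, but only because UTF-16 spends two 16-bit code units on it, which UCS-2 cannot do.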

  • Honestly, I saw this question and refused to answer on the basis that "Unicode" has been essentially meaningless in this context for over a decade. UTF-16LE has meaning. UTF-8 has meaning. UTF-32BE has meaning. "Unicode" doesn't give enough information in and of itself.

    From the Unicode.org FAQ:

    Q: Is Unicode a 16-bit encoding?

    A: No. The first version of Unicode was a 16-bit encoding, from 1991 to 1995, but starting with Unicode 2.0 (July, 1996), it has not been a 16-bit encoding. The Unicode Standard encodes characters in the range U+0000..U+10FFFF, which is roughly a 21-bit code space. Depending on the encoding form you choose (UTF-8, UTF-16, or UTF-32), each character will then be represented either as a sequence of one to four 8-bit bytes, one or two 16-bit code units, or a single 32-bit code unit.

    Note that the current Unicode standard, as of this writing, is 5.1.
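The FAQ's "one or two 16-bit code units" and the point that UTF-16LE is more specific than "Unicode" can be sketched in Python 3.9+ as below. `to_utf16_units`, `sniff_bom` and `BOMS` are hypothetical names invented for this illustration; only the `codecs.BOM_*` constants come from Python's standard library. The surrogate arithmetic follows the UTF-16 definition, while the BOM check is only a heuristic: a byte order mark is optional, so its absence proves nothing about the encoding.

```python
import codecs

MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space (about 21 bits)

def to_utf16_units(cp: int) -> list[int]:
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if not 0 <= cp <= MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP: a single code unit
    offset = cp - 0x10000            # 20 bits remain above the BMP
    high = 0xD800 + (offset >> 10)   # high surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # low surrogate carries the bottom 10 bits
    return [high, low]

# Longer BOMs first: the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def sniff_bom(data: bytes):
    """Return the encoding named by a leading byte order mark, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

# U+1F600 is outside the BMP, so UTF-16 needs a surrogate pair.
print([f"{u:04X}" for u in to_utf16_units(0x1F600)])  # ['D83D', 'DE00']

# Same text, same encoding form, different byte order.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"))  # b'A\x00' b'\x00A'

print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # UTF-16LE
```

Saying a file is "Unicode" leaves both the encoding form and, for UTF-16 and UTF-32, the byte order unspecified; the last two prints show two different byte sequences for the same letter.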

  • OK, I have finally updated the database, corrected the question to say "SQL Server" and "Unicode characters", and awarded points back.

    My apologies for this.

  • Was an easy one.



    Bhavesh Patel

    http://bhaveshgpatel.wordpress.com/
