RE: Distinct languages in a database

Default port

February 9, 2006 at 1:48 pm

#619930

Also, since you mentioned Chinese: be aware that the assumption that every Unicode character (code point) can be represented in two bytes is an old one. It comes from a roughly ten-year-old fixed-width Unicode encoding called UCS-2 (UCS-2LE in its little-endian form), which is what Windows NT used and which is still used (I think) by Microsoft SQL Server. Importantly, it is a safe assumption outside of the Chinese language.

But if you get into Chinese, you will have to decide whether Unicode 2.0 (the old list that had no more than 65,536 characters, and could therefore be represented in a fixed-size 2-byte encoding) is good enough. The issue that may come up is GB18030, which is newer than that and has more Chinese characters than fit in that old version.
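To make the two-byte limit concrete, here is a minimal Python sketch (not anything SQL Server specific). The helper names `fits_in_ucs2` and `utf16_bytes_per_char` are made up for illustration, and the two sample characters are my own picks: a common character that sits inside the 65,536-character range, and one from CJK Extension B that sits outside it.

```python
def fits_in_ucs2(text: str) -> bool:
    """Return True if every code point is at most U+FFFF,
    i.e. it can be stored as a single 2-byte unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_bytes_per_char(text: str) -> list[int]:
    """Number of bytes each character needs when encoded as UTF-16LE."""
    return [len(ch.encode("utf-16-le")) for ch in text]


common = "\u4e2d"      # a common Chinese character, inside the 2-byte range
rare = "\U00020000"    # a CJK Extension B character, outside the 2-byte range

print(fits_in_ucs2(common))                 # True
print(fits_in_ucs2(rare))                   # False
print(utf16_bytes_per_char(common + rare))  # [2, 4]
```

A check like this over your real data would tell you whether the fixed 2-byte assumption actually holds for the text you need to store.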

UTF-8 is a modern encoding and can hold every Unicode character, but SQL Server 2000, at least, cannot deal with it natively. I don't know about SQL Server 2005.
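As a rough illustration of how UTF-8 manages that, the Python sketch below shows that it is a variable-width encoding: each character takes between one and four bytes. The helper name `utf8_len` and the sample characters are my own choices for the example, and nothing here touches a database.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, accented Latin letter, common Chinese character,
# and a CJK Extension B character beyond the 2-byte range.
samples = ["A", "\u00e9", "\u4e2d", "\U00020000"]

lengths = [utf8_len(ch) for ch in samples]
print(lengths)  # [1, 2, 3, 4]

# Encoding and decoding gives back the original text unchanged.
text = "".join(samples)
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)  # True
```

The trade-off is that a byte count no longer tells you the character count, which matters when sizing columns.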