Globalization, or internationalization, has become a buzzword in political circles, business forums, and elsewhere. Because of this trend, software also needs to cater to globalization. Which aspects do you need to cover? If you are a vendor supporting software packages for countries such as China, Korea, and Japan in Asia, or Germany, Spain, and France in Europe, you will have to support your software in each native language. Text in these languages is stored as Unicode characters.
Different time zones are another issue if you are dealing with multiple countries in a single project. This article does not address that issue, as it has already been discussed in the article http://www.sqlservercentral.com/columnists/dasanka/datetimevaluesandtimezones.asp. This article describes only how to cater to different languages.
The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly.
One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters.
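The 256-character limit is easy to see outside SQL Server. The following is a minimal Python sketch, not SQL Server code: Python's built-in cp1252 codec stands in for a Western European 1-byte code page, and the helper name fits_in_code_page is made up for this illustration.

```python
# Illustration only: a Python codec stands in for a 1-byte code page.

def fits_in_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text has a byte value in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        # At least one character has no slot among the 256 byte values.
        return False


western = "café"
kanji = "漢字"

print(fits_in_code_page(western, "cp1252"))  # Western European text fits
print(fits_in_code_page(kanji, "cp1252"))    # Kanji has no slot in cp1252
print(len(western.encode("cp1252")))         # one byte per character
```

The Kanji string fails because a 1-byte code page simply has no bit patterns left for it, which is why such alphabets need a wider encoding.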
Each Microsoft SQL Server collation has a code page that defines what patterns of bits represent each character in char, varchar, and text values. Individual columns and character constants can be assigned a different code page. Client computers use the code page associated with the operating system locale to interpret character bit patterns.
There are many different code pages. Some characters appear on some code pages, but not on others. Some characters are defined with one bit pattern on some code pages, and with a different bit pattern on other code pages. When you build international systems that must handle different languages, it becomes difficult to pick code pages for all the computers that meet the language requirements of multiple countries/regions. It is also difficult to ensure that every computer performs the correct translations when interfacing with a system using a different code page. A few of the code pages and their values are shown below.
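As a rough illustration of the translation problem, the Python sketch below decodes one and the same byte under three Windows code pages. It uses Python's built-in codecs rather than SQL Server, and the helper name decode_with is hypothetical.

```python
# Illustration only: the same bit pattern read under different code pages.

def decode_with(code_page: str, data: bytes) -> str:
    """Interpret raw bytes using the given code page."""
    return data.decode(code_page)


raw = bytes([0xE9])  # a single byte with the bit pattern 1110 1001

# Western European, Cyrillic, and Greek code pages each map it differently.
for code_page in ("cp1252", "cp1251", "cp1253"):
    print(code_page, decode_with(code_page, raw))
```

A client that assumes the wrong code page therefore shows a perfectly valid, but wrong, character, and nothing in the byte itself reveals the mistake.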
The Unicode specification addresses this problem by using 2 bytes to encode each character. There are enough different patterns (65,536) in 2 bytes for a single specification covering the most common business languages. Because all Unicode systems consistently use the same bit patterns to represent all characters, there is no problem with characters being converted incorrectly when moving from one system to another. You can minimize character conversion issues by using Unicode data types throughout your system. In Microsoft SQL Server, the nchar, nvarchar, and ntext data types support Unicode data. These are the Unicode counterparts of the char, varchar, and text data types.
Note that the 'n' prefix for these data types comes from the SQL-92 standard for National (Unicode) data types.
However, you should use the Unicode data types only if the column is actually going to store Unicode characters; otherwise you pay an unnecessary storage and performance cost, because every character takes 2 bytes. Another point to remember is that the maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters as for char and varchar.
According to Books Online (BOL), SET LANGUAGE specifies the language environment for the session. The session language determines the datetime formats and system messages. To change the session language to us_english, execute the following T-SQL:
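The 2-byte encoding described above can be sketched in Python. This is an illustration under stated assumptions, not SQL Server's own storage code: Python's utf-16-le codec is used because it produces the same 2-byte patterns for every character within the 65,536-pattern range, and the helper name ucs2_bytes is invented for the example.

```python
# Illustration only: fixed 2-byte encoding of characters from any alphabet.

def ucs2_bytes(text: str) -> bytes:
    """Encode text as 2-byte little-endian units, one unit per character."""
    if any(ord(ch) > 0xFFFF for ch in text):
        # Outside the 65,536 patterns that fit in 2 bytes.
        raise ValueError("character does not fit in a single 2-byte pattern")
    return text.encode("utf-16-le")


for sample in ("SQL", "漢字", "한글"):
    encoded = ucs2_bytes(sample)
    # Character count, byte count, and the raw bit patterns in hex.
    print(sample, len(sample), len(encoded), encoded.hex())
```

Latin, Kanji, and Hangul text all come out at exactly 2 bytes per character, and decoding the bytes on any other system with the same encoding gives back the original string.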
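The halved maximum follows from simple arithmetic, sketched below in Python. The sketch assumes that both limits come from the same 8,000-byte ceiling on a column value; the constant and function names are made up for the illustration.

```python
# Illustration only: why 2 bytes per character halves the maximum length.

MAX_COLUMN_BYTES = 8000  # assumed byte ceiling behind both maxima


def max_chars(bytes_per_char: int) -> int:
    """Largest character count that fits within the byte ceiling."""
    return MAX_COLUMN_BYTES // bytes_per_char


def storage_bytes(text: str, unicode_column: bool) -> int:
    """Bytes needed for text at 1 byte (char) or 2 bytes (nchar) per character."""
    return len(text) * (2 if unicode_column else 1)


print(max_chars(1), max_chars(2))  # char/varchar limit vs nchar/nvarchar limit

name = "Colombo"
print(storage_bytes(name, False), storage_bytes(name, True))
```

The same text costs twice the space in a Unicode column, which is the overhead to avoid when a column will never hold anything outside a single code page.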
SET LANGUAGE us_english
The following table contains the language names and their other parameters.
Microsoft SQL Server 2000 supports multiple languages in several ways. Users can select the option that best fits their needs and their environments.