Multiple Language Database Design - Additional Columns vs. Additional Tables...

  • Sorry for posting this in two spots. I posted in the strategy section before I saw this section...

    We are getting ready to create a new site for at least 3 languages (English, French, Spanish), with an undefined number to be determined later. We have tossed around creating a column for each of the languages in each table.

    For example:

    ID, Text_Eng, Text_Fr, Text_Sp
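    In table form, that might look something like this (a minimal sketch; the table name and column sizes are made up):

    CREATE TABLE dbo.PageText (
        ID INT NOT NULL PRIMARY KEY,
        Text_Eng NVARCHAR(255) NULL,   -- English
        Text_Fr NVARCHAR(255) NULL,    -- French
        Text_Sp NVARCHAR(255) NULL     -- Spanish
    )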

    We also looked at creating a separate table to hold the translations. What has worked for everyone in the past?

    OT: We also are looking at using a combination of XML and XSLT (for the various height/width requirements in the more verbose languages) to handle the static web pages as far as the translation is concerned. Any thoughts?

    Anton

  • I am using a separate table to hold all the localized values for each table and language. The table is something like:

    LocaleID, SourceTableName, SourceField, SourcePKValue, Text

    e.g.

    'en-US',   'dbo.Author', 'Name','5','Rob'

    In any stored procedure where a localized value is required to be returned, use the following function to retrieve the value:

    dbo.fn_GetLocalizedName (LocaleID,SourceTableName,SourceField,SourcePKValue)

    This method can support different languages without changing the code.
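    A minimal sketch of what that function could look like (the translation table name dbo.LocalizedText and the data types here are assumptions; the real implementation may differ):

    CREATE FUNCTION dbo.fn_GetLocalizedName
    (
        @LocaleID VarChar(10),
        @SourceTableName VarChar(128),
        @SourceField VarChar(128),
        @SourcePKValue VarChar(50)
    )
    RETURNS NVarChar(4000)
    AS
    BEGIN
        DECLARE @Text NVarChar(4000)
        -- dbo.LocalizedText is a hypothetical name for the translation table described above
        SELECT @Text = [Text]
        FROM dbo.LocalizedText
        WHERE LocaleID = @LocaleID
          AND SourceTableName = @SourceTableName
          AND SourceField = @SourceField
          AND SourcePKValue = @SourcePKValue
        RETURN (@Text)
    END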


  • Why not add "language" to your PK?  Table could be like:

    FLD (PK): ID

    FLD (PK): Language

    FLD:  TextValue

    Then you'd have records like

    A - English - English Text A

    A - French - French Text A

    A - Spanish - Spanish Text A

    B - English - English Text B

    B - French - French Text B

    You should have a lookup table containing the languages, with referential integrity from the Language column in your data table. Doing it this way, if you get a new language you just add a single record to the lookup table (say, German) and you can populate the new text values into your data table without any DB structure changes. Additionally, if you find you're storing values for one language a lot more than another, you don't end up with a lot of records containing empty values (which you would if each record contained a field for every language).
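    For example (a sketch with illustrative names and types):

    CREATE TABLE dbo.Languages (
        Language VarChar(20) NOT NULL PRIMARY KEY
    )

    CREATE TABLE dbo.TextData (
        ID VarChar(10) NOT NULL,
        Language VarChar(20) NOT NULL
            CONSTRAINT FK_TextData_Languages REFERENCES dbo.Languages (Language),
        TextValue NVarChar(255) NOT NULL,
        CONSTRAINT PK_TextData PRIMARY KEY (ID, Language)
    )

    -- Adding German later is just one row, with no schema change:
    INSERT INTO dbo.Languages (Language) VALUES ('German')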

    My two cents...

  • You certainly can add a language column in your PK. The problem is that you will have too much redundancy for non-text columns, and it will be difficult to maintain the table.

    If I am not mistaken, the original question is about exposing the same information in different languages.

    For tables that accept user input, the language column is a good idea. But it's not necessarily part of the PK.


  • Try these links for localization in .NET because most .NET application localization is in XML. Hope this helps.

    http://openmymind.net/localization/index.html

    http://openmymind.net/index.aspx?documentId=4

    http://www.microsoft.com/downloads/details.aspx?FamilyID=6b6fb09f-f25c-48e9-9e26-b55144600da1&DisplayLang=en


    Kind regards,

    Gift Peddie


  • Gift Peddie (8/10/2005)


    Try these links for localization in .NET because most .NET application localization is in XML. Hope this helps.

    Peddie, .NET focuses on Application localization, which is completely different from "data" localization. The database structure for localizing an application is very, very easy, and .NET applications can take advantage of a custom database resource reader, but localizing, say, your Product Descriptions is a whole other story.

    I too am in this boat and I am curious if the original poster has come up with a solution yet?

  • I am looking for the best way to handle localization of lookup tables and found a website that has presented what I think is a good solution. Note that the localization information is at the bottom of the page: http://www.developerdotstar.com/community/lookup_table

  • tymberwyld (11/30/2007)


    Gift Peddie (8/10/2005)


    Try these links for localization in .NET because most .NET application localization is in XML. Hope this helps.

    Peddie, .NET focuses on Application localization, which is completely different from "data" localization. The database structure for localizing an application is very, very easy, and .NET applications can take advantage of a custom database resource reader, but localizing, say, your Product Descriptions is a whole other story.

    I too am in this boat and I am curious if the original poster has come up with a solution yet?

    I have read the article you linked, and some of the resources I posted are dated because they are based on VS2005. But I still think starting with translated static content in each language, and storing each language in either a separate table or a separate database, is better than most automatic localization, which will not work with all languages.

    Kind regards,
    Gift Peddie

  • Ok, so there's absolutely no design pattern for Globalizing "data". So I've come up with a solution which seems to be working well. It's based on a combination of different patterns including the way the .NET Framework stores Resources, the EAV Model, and my own brew of things. I knew all along this was the way to go but just wanted to see if there were other suggestions.

    First off, let me state I am sorry about the formatting; this Forum seems to strip all Tabs and Spaces out of everything!

    So, I've created a Table (ObjectGlobalization - sorry, couldn't think of a better name) which allows me to globalize data for any table in my database using an EAV Model. You'll notice there are multiple "Value" columns. Only one is used per Column / Row, depending on the Column's DataType. For all DataTypes besides VarChar(MAX), Text, NText, and Xml, use the Value column; otherwise use the appropriate ValueText or ValueXml column (you can add others if need be; for example, if you're storing Icons per Culture, add a ValueBinary column with a data type of either VarBinary(MAX) or Image). Also, because the Value column is a Sql_Variant, you should be able to store Chinese characters in there because it can accept an NVarChar data type.

    CREATE TABLE [dbo].[ObjectGlobalization](
        [ObjectName] [varchar](128) NOT NULL,
        [ColumnName] [varchar](128) NOT NULL,
        [Culture] [varchar](5) NOT NULL,
        [RowID] [sql_variant] NOT NULL,
        [Value] [sql_variant] NULL,
        [ValueText] [varchar](max) NULL,
        [ValueXml] [xml] NULL,
        [CreatedBy] [varchar](50) NULL,
        [CreatedDate] [datetime] NULL CONSTRAINT [DF_ObjectGlobalization_CreatedDate] DEFAULT (getutcdate()),
        [ModifiedBy] [varchar](50) NULL,
        [ModifiedDate] [datetime] NULL CONSTRAINT [DF_ObjectGlobalization_ModifiedDate] DEFAULT (getutcdate()),
        CONSTRAINT [PK_ObjectGlobalization_1] PRIMARY KEY CLUSTERED ([ObjectName] ASC, [ColumnName] ASC, [Culture] ASC, [RowID] ASC) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    CREATE FUNCTION [dbo].[ufn_get_GlobalizedValue]
    (
        @ObjectName VarChar(128),
        @ColumnName VarChar(128),
        @Culture VarChar(10) = Null,
        @RowID Sql_Variant,
        @Default Sql_Variant = Null
    )
    RETURNS Sql_Variant
    AS
    BEGIN
        Declare @Value Sql_Variant

        Select @Value = [Value]
        From dbo.ObjectGlobalization
        Where (ObjectName = @ObjectName)
          And (ColumnName = @ColumnName)
          And (Culture = @Culture)
          And (RowID = @RowID)

        Set @Value = IsNull(@Value, @Default)

        -- Return the result of the function
        RETURN(@Value)
    END
    GO

    CREATE FUNCTION [dbo].[ufn_get_GlobalizedValueText]
    (
        @ObjectName VarChar(128),
        @ColumnName VarChar(128),
        @Culture VarChar(10) = Null,
        @RowID Sql_Variant,
        @Default VarChar(MAX) = Null
    )
    RETURNS VarChar(MAX)
    AS
    BEGIN
        Declare @Value VarChar(MAX)

        Select @Value = [ValueText]
        From dbo.ObjectGlobalization
        Where (ObjectName = @ObjectName)
          And (ColumnName = @ColumnName)
          And (Culture = @Culture)
          And (RowID = @RowID)

        Set @Value = IsNull(@Value, @Default)

        -- Return the result of the function
        RETURN(@Value)
    END
    GO

    Ok, so how is this used? Well, for my scenario I needed to store Product Descriptions and Names per Culture. I started out attempting to dynamically determine the Culture by using the SqlConnectionString option called "LocaleID"; however, it's WAY too much work because you can't simply set it to an LCID, it needs to be SQL Server's own unique ID for Locales in the SysLanguages table. So, instead I've opted to *sigh* pass another parameter to my Procedures, called @Culture no less.

    Most of you will be designing your data using the "Invariant Culture", which basically means "English". For my Products table, I opted NOT to store the Description in the table but instead to store all versions, including the "Invariant" one, in the ObjectGlobalization table. However, I do store the Invariant version of the Product "Name" in the table, so you'll notice two different techniques in this Proc for retrieving these values.

    CREATE PROCEDURE [dbo].[usp_get_ProductsByCategoryID]
    (
        @CategoryID UniqueIdentifier,
        @Culture VarChar(10) = Null -- Defaults to Invariant
    )
    AS
    SET NOCOUNT ON

    Select
        P.ID,
        P.Code,
        Convert(VarChar(50), dbo.ufn_get_GlobalizedValue('Product.Products', 'Name', @Culture, P.ID, P.Name)) As [Name],
        Convert(VarChar(255), Left(dbo.ufn_get_GlobalizedValueText('Product.Products', 'Description', @Culture, P.ID, Null), 255)) As [Description],
        IsNull(P.UnitPrice, 0.00) As UnitPrice,
        IsNull(P.ListPrice, 0.00) As ListPrice,
        P.Attributes,
        P.CreatedBy, P.CreatedDate,
        P.ModifiedBy, P.ModifiedDate
    From Product.Products P
    Inner Join Product.ProductCategories C ON (P.ID = C.ProductID)
    Where (C.CategoryID = IsNull(@CategoryID, C.CategoryID))
    You'll notice that you can pass a "Default" value to these functions, which is what I do for the Product Name in case there is no data for that particular Culture. Also, in my case this is the Proc for returning Products in a List, so I only need the first 255 characters of the Description. And since the first function (dbo.ufn_get_GlobalizedValue) returns a Sql_Variant, it's very important to explicitly cast it to the correct type so that your UI code performs correctly.

    And lastly, I basically just typed in the values I needed for this test using SQL MGMT Studio; however, the following Proc will allow you to save the Globalized values. You'd most likely just call this Proc from your other Ins/Upd/Save Procs.

    CREATE PROCEDURE [dbo].[usp_save_GlobalizedValue]
    (
        @ObjectName VarChar(128),
        @ColumnName VarChar(128),
        @Culture VarChar(10),
        @RowID Sql_Variant,
        @Value Sql_Variant = Null,
        @ValueText VarChar(MAX) = Null,
        @ValueXml Xml = Null,
        @CreatedBy VarChar(50) = Null,
        @CreatedDate DateTime = Null,
        @ModifiedBy VarChar(50) = Null,
        @ModifiedDate DateTime = Null
    )
    AS
    Set NoCount On

    -- Exit if there is no data
    If (@Value Is Null) And (@ValueText Is Null) And (@ValueXml Is Null) RETURN(1)

    -- Validate the Data
    Select
        @ObjectName = NullIf(LTrim(RTrim(@ObjectName)), ''),
        @ColumnName = NullIf(LTrim(RTrim(@ColumnName)), ''),
        @Culture = NullIf(LTrim(RTrim(@Culture)), ''),
        @CreatedBy = IsNull(@CreatedBy, SUser_Name()),
        @CreatedDate = IsNull(@CreatedDate, GetUtcDate()),
        @ModifiedBy = IsNull(@ModifiedBy, SUser_Name()),
        @ModifiedDate = IsNull(@ModifiedDate, GetUtcDate())

    If (@ObjectName Is Null) RAISERROR('The [@ObjectName] Parameter cannot be Null or Empty.', 16, 1)
    If (@ColumnName Is Null) RAISERROR('The [@ColumnName] Parameter cannot be Null or Empty.', 16, 1)
    If (@RowID Is Null) RAISERROR('The [@RowID] Parameter cannot be Null or Empty.', 16, 1)
    If (@@ERROR <> 0) RETURN(-1)

    BEGIN TRY
        If Not Exists(
            Select Null From dbo.ObjectGlobalization
            Where ObjectName = @ObjectName
              And ColumnName = @ColumnName
              And Culture = IsNull(@Culture, '')
              And RowID = @RowID
        )
        BEGIN
            Insert Into dbo.ObjectGlobalization
                (ObjectName, ColumnName, Culture, RowID,
                 Value, ValueText, ValueXml,
                 CreatedBy, CreatedDate, ModifiedBy, ModifiedDate)
            Values
                (@ObjectName, @ColumnName, IsNull(@Culture, ''), @RowID,
                 @Value, @ValueText, @ValueXml,
                 @CreatedBy, @CreatedDate, @ModifiedBy, @ModifiedDate)
        END
        Else
        BEGIN
            Update dbo.ObjectGlobalization Set
                Value = @Value,
                ValueText = @ValueText,
                ValueXml = @ValueXml,
                ModifiedBy = @ModifiedBy,
                ModifiedDate = @ModifiedDate
            Where ObjectName = @ObjectName
              And ColumnName = @ColumnName
              And Culture = IsNull(@Culture, '')
              And RowID = @RowID
        END

        If (@@TRANCOUNT > 0) COMMIT
        RETURN(1)
    END TRY
    BEGIN CATCH
        If (@@TRANCOUNT > 0) ROLLBACK
        RETURN(-1)
    END CATCH
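    For example, a round trip through the Proc and functions above might look like this (the RowID and text values here are made up for illustration):

    -- Save a French name for the Products row whose ID is 42
    EXEC dbo.usp_save_GlobalizedValue
        @ObjectName = 'Product.Products',
        @ColumnName = 'Name',
        @Culture = 'fr-FR',
        @RowID = 42,
        @Value = N'Fromage de chèvre'

    -- Read it back, falling back to the supplied default if no translation exists
    SELECT Convert(VarChar(50),
        dbo.ufn_get_GlobalizedValue('Product.Products', 'Name', 'fr-FR', 42, N'Goat Cheese'))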

  • Storing Chinese, Japanese, and Korean is not just a simple NVarChar issue: each of those languages has an alphabet of more than 2,000 characters, so with Sql_Variant you are rolling the dice that you will not run into character-conversion issues. That is the reason there is more than one Chinese and Japanese collation defined in SQL Server 2005. When you accept language-specific content there is no extra code needed; just use VS2005 to save your code for each language and use the language packs for the .NET Framework. I have worked on applications in more than 20 languages, and a friend runs a 32-language application. When you are not using the 26-character Latin alphabet, things can get very complicated.

    Kind regards,
    Gift Peddie

  • Yes, I shouldn't have assumed that Asian character sets would be so easy. This is maybe another case for adding yet more "Value" columns, each with its own Collation set to the appropriate Asian language. It is possible to have multiple collations in a table, but only one default collation on the database. You have to modify your queries and procs to explicitly set the collation for those languages, because SQL won't do it for you in some cases.

    For example:

    ALTER TABLE dbo.ObjectGlobalization
        ADD [Value_] NVARCHAR(MAX) COLLATE Chinese_PRC_CI_AS
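    A query against that column might then need the collation spelled out, something like this (a sketch based on the column added above):

    SELECT [Value_]
    FROM dbo.ObjectGlobalization
    WHERE Culture = 'zh-CN'
    ORDER BY [Value_] COLLATE Chinese_PRC_CI_AS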

    I don't know, but as a developer and DBA I am not one to want to maintain 32 different sets of code for one application and 32 different databases. There should be an easier way; if you have experience in this, can you shed some more light?

  • I would strongly favour separate tables for each language. That way you do not have redundant columns for languages you do not use. You can also specify the relevant collation for each table, according to its language.
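    As a sketch of that approach (hypothetical table names and collations), each language gets its own table with its own collation:

    CREATE TABLE dbo.ProductText_en (
        ProductID INT NOT NULL PRIMARY KEY,
        Name VarChar(50) COLLATE Latin1_General_CI_AS NOT NULL
    )

    CREATE TABLE dbo.ProductText_zh (
        ProductID INT NOT NULL PRIMARY KEY,
        Name NVarChar(50) COLLATE Chinese_PRC_CI_AS NOT NULL
    )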

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • SOME PROS AND CONS OF THE DIFFERENT OPTIONS

    I can see 3 different approaches regarding the storage of content in different languages:

    1) one 'master' table holding invariant content, and an auxiliary table holding in different rows the versions of data (structure: idItem, idLang, text);

    2) one 'master' table for invariant content and auxiliary tables -one for each language- (structure: idItem, text);

    3) only one table, with different columns for the different languages for each translated field (structure: idItem, idCategory, text-en_US, text-es_ES, ...)

    I don't consider the option of many 'master' tables with only ONE auxiliary table holding the different versions for ALL the tables (format: idItem, table_name, lang_id, text). I find it very similar to model 1 but with serious disadvantages: it would be a huge table with n*m*L rows, and it needs a mechanism to deal with tables using different numbers of columns as PKs.

    The following analysis is for a system consisting of T tables, each with m translated fields and n rows, and L languages.

    Fallback refers to a mechanism to present the text in an alternative language in case text in the requested language is not available.

    The pros (+) and cons (-) I can think of for each approach are:

    1) master + translationS table

    + saves on empty translated values

    + language add: do nothing

    - one charset and collation must fit all

    - slower (requires 1 JOIN on n*L rows, returns m+1 rows to be processed; can this be optimised with stored procedures?)

    - adds T tables

    - adds n*m*L rows in total

    - fallback fair: all translations must be fetched (returns (m*L)+1 rows to be processed)

    2) master + translation tableS

    + each table with its own appropriate charset and collation

    + saves on empty translated values

    - slower (requires 1 JOIN on n*L rows, returns L+1 rows to be processed; can this be optimised with stored procedures?)

    - adds T*L tables;

    - language add: T new tables must be created

    - adds n*m*L rows in total

    - fallback very expensive: all translations must be fetched (requires L JOINs on n*L rows, returns (m*L)+1 rows to be processed)

    3) one table, multiple columns for translated fields

    + each column with its own appropriate charset and collation (does the DB or table definition impose any limit?)

    + does not add any row

    + faster (requires 1 SELECT on n rows, returns 1 row to be processed)

    + fallback inexpensive: texts in every language are retrieved in the original query

    - language add: m new columns in the T different tables

    - adds T*m*L columns in total

    It seems to me that each particular situation makes one of the alternatives the best (i.e. if most of the translated values are empty, saving that space is relevant; otherwise it's not). Anyway, it seems to me that option no. 3, even if it looks like the most prosaic of them all, has several important advantages over the other ones. This option makes the administrator work harder (every time a language is added, hopefully not that often) but performs much faster on each user click (many times every hour). Or am I forgetting something?
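    For option 1, the fallback can be folded into a single query; a sketch (table and column names are hypothetical):

    DECLARE @RequestedLang VarChar(5)
    DECLARE @DefaultLang VarChar(5)
    SET @RequestedLang = 'es-ES'
    SET @DefaultLang = 'en-US'

    SELECT i.idItem,
           COALESCE(t.[text], d.[text]) AS [text]  -- requested language first, then the fallback
    FROM dbo.Items i
    LEFT JOIN dbo.Translations t
        ON t.idItem = i.idItem AND t.idLang = @RequestedLang
    LEFT JOIN dbo.Translations d
        ON d.idItem = i.idItem AND d.idLang = @DefaultLang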

  • One thing I'd like to add to what [andarin2] said about option #3 is that there is all this talk of space savings. When you design a table, the sum of all the column widths is what the "potential" row size would be IF you filled in every column to its maximum value. However, with what we're talking about, having separate columns for each language would not waste more space than a 1-to-many translation table, because you're only ever filling in 1-3 columns (if you want fall-back translations, say from es_MX to es_ES). However, I do agree that this approach has more maintenance associated with it, and even though SQL supports large numbers of columns (how many cultures are there?), it's probably not a good idea.

    Using the 1-to-many approach (one translation table) would work, but you can only set 1 collation on the "Value" column, and it needs to be nvarchar.

    Having many tables (one per culture) would help with collations; however, its maintenance is probably even more than the first option (multiple columns), and the JOINs would quickly become unmaintainable and complex as time goes on.

  • One thing I'd like to add is that you can use the one "Value" column for all the Latin-based charsets. I'm pretty sure English, Spanish, French, etc. can all use one Collation. Therefore you only ever need to add additional Culture-specific columns for the different Collations of, say, Asian or Russian characters.

    Has anyone got anything better? I'd really like to know 🙂
