RE: Multiple Language Database Design - Additional Columns vs. Additional Tables...

SSC Enthusiast

Points: 126

July 29, 2008 at 5:54 pm

SOME PROS AND CONS OF THE DIFFERENT OPTIONS

I can see 3 different approaches regarding the storage of content in different languages:

1) one 'master' table holding invariant content, and an auxiliary table holding in different rows the versions of data (structure: idItem, idLang, text);

2) one 'master' table for invariant content and auxiliary tables -one for each language- (structure: idItem, text);

3) only one table, with different columns for the different languages for each translated field (structure: idItem, idCategory, text-en_US, text-es_ES, ...)

I don't consider the many 'master' tables with and only ONE auxiliary table holiding the different versions of ALL the tables (format: idItem, table_name, lang_id, text). I find it very similar to model 1 but with serious dissadvantages: this will be a huge table with n*m*L rows, and it needs a mechanism to deal with tables using different quantities of columns as PKs.

The following analisys is for a system consisting of T tables with m translated fields and n rows, and L languages.

Fallback refers to a mechanism to present the text in an alternative language in case text in the original language is not available.

The pros (-) and cons (+) I can think of for each approach are:

1) master + translationS table

+ saves on empty translated values

+ language add: do nothing

- one charset, collation fits all

- slower (requires 1 JOIN on n*L rows, returns m+1 rows to be processed)-can this be optimised with stored procedures?

- adds T tables

- adds n*m*L rows in total

- fallback fair: all translations must be fetch (returns (m*L)+1 rows to be processed)

2) master + translation tableS

+ each table with its own appropiate charset and collation

+ saves on empty translated values

- slower (requires 1 JOIN on n*L rows, returns L+1 rows to be processed)-can this be optimised with stored procedures?

- adds T*L tables;

- language add: N new tables must be created

- adds n*m*L rows in total

- fallback very expensive: all translations must be fetch (requires L JOINs on n*L rows, returns (m*L)+1 rows to be processed)

3) one table, multiple columns for translated fields

+ each column with its own appropiate charset and collation (does the DB or table definition impose any limit?)

+ does not add any row

+ faster (requires 1 SELECT on n rows, returns 1 row to be processed)

+ fallback unexpensive: texts in every language are retrieved in the original query

- language add: m new columns in the T different tables

- adds T*m*L columns in total

It seems to me that each particular situation make one of the alternatives the best (i.e. if most of the translated values are empty saving that space is be relevant, otherwise it's not). Anway it seems to me that option no. 3 -even if it looks like the most prosaic of them all- has several important advantages over the other ones. This option makes the administrator work harder (every time a language is added-hopefully not that ofen) but performs much faster on each user click (many times every hour). Or am I forgetting something?