EAV design or Normalized

  • Hi designing a financial budget database, expenditure items require the amount, frequency and comment, there can be over 100 expenditure items, this would create a table with over 300 columns for example

    FoodAmount

    FoodFrequency

    FoodComment

    FuelAmount

    FuelFrequency

    FuelComment

    etc...

    Would splitting this into 3 tables like the following be normalised or is it just an EAV?

    ---------------------

    ExpenditureItemID

    ExpenditureItemName

    ---------------------

    ---------------------

    FrequencyID

    FrequencyName

    ---------------------

    ---------------------

    ExpenditureID

    ExpenditureItemID

    FrequencyID

    Amount

    Comment

    ---------------------

    Thanks for looking, would welcome any other ideas people have.

    Thanks.

    S t e f

  • Presumably each of these things turns up more than once, otherwise there are 300+columns in a table which only has one row. But you haven't yet defined what causes there to be more than one row, and your break into several tables only caters for one row in the original table. So first identify what there is in that very wide table that distinguishes one row from another and call that set of columns the primary key for that table.

    After that start by looking at what you get from a table which has the

    (Expenditure type varchar(some number) not null - this together with the key of teh original wide table forms the key of this table - and of course this Expenditure should be the last column in the primary key for this narrower table,

    Frequency varchar(some number) not null defalt('never'),

    Amount decimal(some precision and scale) not null default(0),

    Comment varchar(some number) not null default ('no comment') )

    or something like that.

    If it's going to be worth using a tinyint for expenditure type id instead of a varchar for expenditure type and having an extra join on many queries to get the id from the name, then do that - but don't start by doing it without working out whether it's pointless or not. Presumably Frequency is things like "daily", "monthly", "quarterly" and so on? Again, if it looks as if it would be worth adding the extra join to many queries, have an extra table to define a tinyint surrogate key, don't just assume it's the right thing to do without thinking about it. and what about the material that was the original wide table's primary key? - it might be useful to have a surrogate for that, and maybe the defining table for that surrogate will contain other columns from the original wide table that don't fit into the expenditure type/frequency/amount/comment set of columns, so that would be another table (and another join in many queries).

    This isn't remotely like an EAV model, and I don't understand why you thought it was. It's an ordinary relational model, potentially with surrogate keys introduced to save storage space and disc IO - but only introduce them if they give a worthwhile saving.

    Tom

  • Hi thanks for the reply.

    I think it looks like an EAV as I've effectively take an attribute and changed it to a row:

    Before:

    Expenditure

    BudgetID

    FoodAmount

    FoodFrequency

    FoodComment

    FuelAmount

    FuelFrequency

    FuelComment

    etc...

    After:

    Expenditure

    Expenditure

    BudgetID

    ExpenditureItemID

    Amount

    Frequency

    Comment

    ExpenditureItem

    ExpenditureItemID

    ExpenditureItemName

    -------------

    Has anyone come across a similar issue for instance a parts table which have common attributes such as weigh, length etc.

  • S t e f (12/12/2013)


    Hi thanks for the reply.

    I think it looks like an EAV as I've effectively take an attribute and changed it to a row:

    Before:

    Expenditure

    BudgetID

    FoodAmount

    FoodFrequency

    FoodComment

    FuelAmount

    FuelFrequency

    FuelComment

    etc...

    After:

    Expenditure

    Expenditure

    BudgetID

    ExpenditureItemID

    Amount

    Frequency

    Comment

    ExpenditureItem

    ExpenditureItemID

    ExpenditureItemName

    -------------

    Has anyone come across a similar issue for instance a parts table which have common attributes such as weigh, length etc.

    Your proposed solution is a valid relational model called a "star schema". L'Eomot Inverse gave you good advice on how to decide whether this works for you.

    An EAV schema can usually be recognized by a main table with three columns defining an entity, an attribute, and a value. In your case, that might look like this:

    ExpenditureTable

    ExpenditureID ExpenditureAttribute ExpenditureValue

    1234 Item Food

    1234 Amount 34.56

    1234 Frequency Daily

    5678 Item Fuel

    5678 Amount 137.56

    5678 Frequency Weekly

    You can see that a single entity, an expenditure, has a row for each attribute. To put together a complete picture of that expenditure, you have to select several rows and probably pivot or cross-tab them into a single row to make the data more easily understood by human beings. Sometimes, this is made even harder by the fact that the EAV table holds codes, which means that lookup tables must be joined to get names. People often think that EAV structures will be great because you can add attributes to an entity at any time without requiring a schema change, but you pay for that flexibility with huge difficulties of querying the data.

    Taken to an extreme, you could do this:

    ItemTable

    ItemID ItemAttribute ItemValue

    1234 ItemType Expenditure

    1234 Category Food

    1234 Amount 34.56

    1234 Frequency Daily

    5678 ItemType Customer

    5678 CustomerName John Smith

    5678 TelephoneNumber 555-555-1234

    which you might call the "one table to rule them all" approach. In addition to the usual difficulties with EAV schema, you now have to maintain lookup tables that define the valid attributes for each entity and such.

    This is different from a star schema, which just substitutes label-type values (like an item name) for a code (usually an integer) that corresponds to a row in a lookup table. This can be useful for a number of reasons. Saving storage space and maintaining standardization of values is a common reason - instead of storing the strings "Purple", "Orange", and "Yellow" in a Color column of an Item table, which takes 6 bytes for every row, you can use tinyints 1, 2 and 3 in a ColorID column, which takes only one byte per row. When it's time to retrieve the data, you can join the Colors table to the Item table on ColorID and include Colors.ColorName in the SELECT list. When the marketing department decides that the color "Purple" should be called "Aubergine" instead, you only have to update one row in the Colors table rather than many rows in the Item table. The requirements of a particular project will determine whether this approach is worth the cost.

    The star schema is also the basic data structure used in dimensional modeling, usually to support aggregrate analysis of data.

    Jason Wolfkill

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply