Pros and Cons - Table with loads of uniqe columns VS Row based keys with row data type

  • Hi,

    Which one is a better design strategy with respect to flexibility, performance and maintainability?

    A1: Defining Nullable columns in one table

    OR

    A2: Define character keys to identify attributes an entity/table and only populate the keys when data is available?

    For example:

    In A1 design table is

    Employee (Emp)

    EmpId, name, pay, department, dob, doj, dot

    In A2 design

    Employee (emp)

    id, name, pay

    Employee detail (edt)

    EmpId, EdtId, Edt_Data_type, data_value

    A1 is good for easy maintainance and might give better performance for reporting and searching? Might be more good with sparse columns?

    A2 is giving more flexibility in terms of future enhancements and easy adoptability of any enhancement. Simply add a new EdtID and no structural changes required as such. So lesser future development?

    What do you suggest?

    Thanks.

  • iBar (7/30/2012)


    Hi,

    Which one is a better design strategy with respect to flexibility, performance and maintainability?

    A1: Defining Nullable columns in one table

    OR

    A2: Define character keys to identify attributes an entity/table and only populate the keys when data is available?

    For example:

    In A1 design table is

    Employee (Emp)

    EmpId, name, pay, department, dob, doj, dot

    In A2 design

    Employee (emp)

    id, name, pay

    Employee detail (edt)

    EmpId, EdtId, Edt_Data_type, data_value

    A1 is good for easy maintainance and might give better performance for reporting and searching? Might be more good with sparse columns?

    A2 is giving more flexibility in terms of future enhancements and easy adoptability of any enhancement. Simply add a new EdtID and no structural changes required as such. So lesser future development?

    What do you suggest?

    Thanks.

    How about durability and history? I wouldn't put pay rate in the same table as the employee information, for example. I would, however, have a Type 2 Slowing Changing Dimension table for such a thing. Same for phone numbers, addresses, department history, and a couple of other things.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Personally I would stick to a relational model, so certainly not A2. But then I wouldn't like A1 either.

    The thing I would want to do is separate static from dynamic details. dob is static, and pay is not, so there are at leat two tables. name is not static either, it can change (for example on marriage), neither are employee bank account details (you need those unless you always pay in cash, not by money transfer in the banking system), but these (unlike pay) tend to change pretty rarely, and maybe it's useful to keep things that change very rarely separate from things that change often; but at the same time perhaps avoid keeping them with things that are totally static, since the dynamic things probably need associated history while the static things certainly don't. That makes me think that you have at least three tables, before you even look at the things that you might choose to split out to avoid nullable columns.

    Some of the dynamic things may not have a value yet, for example date of termination. This is where you have the possibility of a nullable column. These columns can be grouped into sets which whenever one column in a set has a value so do all the others in the same set (for example date of termination and reason for termination are a group, since either both have values or neither has). Each group could have its own table (with primary key employee id) which has rows only for non-null values. This is the classical way of avoiding null columns while retaining a relational structure. Alternatively, some groups could be bunched together so that some columns would be nullable. These two approaches to the groups may result in different storage requirements, so that needs to be looked at in case it turns out to be a critical difference. Whichever approach you adopt you have to cope with absent data; and there's not much difference in code difficulty between the two approaches, and no rule that says you have to take the same approach with each group.

    Some people get religious about this, and say "never have a nullable column". Others get religious and say "never force an outer join on yourself that you could avoid by having a nullable column". Both those groups of people are, in my view, utterly wrong.

    I too get religious, but what I say is "have a nullable column only when it's the right thing to do".

    Tom

  • Hi L' Eomot Inversé,

    One thing you might have missed OR may be i could not understand in your reply is the flexibility for future enhancement.

    For example:

    In first approach, if one need to add a new column in future DOT which was missed in initial design then that needs lots of changes in DB and application as well, atleast upto some extent.

    However, in Second approach, having an "Employee key dates" column with "type of date" column could give lots of flexibility. so if DOT was missed in initial design then by adding the data in type table will allow you to save a new type of information without any structure change? This will also give more control on data growth.

    Jeff, I could not understand your point about "How about durability and history?" Do you mean what i have tried to explain in second approach?

    SO which one do you recommend, more specially from performance point of view and flexibility for future enhancement? Please tell any any pros and cons.

    Thanks and Regards.

  • iBar (7/31/2012)


    Hi L' Eomot Inversé,

    One thing you might have missed OR may be i could not understand in your reply is the flexibility for future enhancement.

    For example:

    In first approach, if one need to add a new column in future DOT which was missed in initial design then that needs lots of changes in DB and application as well, atleast upto some extent.

    However, in Second approach, having an "Employee key dates" column with "type of date" column could give lots of flexibility. so if DOT was missed in initial design then by adding the data in type table will allow you to save a new type of information without any structure change? This will also give more control on data growth.

    Jeff, I could not understand your point about "How about durability and history?" Do you mean what i have tried to explain in second approach?

    SO which one do you recommend, more specially from performance point of view and flexibility for future enhancement? Please tell any any pros and cons.

    Thanks and Regards.

    The concern with you A2 approach (the EAV-style approach), is that it tends to quickly degenerate into an unmanageable scale. As in everything ends up being modeled into the Entity-Attribute-value tables, so querying for complex criteria becomes a perofrmance and scalability disaster. Try taking on a query like "find me all long term (5-years or more) customers whose name starts with 'A', live in Delaware and have had 5 or more orders over 500$ in the last three months" in one of those pure EAV scenarios, and you quickly discover just how big of a mess you've created for yourself. Or - try to enforce relational integrity (or any form of data constraint for that matter) on any attribute in an EAV model.

    The best approach IMO is a blended approach: build out using a traditional relational model whatever parts of your model you KNOW, and capture the oops'es using EAV as an interim step, WHILE you build out the changes to your longer-term stable model. A1 tends to leverage the strengths of the relational model, so should be the long-term answer for most things; A2 might have some flexibility, but loses a lot of features inherent to relational approaches and tends to invite performance and reliability issues.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • iBar (7/31/2012)


    Hi L' Eomot Inversé,

    One thing you might have missed OR may be i could not understand in your reply is the flexibility for future enhancement.

    For example:

    In first approach, if one need to add a new column in future DOT which was missed in initial design then that needs lots of changes in DB and application as well, atleast upto some extent.

    However, in Second approach, having an "Employee key dates" column with "type of date" column could give lots of flexibility. so if DOT was missed in initial design then by adding the data in type table will allow you to save a new type of information without any structure change? This will also give more control on data growth.

    I think you 'll find that it practice a relational approach (whether using nullable columns, or extra tables for each group of columns that would otherwise be nullable, or a mixture of those two) makes enhancement much less complex that and ERM-based structure. It is generally rather easier to control growth of storage in a relational system too.

    If you are adding new information you have to change the application, whatever model you are using. You can get away, in an EAV model, without changing the database schema, but that's at the cost of giving the application a rather low-level interface to the database which actually entails a lot of additional application complexity. Inventing a new attribute in EAV is much the same as adding a column to a table (or adding a new table), but the complexity of providing for efficient search on that attribute is generally trivial in the relational system and a total nightmare in the EAV system, and all that extra complexity is thrown into the application, which is probably written in a language not designed for wrting efficient manipulation of masses of data instead of being implemented mostly in a database language in which this sort of thing is easy. That's just one of the areas in which EAV requires much more coding change for each enhancement than a proper relational model with a clean application/database interface woild ever require. And your A2 is raw EAV, likely to deliver a system which is amazingly expensive to upgrade and even more difficult to obtain decent performance in the face of upgrades.

    Then there's storage. Each value for each attribute in the system you describe carries an identifier for the attribute and an encoding of the attribute's data type as well as the attribute's value. That is only going to be efficient when attributes are generally defined only for a small number of entities, and in that case you will much the same saving out of using sparse columns or having seprate tables for nullable groups that would otherwise be sparse. If the attribute is defined for more than a small fraction of the individual entities, the EAV model will take up more space than a relational approach.

    So basically I agree with Matt that you will do more from the point of view of performance and flexibility for future enhancement if you eschew EAV and stick within the relational model.

    I don't want to appear to speak for Jeff (he' much better than I am at SQL stuff) but I suspect that by "durability" he meant "not often having to tear up a lot of it up and start again to achieve enhancements" and by "history" he was simply referring to the requirement to have history that allows you to show what happened when. Regulatory requirements here mean we have to have a minimum of 6 years detailed history of personnel data such as pay and taxes deducted, and if they are significantly different in the states I'll have to ask som epeople why they've been claiming the US government imposes even more stronger regulatory requirements than does the UK gov. That's a pain in both relational and EAV models, and if its like everything else it'll be a bigger pain in EAV than in relational.

    Tom

  • The EAV model is an anti-pattern when attempts are made to apply it as a 'more flexible' replacement to a proper relational design. There are certainly places where it shines, for example in modleing sparse attributes, but they are niche.

    This article, and moreso the ensuing comments, parses the issue apart quite well:

    Keeping It Simple > EAV Fail

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (8/1/2012)


    The EAV model is an anti-pattern when attempts are made to apply it as a 'more flexible' replacement to a proper relational design. There are certainly places where it shines, for example in modleing sparse attributes, but they are niche.

    This article, and moreso the ensuing comments, parses the issue apart quite well:

    Keeping It Simple > EAV Fail

    hi opc.three,

    Only thing which is missing from your provided link is "EDT_data_type" e.g. Date, Num, String - which is certainly missing from the example in your provided link.

    Can data_type column and respective views on top solve this issue?

    Thanks

  • iBar (8/2/2012)


    opc.three (8/1/2012)


    The EAV model is an anti-pattern when attempts are made to apply it as a 'more flexible' replacement to a proper relational design. There are certainly places where it shines, for example in modleing sparse attributes, but they are niche.

    This article, and moreso the ensuing comments, parses the issue apart quite well:

    Keeping It Simple > EAV Fail

    hi opc.three,

    Only thing which is missing from your provided link is "EDT_data_type" e.g. Date, Num, String - which is certainly missing from the example in your provided link.

    Can data_type column and respective views on top solve this issue?

    Thanks

    The article's author is explaining why he believes we should not use EAV for much of anything. Your question about data types is one of the primary reasons supporting that position. So yes, you can workaround the issue with views, triggers and complicated constraints but you'd be reimplementing something the DB engine provides for free when using a proper relational design.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks everyone for your value input.

    Thanks & kind regards.

  • The EAV model is as everything else a tool. Used correctly, though it rarely is, is extremely powerful. The Relational model when implemented to address a specific problem is quick, robust and maintainable. Can they be used together ? Sure... but there are boundries that if crossed. will bring a project to it's knees. Sure storing data in 'skinny' structures is easy but the implementation of an application to manange, modify and view that data is complex and error prone. But one can't deny the ability to store vast amount of data in a single table. SQL Server 2008 and on employ Spare tables to help address the storage of tables with large numbers of columns and no data but, they to, have limitations. For example range check constraints are not possible (though Data types are). So whatever the solution you are after be aware of the design considerations required to achieve your aims.

    COde On:-P

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply