Third Normal Form & some related things

  • With the third normal form, is there any general rule of thumb for when you should or should not apply it? I understand that a lot of small tables can cause performance issues and can make the database more complex, but is there a rule about the relative sizes of tables that should be split off? Also, for "smaller" fields (numeric values, small varchar fields), at what point does the performance difference between linking to another table and keeping the data in the same table either become negligible or turn in favor of keeping things in one table?

    In a similar vein, at what point does it make more sense to use a view rather than a table for something? A view can pull up the data, and redundant data is bad, but is redundant data justifiable for the sake of speed? Suppose the table that would hold the redundant data is only a couple hundred MB, while the table it would be joined to in a view is a few hundred GB. Does it make sense to store the redundant data in the smaller table for something that gets accessed frequently? For example: Table 1 (100,000 rows, 5 columns, 3 of which are redundant data) vs. Table 1 joined to Table 2 (100,000,000 rows, 20 columns). Building Table 1 and adding the redundant fields takes time, but the join to Table 2 then happens once, rather than every time somebody wants to look at the report.

  • The rule of thumb is that you always go to 3rd Normal Form (or preferably to Elementary Key Normal Form) and then look at whether you need to go to a higher normal form. But the reason for that is to make sure you don't end up with buggy or overcomplicated code; it is not a performance consideration.

    And usually, going to 3NF (or EKNF) instead of staying at some lower normal form will improve performance as well as reduce code complexity, because it will reduce the size of your data.

    As a general rule, any redundancy in data costs both performance and code complexity, so you end up with buggy and non-performant code. There may be cases where redundancy can be justified, but you need to verify the justification by seeing what actually happens when you normalise the redundancy out.
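    To make the "buggy code" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (orders_flat, customers, orders) and the data are hypothetical and are not taken from the original question; the point is only that a repeated column can drift out of sync, while the 3NF version stores the fact once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalised (hypothetical schema): customer_city depends on customer_id,
# not on the key order_id, so it is repeated on every order row.
cur.execute(
    "CREATE TABLE orders_flat ("
    " order_id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " customer_city TEXT, amount INTEGER)"
)
cur.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [(1, 10, "Leeds", 5), (2, 10, "Leeds", 7), (3, 11, "York", 3)],
)

# A careless update touches only one of customer 10's rows.
cur.execute("UPDATE orders_flat SET customer_city = 'Hull' WHERE order_id = 1")
flat_cities = cur.execute(
    "SELECT COUNT(DISTINCT customer_city) FROM orders_flat WHERE customer_id = 10"
).fetchone()[0]

# 3NF: the city lives in exactly one row of customers.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " amount INTEGER)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Leeds"), (11, "York")])
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, 10, 5), (2, 10, 7), (3, 11, 3)]
)

# The same change is now a single-row update and cannot be half applied.
cur.execute("UPDATE customers SET city = 'Hull' WHERE customer_id = 10")
normalised_cities = cur.execute(
    "SELECT COUNT(DISTINCT c.city) FROM orders o"
    " JOIN customers c ON c.customer_id = o.customer_id"
    " WHERE o.customer_id = 10"
).fetchone()[0]

print(flat_cities, normalised_cities)
```

    In the flat table customer 10 ends up with two different cities; in the normalised tables there is only ever one.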

    Obviously, if you build a data warehouse which doesn't permit any updates, it can safely be in a much lower normal form (hence with more redundancy, and so a greater data size, than if it were properly normalised). In that no-update case it is worth running some performance tests to see whether the normalised or the unnormalised form gives better performance.
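    One way such a test might be set up is sketched below, again with Python's sqlite3 module. Everything here is an assumption made for illustration: the table names, the row counts, and the report query. SQLite in memory will not behave like a production server with hundreds of GB, so treat it as the shape of the experiment only, and run the real thing against your own data and engine.

```python
import sqlite3
import time


def build(conn, n_orders=20000, n_customers=200):
    """Create a normalised pair of tables plus a flat copy with the same data."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
    cur.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
    )
    cur.execute(
        "CREATE TABLE orders_flat ("
        " order_id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " region TEXT, amount INTEGER)"
    )
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(c, f"region-{c % 5}") for c in range(n_customers)],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(o, o % n_customers, o % 100) for o in range(n_orders)],
    )
    # The flat table repeats region on every order row.
    cur.execute(
        "INSERT INTO orders_flat"
        " SELECT o.order_id, o.customer_id, c.region, o.amount"
        " FROM orders o JOIN customers c ON c.customer_id = o.customer_id"
    )
    conn.commit()


NORMALISED_SQL = (
    "SELECT c.region, SUM(o.amount) FROM orders o"
    " JOIN customers c ON c.customer_id = o.customer_id"
    " GROUP BY c.region ORDER BY c.region"
)
FLAT_SQL = "SELECT region, SUM(amount) FROM orders_flat GROUP BY region ORDER BY region"


def timed(conn, sql, repeats=5):
    """Run the query several times; return its rows and the best elapsed time."""
    best = float("inf")
    rows = None
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


conn = sqlite3.connect(":memory:")
build(conn)
norm_rows, norm_secs = timed(conn, NORMALISED_SQL)
flat_rows, flat_secs = timed(conn, FLAT_SQL)
print(f"normalised: {norm_secs:.4f}s  flat: {flat_secs:.4f}s")
print("same result:", norm_rows == flat_rows)
```

    Checking that both forms return identical rows before comparing timings guards against measuring two different queries by accident.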

    Tom

  • Steven.Grzybowski (8/11/2016)


    With the third normal form, is there any general rule of thumb for when you should or should not apply it? I understand that a lot of small tables can cause performance issues and can make the database more complex, but is there a rule about the relative sizes of tables that should be split off? Also, for "smaller" fields (numeric values, small varchar fields), at what point does the performance difference between linking to another table and keeping the data in the same table either become negligible or turn in favor of keeping things in one table?

    In a similar vein, at what point does it make more sense to use a view rather than a table for something? A view can pull up the data, and redundant data is bad, but is redundant data justifiable for the sake of speed? Suppose the table that would hold the redundant data is only a couple hundred MB, while the table it would be joined to in a view is a few hundred GB. Does it make sense to store the redundant data in the smaller table for something that gets accessed frequently? For example: Table 1 (100,000 rows, 5 columns, 3 of which are redundant data) vs. Table 1 joined to Table 2 (100,000,000 rows, 20 columns). Building Table 1 and adding the redundant fields takes time, but the join to Table 2 then happens once, rather than every time somebody wants to look at the report.

    Do the two tables have a 1:1 relationship? If so, then the tables may be improperly designed. If not, then you need to put up with the join because, as Tom states, it's going to be a whole lot easier to maintain and, because of the reduced column width, could actually be faster than one huge denormalized table.
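    If it isn't obvious whether the relationship is 1:1, it can be checked from the data. The following is a rough sketch in Python with sqlite3; the helper name relationship, the tables table1 and table2, and the key column report_id are all hypothetical stand-ins, since the original post doesn't give the real schema. It counts the largest number of rows sharing one key value on each side of the join.

```python
import sqlite3


def relationship(conn, left, right, key):
    """Classify a join on `key` as '1:1', '1:N', 'N:1' or 'N:N'.

    Table and column names are interpolated into the SQL, so pass only
    trusted identifiers, never user input.
    """
    def max_rows_per_key(table):
        sql = (
            "SELECT COALESCE(MAX(n), 0) FROM "
            f"(SELECT COUNT(*) AS n FROM {table} GROUP BY {key})"
        )
        return conn.execute(sql).fetchone()[0]

    left_side = "1" if max_rows_per_key(left) <= 1 else "N"
    right_side = "1" if max_rows_per_key(right) <= 1 else "N"
    return f"{left_side}:{right_side}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (report_id INTEGER, label TEXT)")
conn.execute("CREATE TABLE table2 (report_id INTEGER, detail TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (1, "y"), (2, "z")]
)

kind = relationship(conn, "table1", "table2", "report_id")
print(kind)
```

    With the sample rows above, each report_id appears once in table1 and up to twice in table2, so the join is one-to-many rather than 1:1.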

    --Jeff Moden


