Normalization vs Denormalization

  • Hello

    I have some doubts when should i use the normal form or when i shouldn't.

    I'm creating a very large application (kind of ERP) that has several modules, and sometimes in some places i see that if i put the data in denormalized form it's better to handle, but i don't know if i will get problems in the future if i take this decision.

    So if someone knows some guidelines, best practices when to choose the denormalized form over the normalized , please write them here.

    Thanks

  • for transactional relational databases especailly a large ERP chose normalised, it would be bad practice not too, it may seem 'easier' not too but it will lead to problems in the future.

    for data warehouses then you can use de-normalised.

  • Thanks for the reply.

    But even if i don't use it right now, i d like to know when and where to do it.

    DataWarehouses is obvious that must be denormalized. 😉

  • Normilastion is part of relational database modelling and if you are creating a large scale system should be used everywhere.

    Without knowing anything about the system and data it will be impossible for anyone to tell you anything more specific.

    To be honest if you are designing a large ERP for a client and you are struggling on the first step then maybe you should look at getting some help.

    Or is this just a homework type question?

  • Hehehe

    This was a small application that is increasing in the time, by adding other functions/modules/options.

    Now i have to redesign one of the modules, and i was looking to the tables that are related with it and i was thinking that i maybe could replace some of the tables...

    For example i have one section in the application where the users (workers) need to register hours for every working day, every user belong to a team, but the users can change of team, so the history doesn't be right if i use a normalized table.

    Small example:

    Team

    ----------

    ID | Name

    ----------

    1 - TeamA

    2 - TeamB

    Users

    ------------------

    ID | Name | Team

    ------------------

    1 - User1 - 1

    WorkDay

    -----------------------------------------

    Date | User | TimeWasted | Project

    -----------------------------------------

    27.10.2010 | 1 | 10 | ProjectA

    Later the user change to TeamB, now the table looks like this

    Users

    ------------------

    ID | Name | Team

    ------------------

    1 - User1 - 2

    If i go and try to check where the TeamB worked i'll get the wrong info because i have one row that tells me that it worked in ProjectA.

    BTW - The application was built only by me.

  • rootfixxxer (10/28/2010)


    DataWarehouses is obvious that must be denormalized. 😉

    Really?

    They often are, but there's no such rule.

    Normalise fully. 4th normal form is probably a safe place to stop. Denormalise as necessary and with good reasons. Not bothering to normalise past 1st is not denormalisation, it's lazyness.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yes there isn't any rule that tells that, but it should be... 🙂

    Not bothering to normalise past 1st is not denormalisation, it's lazyness.

    😀

  • rootfixxxer (10/28/2010)


    Yes there isn't any rule that tells that, but it should be... 🙂

    Should it? Always? In every case?

    Why?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In an transactional database, I always default to at least 3rd normal form. If you want to create some denormalized summary or staging tables on the side for the purpose of speeding up reports or fetching historical totals into an application screen, then that's OK.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • rootfixxxer (10/28/2010)


    For example i have one section in the application where the users (workers) need to register hours for every working day, every user belong to a team, but the users can change of team, so the history doesn't be right if i use a normalized table.

    ...

    If i go and try to check where the TeamB worked i'll get the wrong info because i have one row that tells me that it worked in ProjectA.

    ...

    Actually the problem is that your existing design is not really normalized entirely. You are coding team as an "attribute" of the user in the same way as their name or date of hire. However, team membership is an "association", even if they can only be a member of one team at one time. Therfore, you need a User_Team table which relates a user with a team and also Start_Date and End_Date columns indicating when that membership was in effect. The same goes for things like User_Address, User_Phone (which would also include a column indicating "home" or "cell"), and User_Salary, because those things are also either associations or contextual over time.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • As mentioned it does not look like you have normalised to begin with,

    when desiging a database you should not worry too much about how the application structures the data, and concentrate more on solid database design following tried and tested methods.

    Yes there isn't any rule that tells that, but it should be...

    there is no rule saying a DW should be denormalised, but it can be. What level you do depends a lot on the data and reporting requirements.

  • How do i quote some one?!

    @gilamonster

    Should it? Always? In every case? Why?

    I have little knowledge about developing dw's, but the little thing that i know for example in a star scheme i have a table facts in the middle, and then the related info in the other tables (dates, clients, suppliers, whatever), and i know there are some approaches that use normalized tables...

    But i'm just a noob/beginner in the dw area...

    @eric Russell 13013

    Like i wrote in the beginning this was a small application, i didn't know how the application will grow, plus i didn't know any requirement, so the tables were designed to handle the requirements in the beginning... The tables that i put in the example have much more columns,

    So you say to put some extra tables to handle this kind of situations?

    I have a similar problem with a module of loading notes (items to pick to a truck, doesn't know the correct name).

    In the start of this process the users tell that need to load 3 units of the Item A to delivery day A, 4 units Item B to delivery day B, and so on. The next step, one user will create the loading document to send it to the warehouse, this process it's done by selecting the predefined items and submitting the document to the warehouse. When the document arrives at the warehouse the user responsible for the loading can remove some of the items, and loading the rest.

    So i need to maintain the original document intact, but the predefined items will be available to add to another loading document. In this case what should i do? Insert the ID from the predefined item and add another table (don't know how to manage this) to control when was dispatched? Insert the info directly (denormalized) because it's to generate reports?

    Thanks

    And sorry if i can't express myself correctly English it's not my mother language

  • rootfixxxer (10/28/2010)


    How do i quote some one?!

    @gilamonster

    Should it? Always? In every case? Why?

    I have little knowledge about developing dw's, but the little thing that i know for example in a star scheme i have a table facts in the middle...

    This is what we call Dimensional Modeling(*) - the Kimball(istic) approach but to be fair there is life beyond it. You may want to research Inmon, which approach allows for building a DWH in the 3NF.

    You know, when you talk DWH you must have some background info in both Kimball and Inmon.

    Hope this helps.

    (*) Which I particularly like for a DWH environment.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • GilaMonster (10/28/2010)


    Normalise fully. 4th normal form is probably a safe place to stop. Denormalise as necessary and with good reasons. Not bothering to normalise past 1st is not denormalisation, it's lazyness.

    It's not just laziness. It's also stupidity and irresponsible negligence.

    But I don't like to start at 4NF, instead I usually start out aiming at EKNF, using Bernstein's algorithm to eliminate functional dependencies. That results in a schema that is in EKNF (hence also in 3NF) and has the respresentation property - every functional dependency is represented by the keys. At that point I look to see whether I want any more normalisation - I don't want multivalued dependencies because I don't want to have to code around update and insertion anomalies, but I also don't want to sacrifice the representation property and have to produce code to suppert functional dependencies. In cases where there are no multivalued dependencies I'm already in 4NF so I'm happy; if there are multivalued dependencies that can be eliminated without sacrificing representation, I eliminate them (and if all my multivalued dependencies get eliminated, I'm in 4NF, so happy again). If there are still some multifunctional dependencies I have to decide for each one whether to normalise it out and accept the difficulty of handling some unrepresented functional dependencies, or leave it in and accept the penalty of dealing with it in code - but this is a fairly rare situation, most business problems can be handled by a schema that both is 4NF and has the representation property (it's a pity that they can't all be done that way).

    In effect, rather than starting from 4NF and denormalising where neccessary, I start from EKNF and do additional normalisation where it's both useful and sensible. I suspect the results of both approaches will be the the same, but the means of getting there is different.

    Tom

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply