How much database normalization is enough?

  • Take 1NF. It seems we can always divide something into even smaller units. A street address has a number, street name, and a suffix such as RD, AVE, BLVD, etc. There can also be a compass direction, e.g., 3704 E. Main St. But why would we ever want to break the street address up into these smaller parts? We almost never want to query on them (give me a count of addresses grouped by suffix?).

    What counts as atomic is a business decision.  The choice has to make business sense; this is part of the "art" aspect of design.  As you point out, you may never need to break out 3704 E. Main St.  In all but one of my designs, that was the atomic unit.  But I did have one design where it was important to know the number of living units for a given street.  In the normal configuration, this would be problematic, so I had a street table and a housing unit table.

    But I've worked with a database that had both the state name and the state code in the address table.  This is an obvious breakout situation.  There were many addresses where the code AK was paired with "Alaska" and many others where it was paired with the misspelling "Alasaka".  Naturally this caused issues that weren't resolved until it was fixed.  But if it had been designed properly, there would never have been an issue.
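
    A minimal T-SQL sketch of that breakout (table and column names here are illustrative, not from the system I described): store the state name exactly once in a lookup table, and let the address rows carry only the code.

        -- Hypothetical illustration: the state name is spelled in one place only.
        CREATE TABLE dbo.State
        (
            StateCode CHAR(2)     NOT NULL PRIMARY KEY,  -- e.g. 'AK'
            StateName VARCHAR(50) NOT NULL               -- 'Alaska', stored once
        );

        CREATE TABLE dbo.Address
        (
            AddressID INT IDENTITY(1,1) PRIMARY KEY,
            Street    VARCHAR(100) NOT NULL,
            City      VARCHAR(50)  NOT NULL,
            StateCode CHAR(2)      NOT NULL
                REFERENCES dbo.State (StateCode)         -- "Alasaka" can no longer creep in
        );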

  • xsevensinzx - Monday, July 23, 2018 7:45 AM

    RonKyle - Monday, July 23, 2018 7:03 AM

    This is a scary editorial, and the preceding comments are also scary.  Normalization is not some wonderful thought experiment that is nice to have but not strictly necessary.  It applies mathematical rigor to the art and science of relational database development.

    Normalization through the third normal form is sufficient for most databases, and most sections of any database.  It is derelict of any database designer to suggest that normalization to this point is "optional."  The normal forms beyond the third address specific cases that the first three do not adequately cover.

    Unfortunately, it is optional, and it's not poor advice to suggest so. In cases like mine, normalization dramatically decreases the performance of reading the data. If you were to come in as a database designer and tell me that you were going to normalize the 10 TB of data in our data warehouse just because that's what you have always done with traditional SMP databases, then you're going to be shown the door.

    When it comes to normalization and even denormalization, it just depends...

    That would depend on how the data warehouse is designed.

    Inmon-style DWs are built using third normal form, with the data marts built using dimensional models.
    Kimball style uses dimensionally modeled, conformed data marts as the DW.
    Then there is the Data Vault model.

    If you are already using dimensionally modeled, conformed tables (Kimball), then yes, it wouldn't make sense to convert a 10 TB data warehouse to the Inmon style.
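
    For readers who haven't seen one, a minimal sketch of what "dimensional model" means here (all names are hypothetical): a star schema with a fact table keyed to wide, deliberately denormalized dimension tables.

        -- Hypothetical star-schema sketch: one fact table, wide dimensions.
        CREATE TABLE dbo.DimCustomer
        (
            CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,
            CustomerName VARCHAR(100) NOT NULL,
            City         VARCHAR(50)  NOT NULL,  -- deliberately denormalized:
            StateName    VARCHAR(50)  NOT NULL   -- no snowflaked lookup tables
        );

        CREATE TABLE dbo.FactSales
        (
            DateKey     INT   NOT NULL,
            CustomerKey INT   NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
            SalesAmount MONEY NOT NULL
        );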

  • I think that normalization can be taken too far.  I worked for a company a few years back that was doing this very thing.  I should mention an outside company was brought in to do the initial normalizing of the data.  We got involved much later, so a lot of rework was needed since they didn't understand the data.  Everything was normalized down to tables that had maybe 2 elements in them: the 'key' ID to get to the record and one containing the actual value.  And it would sometimes take joining 10 (or more) tables together to get to that one value.  Maybe for some applications it is needed at that level, but it went way too far IMHO for a data warehouse.
    The funny thing is, as the number of tables grew (at least 100+ tables) as we tried to normalize everything, someone said "This is going to be too hard for most people to get at the data."  So they decided another group needed to start finding ways to pull the data out of this model back into more meaningful tables.  This proved to be problematic since some of the data couldn't be put back together the way it was modeled: more rework.  To think of all the millions of dollars spent would make you sick.
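
    For illustration only (hypothetical names, not the actual schema), the two-column pattern looks roughly like this, and the join count grows with every attribute you read:

        -- Hypothetical two-column tables: one surrogate key, one value.
        CREATE TABLE dbo.FirstName (FirstNameID INT PRIMARY KEY, Value VARCHAR(50) NOT NULL);
        CREATE TABLE dbo.LastName  (LastNameID  INT PRIMARY KEY, Value VARCHAR(50) NOT NULL);
        CREATE TABLE dbo.City      (CityID      INT PRIMARY KEY, Value VARCHAR(50) NOT NULL);

        CREATE TABLE dbo.Person
        (
            PersonID    INT PRIMARY KEY,
            FirstNameID INT NOT NULL REFERENCES dbo.FirstName (FirstNameID),
            LastNameID  INT NOT NULL REFERENCES dbo.LastName  (LastNameID),
            CityID      INT NOT NULL REFERENCES dbo.City      (CityID)
        );

        -- Three joins just to read one person's name and city; scale the
        -- pattern to every attribute and 10+ joins per query follows.
        SELECT f.Value AS FirstName, l.Value AS LastName, c.Value AS City
        FROM dbo.Person AS p
        JOIN dbo.FirstName AS f ON f.FirstNameID = p.FirstNameID
        JOIN dbo.LastName  AS l ON l.LastNameID  = p.LastNameID
        JOIN dbo.City      AS c ON c.CityID      = p.CityID
        WHERE p.PersonID = 1;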

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • I think that normalization can be taken too far. I worked for a company a few years back that was doing this very thing. I should mention an outside company was brought in to do the initial normalizing of the data. We got involved much later, so a lot of rework was needed since they didn't understand the data. Everything was normalized down to tables that had maybe 2 elements in them: the 'key' ID to get to the record and one containing the actual value. And it would sometimes take joining 10 (or more) tables together to get to that one value. Maybe for some applications it is needed at that level, but it went way too far IMHO for a data warehouse.

    This isn't normalization; it's nuttiness run amok.  One of the key points is that you said they didn't understand the data.  Normalization cannot be properly done without an understanding of the data.  You can't simply look at something beyond the most basic lookups and say it's normalized (or not).  If a table only had two elements, something was wrong.  Elements with a one-to-one relationship to each other should, as a rule, be in the same table.  Also, as you indicate it was for a data warehouse, it's really wrong that they were trying to normalize at all.  Don't judge normalization by this, because what you've described isn't that.

  • RonKyle - Monday, July 23, 2018 1:19 PM

    I think that normalization can be taken too far. I worked for a company a few years back that was doing this very thing. I should mention an outside company was brought in to do the initial normalizing of the data. We got involved much later, so a lot of rework was needed since they didn't understand the data. Everything was normalized down to tables that had maybe 2 elements in them: the 'key' ID to get to the record and one containing the actual value. And it would sometimes take joining 10 (or more) tables together to get to that one value. Maybe for some applications it is needed at that level, but it went way too far IMHO for a data warehouse.

    This isn't normalization; it's nuttiness run amok.  One of the key points is that you said they didn't understand the data.  Normalization cannot be properly done without an understanding of the data.  You can't simply look at something beyond the most basic lookups and say it's normalized (or not).  If a table only had two elements, something was wrong.  Elements with a one-to-one relationship to each other should, as a rule, be in the same table.  Also, as you indicate it was for a data warehouse, it's really wrong that they were trying to normalize at all.  Don't judge normalization by this, because what you've described isn't that.

    I wasn't judging, just said it wasn't a good idea for a DW.  And I did say "Maybe for some applications it is needed at that level."  You understand it was a bad idea, I understand it was a bad idea, but neither of us was making the decisions. 😀  And I think this illustrates the point Ben was trying to make in the article: that it can go too far.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 - Monday, July 23, 2018 2:07 PM

    RonKyle - Monday, July 23, 2018 1:19 PM

    I think that normalization can be taken too far. I worked for a company a few years back that was doing this very thing. I should mention an outside company was brought in to do the initial normalizing of the data. We got involved much later, so a lot of rework was needed since they didn't understand the data. Everything was normalized down to tables that had maybe 2 elements in them: the 'key' ID to get to the record and one containing the actual value. And it would sometimes take joining 10 (or more) tables together to get to that one value. Maybe for some applications it is needed at that level, but it went way too far IMHO for a data warehouse.

    This isn't normalization; it's nuttiness run amok.  One of the key points is that you said they didn't understand the data.  Normalization cannot be properly done without an understanding of the data.  You can't simply look at something beyond the most basic lookups and say it's normalized (or not).  If a table only had two elements, something was wrong.  Elements with a one-to-one relationship to each other should, as a rule, be in the same table.  Also, as you indicate it was for a data warehouse, it's really wrong that they were trying to normalize at all.  Don't judge normalization by this, because what you've described isn't that.

    I wasn't judging, just said it wasn't a good idea for a DW.  And I did say "Maybe for some applications it is needed at that level."  You understand it was a bad idea, I understand it was a bad idea, but neither of us was making the decisions. 😀  And I think this illustrates the point Ben was trying to make in the article: that it can go too far.

    It requires lots of left outer joins to get the data back together.  It won't store NULL values in the tables, but that doesn't eliminate them when you have to join the tables back together.
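
    A small T-SQL illustration of that point (hypothetical tables): splitting an optional attribute into its own table means no NULL is ever stored, but the NULL reappears the moment you join back.

        -- Hypothetical: phone numbers split out so no NULL is ever stored.
        CREATE TABLE dbo.Employee      (EmployeeID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
        CREATE TABLE dbo.EmployeePhone (EmployeeID INT PRIMARY KEY REFERENCES dbo.Employee (EmployeeID),
                                        Phone VARCHAR(20) NOT NULL);

        INSERT dbo.Employee VALUES (1, 'Alice'), (2, 'Bob');
        INSERT dbo.EmployeePhone VALUES (1, '555-0100');  -- Bob has no phone row

        -- The NULL comes right back in the result set: Bob's Phone is NULL.
        SELECT e.Name, p.Phone
        FROM dbo.Employee AS e
        LEFT JOIN dbo.EmployeePhone AS p ON p.EmployeeID = e.EmployeeID;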

  • And I think this illustrates the point Ben was trying to make in the article: that it can go too far.

    And my point is that this isn't normalization.  If this is what Ben was talking about, he wasn't talking about normalization.

    My experience, however, is that the people who don't think it's important either don't understand it or don't want to apply the rigor that's required.  In my view, that's like being a mechanic who doesn't want to understand how the engine works.

  • Lynn Pettis - Monday, July 23, 2018 11:20 AM

    xsevensinzx - Monday, July 23, 2018 7:45 AM

    RonKyle - Monday, July 23, 2018 7:03 AM

    This is a scary editorial, and the preceding comments are also scary.  Normalization is not some wonderful thought experiment that is nice to have but not strictly necessary.  It applies mathematical rigor to the art and science of relational database development.

    Normalization through the third normal form is sufficient for most databases, and most sections of any database.  It is derelict of any database designer to suggest that normalization to this point is "optional."  The normal forms beyond the third address specific cases that the first three do not adequately cover.

    Unfortunately, it is optional, and it's not poor advice to suggest so. In cases like mine, normalization dramatically decreases the performance of reading the data. If you were to come in as a database designer and tell me that you were going to normalize the 10 TB of data in our data warehouse just because that's what you have always done with traditional SMP databases, then you're going to be shown the door.

    When it comes to normalization and even denormalization, it just depends...

    That would depend on how the data warehouse is designed.

    Inmon-style DWs are built using third normal form, with the data marts built using dimensional models.
    Kimball style uses dimensionally modeled, conformed data marts as the DW.
    Then there is the Data Vault model.

    If you are already using dimensionally modeled, conformed tables (Kimball), then yes, it wouldn't make sense to convert a 10 TB data warehouse to the Inmon style.

    The point I was trying to make is that although the engine can support the Inmon-style DW, it performs the worst with it because of the technical limitations. In just straight-up SQL Server, that's a different story. 😀

  • RonKyle - Monday, July 23, 2018 2:16 PM

    And I think this illustrates the point Ben was trying to make in the article: that it can go too far.

    And my point is that this isn't normalization.  If this is what Ben was talking about, he wasn't talking about normalization.

    My experience, however, is that the people who don't think it's important either don't understand it or don't want to apply the rigor that's required.  In my view, that's like being a mechanic who doesn't want to understand how the engine works.

    Straight from his article:
    "The issue I sometimes see with over normalization is when the number of tables explodes since there are only one or two columns on each table. The queries of that highly normalized system end up having ten or more joins for every query."
    So he was talking about it some.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • xsevensinzx - Monday, July 23, 2018 2:20 PM

    Lynn Pettis - Monday, July 23, 2018 11:20 AM

    xsevensinzx - Monday, July 23, 2018 7:45 AM

    RonKyle - Monday, July 23, 2018 7:03 AM

    This is a scary editorial, and the preceding comments are also scary.  Normalization is not some wonderful thought experiment that is nice to have but not strictly necessary.  It applies mathematical rigor to the art and science of relational database development.

    Normalization through the third normal form is sufficient for most databases, and most sections of any database.  It is derelict of any database designer to suggest that normalization to this point is "optional."  The normal forms beyond the third address specific cases that the first three do not adequately cover.

    Unfortunately, it is optional, and it's not poor advice to suggest so. In cases like mine, normalization dramatically decreases the performance of reading the data. If you were to come in as a database designer and tell me that you were going to normalize the 10 TB of data in our data warehouse just because that's what you have always done with traditional SMP databases, then you're going to be shown the door.

    When it comes to normalization and even denormalization, it just depends...

    That would depend on how the data warehouse is designed.

    Inmon-style DWs are built using third normal form, with the data marts built using dimensional models.
    Kimball style uses dimensionally modeled, conformed data marts as the DW.
    Then there is the Data Vault model.

    If you are already using dimensionally modeled, conformed tables (Kimball), then yes, it wouldn't make sense to convert a 10 TB data warehouse to the Inmon style.

    The point I was trying to make is that although the engine can support the Inmon-style DW, it performs the worst with it because of the technical limitations. In just straight-up SQL Server, that's a different story. 😀

    With Inmon, you don't access the third-normal-form database directly.  It is used to populate the data marts that are used by the users of the database.  Same with the Data Vault.  These two are designed differently.  I like the Data Vault paradigm.

  • Japie Botma - Monday, July 23, 2018 5:43 AM

    I agree, Ben.  Small systems usually have limited hardware resources.  Joining 6 tables instead of 10 gives you quite a performance increase.  Also, and I know many people will not agree, but foreign keys slow a system down.  Doing everything via stored procedures still gives you control over when and what happens AND THE SPEED.

    I've got to say that, like all else in SQL Server, "It Depends".  Joining with more tables doesn't automatically mean that something is going to take longer.  In fact, I've seen it take substantially less time and resources because of the extra filtering that the join created.

    The same holds true with FKs.  We have a table with many (about three dozen) FKs on it, and it would quite literally take more than 3 minutes to delete (for example) just one row.  Everyone was up in arms about the FKs and wanted to drop them for the sake of performance.  I added the correct indexes for the FKs (which is a "best practice" that many forget about) and even multi-row deletes ran in milliseconds rather than minutes.
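
    A minimal sketch of that fix (table and index names are made up): index the referencing column so the FK check that runs on DELETE becomes a seek instead of a scan of the child table.

        -- Hypothetical parent/child pair.  Without an index on the FK column,
        -- deleting a parent row forces a scan of the child table to verify
        -- that no referencing rows exist.
        CREATE TABLE dbo.Customers (CustomerID INT PRIMARY KEY);
        CREATE TABLE dbo.Orders
        (
            OrderID    INT PRIMARY KEY,
            CustomerID INT NOT NULL REFERENCES dbo.Customers (CustomerID)
        );

        -- The often-forgotten "best practice": index the referencing column.
        CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);

        DELETE FROM dbo.Customers WHERE CustomerID = 42;  -- now a seek, not a scan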

    There is no need to cripple functionality for the sake of speed and no need to cripple speed for the sake of functionality in most cases.  As the old saying goes, "Must look eye!".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • RonKyle - Monday, July 23, 2018 2:16 PM

    And I think this illustrates the point Ben was trying to make in the article: that it can go too far.

    And my point is that this isn't normalization. If this is what Ben was talking about, he wasn't talking about normalization.

    My experience, however, is that the people who don't think it's important either don't understand it or don't want to apply the rigor that's required. In my view, that's like being a mechanic who doesn't want to understand how the engine works.

    Straight from his article:
    "The issue I sometimes see with over normalization is when the number of tables explodes since there are only one or two columns on each table. The queries of that highly normalized system end up having ten or more joins for every query."
    So he was talking about it some.
    Yes, I see what you mean.  However, normalization does not equal more tables.

    I worked with a design (not mine) many years ago that had the data for a single job spread across four tables.  One table held only the business key, and the join for all these tables wasn't the business key (an integer) but a GUID.  The normalization rules would have said that all the columns had a functional dependency on that same business key and should be in the same table.

    It may be necessary for business reasons to separate some columns out, due to security (a strong argument for breaking social security numbers out into a separate table despite their otherwise one-to-one relationship with other employee data) or cardinality (e.g., something that applies to only a portion of the employees, even if the ratio is one to one when it does apply).  But that was not the case here.  So not only were there unnecessary joins, but those joins were made using a GUID.  And yes, the database did have performance issues.  The design defects couldn't be corrected at that point.
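
    A small illustration of both halves of that argument (the schema is hypothetical): columns functionally dependent on the same business key belong in one table, while something like an SSN can justify a breakout for security.

        -- Columns all functionally dependent on JobID belong together,
        -- not spread across four GUID-joined tables.
        CREATE TABLE dbo.Job
        (
            JobID     INT PRIMARY KEY,       -- the business key: join on this
            JobName   VARCHAR(100) NOT NULL,
            StartDate DATE         NOT NULL,
            Status    VARCHAR(20)  NOT NULL
        );

        -- A justified breakout: restrict access to SSNs despite the
        -- one-to-one relationship with the rest of the employee data.
        CREATE TABLE dbo.Employees   (EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100) NOT NULL);
        CREATE TABLE dbo.EmployeeSSN (EmployeeID INT PRIMARY KEY REFERENCES dbo.Employees (EmployeeID),
                                      SSN CHAR(11) NOT NULL);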

  • Normalisation has caused me no end of issues. I was contracted to a firm (I won't name them) to fix the slowness of a system that a massive consultancy had been brought in to write (they off-shored it). When I turned up I thought this shouldn't be too bad, and that a few decent indexes could probably fix it; then I saw the actual database and nearly quit on the spot, but decided to take the challenge instead. I have never seen its like before or since. As an example, it took 36 joins just to get a customer's name and address, with no other information like account numbers etc.! That was a 'fun' few months and I learned a great deal. Then the firm got said massive consultancy in to fix its own issue; I have refused to do work for them ever since, so I didn't accept an extension.
    I guess the lesson is, you can go far too far with normalisation.

  • As an example, it took 36 joins just to get a customer's name and address, with no other information like account numbers etc.!

    That is NOT normalization.  In a properly normalized database, all columns that can share one key would be in the same table.  If a customer ID key has a one-to-one relationship with the customer name and a one-to-one relationship with the address, both would be required to be in the same table, with the customer ID as the PK for that table.  You are not describing a normalized database but an unnormalized one that you are, for some reason, characterizing as over-normalized.
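
    For contrast with the 36-join design, a minimal sketch of what that rule produces (hypothetical names): one table keyed on the customer ID, so reading a name and address needs no joins at all.

        -- CustomerName and the address columns are all functionally dependent
        -- on CustomerID alone, so they live in one table.
        CREATE TABLE dbo.Customer
        (
            CustomerID   INT PRIMARY KEY,
            CustomerName VARCHAR(100) NOT NULL,
            Street       VARCHAR(100) NOT NULL,
            City         VARCHAR(50)  NOT NULL,
            PostalCode   VARCHAR(10)  NOT NULL
        );

        -- Zero joins to read the name and address.
        SELECT CustomerName, Street, City, PostalCode
        FROM dbo.Customer
        WHERE CustomerID = 1;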

  • No, what I am describing still gives me nightmares!

    Think about it this way: every address has numbers that are repeated, every street has a name that is repeated, every town has... you get the idea.
