How much database normalization is enough?

  • Rick-153145 - Wednesday, July 25, 2018 5:39 AM

    No, what I am describing still gives me nightmares!

    Think about it this way, every address has numbers that are repeated, every street has a name that is repeated, every town has... You get the idea..

    Ok, I could buy state. And maybe city, assuming there was additional fields needed by the city. And POSSIBLY the street name (although that's really stretching it). But street number? Seriously? :blink:

    I mean we use a separate address table since a lot of our sites actually contain multiple pieces of hardware. But 36 joins for a customer name and address? That sounds like a tall tale told in a bar to get a free drink! I'm not saying I don't believe you, but who in their right mind would design something like that?

  • every street has a name that is repeated, every town has... You get the idea..

    Actually, I don't.  For most database designs, the house/apt number + street is atomic, plus the city, plus the state, plus the zip.  The zip may point to a table.  This could often alone normally identify the city and state, but not always.  The state may have a lookup to a state table, but chances are the city won't.  As I've indicated, I had one design where the house number and the street had to be split to be able to count units by street.  Fortunately it was a small scale.  Not sure how I would have done that on a larger scale. 

    Too many of you seem to have the idea that normalization means many tables, and therefore you don't "take it too far."  That is not what normalization does.  As I've indicated, I've observed unnormalization by too many tables.  We are database professionals and this isn't supposed to be some abstract idea that we can use or not use as we see fit.  And I make such a big deal about it because I've had to, and continue to, live with the results of databases that weren't properly designed.  It's not some theory that makes it harder, it is a failure to apply the rigor required in properly normalizing a database to at least 3NF, which certain unusual cases being covered by normal forms "above" that.

  • RonKyle - Wednesday, July 25, 2018 6:45 AM

    every street has a name that is repeated, every town has... You get the idea..

    Actually, I don't.  For most database designs, the house/apt number + street is atomic, plus the city, plus the state, plus the zip.  The zip may point to a table.  This could often alone normally identify the city and state, but not always.  The state may have a lookup to a state table, but chances are the city won't.  As I've indicated, I had one design where the house number and the street had to be split to be able to count units by street.  Fortunately it was a small scale.  Not sure how I would have done that on a larger scale. 

    Too many of you seem to have the idea that normalization means many tables, and therefore you don't "take it too far."  That is not what normalization does.  As I've indicated, I've observed unnormalization by too many tables.  We are database professionals and this isn't supposed to be some abstract idea that we can use or not use as we see fit.  And I make such a big deal about it because I've had to, and continue to, live with the results of databases that weren't properly designed.  It's not some theory that makes it harder, it is a failure to apply the rigor required in properly normalizing a database to at least 3NF, which certain unusual cases being covered by normal forms "above" that.

    What I think Ben, Rick and I are describing is something that was originally pitched as normalizing the data.  But something happened and this concept went to far, exactly what Ben was asking in the article.  Now I'm not saying what any of us described is what normalizing was intended to do.  Someone took the idea and "injected it with steroids", went to far, whatever you want to call it.  The corporation I was working for brought this consulting firm in to create a normalized data warehouse the corporation could use.  These "experts" had credentials or working with fortune 500 companies.  So who do you think corporate listed to when we said that it was going to far?  Not us, what would some developers who had worked with the same company 15+ years know.  Ron what you are describing makes perfect sense, and that's what I think normalized should be. But I know from my experience that what these "experts" kept saying is this this is the way normalized data should look.  And OMG don't think of questioning what they are saying, or you will be labeled as "someone resisting change".   I don't think normal normalized data means many tables, I'm saying when someone takes the idea to far, it becomes to many tables.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • The corporation I was working for brought this consulting firm in to create a normalized data warehouse the corporation could use. These "experts" had credentials or working with fortune 500 companies. So who do you think corporate listed to when we said that it was going to far? Not us, what would some developers who had worked with the same company 15+ years know. Ron what you are describing makes perfect sense, and that's what I think normalized should be. But I know from my experience that what these "experts" kept saying is this this is the way normalized data should look. And OMG don't think of questioning what they are saying, or you will be labeled as "someone resisting change".  

    No, I get that. I've had the same experience.  At a couple of companies ago our company brought in outside consultants to help design a complicated design.  One of the consultants kept wanting to have some columns calculated to avoid joining to other tables.  I yielded on some just to "go along to get along."  But once live, we had trouble keeping those columns in sync.  Of course, the consultants were gone, and we employees, and me the DBA, was left holding the bag.  At my last company, we had a data warehouse consultant who didn't use the standard referential integrity tools on a data warehouse design.  He said the ETL would take care of it.  As the new employee I naturally had no weight, although I did point it out.  Eventually things did go wrong.  He said something was wrong with the ETL, but I replied that something was wrong with the design that could let that happen.

    But this is why we shouldn't be saying "it depends."  It doesn't "depend."  We also shouldn't be saying it's a nice theory to follow if you want.  A problem is a bad design can be made to work, and too often that's what people care about since the maintenance costs that are attributable to bad design can't really be captured.

  • RonKyle - Wednesday, July 25, 2018 6:45 AM

    every street has a name that is repeated, every town has... You get the idea..

    Actually, I don't.  For most database designs, the house/apt number + street is atomic, plus the city, plus the state, plus the zip.  The zip may point to a table.  This could often alone normally identify the city and state, but not always.  The state may have a lookup to a state table, but chances are the city won't.  As I've indicated, I had one design where the house number and the street had to be split to be able to count units by street.  Fortunately it was a small scale.  Not sure how I would have done that on a larger scale. 

    Too many of you seem to have the idea that normalization means many tables, and therefore you don't "take it too far."  That is not what normalization does.  As I've indicated, I've observed unnormalization by too many tables.  We are database professionals and this isn't supposed to be some abstract idea that we can use or not use as we see fit.  And I make such a big deal about it because I've had to, and continue to, live with the results of databases that weren't properly designed.  It's not some theory that makes it harder, it is a failure to apply the rigor required in properly normalizing a database to at least 3NF, which certain unusual cases being covered by normal forms "above" that.

    And normalization to that level often doesn't make sense in a real world application, address being a perfect example.  Addresses usually have multiple valid variations and not all address validation software parses it out the same or what about physical locations that actually have multiple valid addresses even after normalization?  Or from a design perspective it often makes more sense to just let people enter the same data multiple times for business flow reasons.

  • RonKyle - Wednesday, July 25, 2018 6:45 AM

    every street has a name that is repeated, every town has... You get the idea..

    Actually, I don't.  For most database designs, the house/apt number + street is atomic, plus the city, plus the state, plus the zip.  The zip may point to a table.  This could often alone normally identify the city and state, but not always.  The state may have a lookup to a state table, but chances are the city won't.  As I've indicated, I had one design where the house number and the street had to be split to be able to count units by street.  Fortunately it was a small scale.  Not sure how I would have done that on a larger scale. 

    Too many of you seem to have the idea that normalization means many tables, and therefore you don't "take it too far."  That is not what normalization does.  As I've indicated, I've observed unnormalization by too many tables.  We are database professionals and this isn't supposed to be some abstract idea that we can use or not use as we see fit.  And I make such a big deal about it because I've had to, and continue to, live with the results of databases that weren't properly designed.  It's not some theory that makes it harder, it is a failure to apply the rigor required in properly normalizing a database to at least 3NF, which certain unusual cases being covered by normal forms "above" that.

    Heh... right there with you, especially on the "had to, and continue to" thing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • roger.plowman - Wednesday, July 25, 2018 6:37 AM

    Ok, I could buy state. And maybe city, assuming there was additional fields needed by the city. And POSSIBLY the street name (although that's really stretching it). But street number? Seriously? :blink:

    I mean we use a separate address table since a lot of our sites actually contain multiple pieces of hardware. But 36 joins for a customer name and address? That sounds like a tall tale told in a bar to get a free drink! I'm not saying I don't believe you, but who in their right mind would design something like that?

    It was an simple bought ledger that was outsourced. If I couldn't still vividly remember the horrors, I wouldn't believe me either. Definitely the worst system I have ever had to support and try to improve.

Viewing 7 posts - 31 through 36 (of 36 total)

You must be logged in to reply to this topic. Login to reply