What is Normal?

  • RonKyle (11/21/2016)


    I think there is a very specific distinction between data which is unnormalized vs denormalized

    I agree with you, and maybe you, I, and some others can make that distinction. But in my view too many describe as denormalized what is really unnormalized because they can't or don't want to apply the rigor necessary. I have denormalized due to performance issues. But I started with a normalized design and make very specific changes when it was clear that there was no other way. But these are rare cases.

    Maybe the following applies:

    Unnormalised = ignorant

    Inappropriate normalisation (i.e. missed, as opposed to actively not chosen, denormalisation) = lazy

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Gary Varga (11/22/2016)

    Unnormalised = ignorant

    Inappropriate normalisation (i.e. missed, as opposed to actively not chosen, denormalisation) = lazy

    Sounds about right 😉

    I worked in the IT department of a rental car company for 3 years, as we were designing a new reservation system to replace the old unnormalized monolithic database with a new system, I had proposed a well thought out design, and 2 of the 3 IT managers said "you make a lot of good points about your design, but we're comfortable staying with something closer to the current design". The third one didn't say anything one way or the other.

    I've lost track of how many times after the fact people have asked me why I didn't push back more to get people to follow best practices or go with better designs, yet when I do push back on things it just causes more grief for everyone. I guess the bad design / lazy people just know how to push better.:ermm:

  • Jeff Moden (11/19/2016)


    I'm with Phil. It's MUCH safer to "over" normalize and help those that don't get it to finally get it. And if you're one of those folks that think you can identify customers or accounts by some form of natural key because "IDENTITY" columns are supposedly evil, then you need to prepare yourself and your peers for a world of hurt because it will certainly hurt a whole lot... again and again!

    Absolutely. My client (a major Spanish bank) relied on official, personal "identity" document numbers such as: passport, national ID card, alien resident ID card, etcetera.

    But it occurred that a few passports issued by different countries... had the same number!

    Also, there were several cases of one customer having more than just one ID document: some alien first came and opened an account using her/his passport; then s/he became a resident and got a resident ID card, and opened another account using this new document; then s/he eventually got the Spanish nationality and obtained a national ID card, and opened another account with this new document. Some people even used older documents on purpose in order to obscure their debt history!

    There also were some (few) cases of duplicated national ID card numbers (which supposedly must be unique within the country). In the past, when the government did not have a centralized registry for this, the provinces were given fixed ranges of numbers for issuing ID cards. But population in some provinces outgrew beyond the boundaries of their number range and overlapped some other's. This error was later discovered when registries eventually became automated. But it was too late: once a person was assigned an ID card, there were a number of legal complications that made a change impossible. Nowadays, with centralized computer systems, this error does not occur anymore, but still there are living people in the country who share an ID card number with some other person.

    In a big bank with millions of customers and tens of millions of accounts, these "rare" cases were actually several hundreds.

    I know this is not directly related with today's topic, but I think it is a case worth knowing.

  • And if you're one of those folks that think you can identify customers or accounts by some form of natural key because "IDENTITY" columns are supposedly evil, then you need to prepare yourself and your peers for a world of hurt because it will certainly hurt a whole lot...

    I agree that sometimes with people an identity is sometimes unavoidable. And not just those. But I hope you're not one of the folks who uses an identity all the time even if there is a natural key available. Use a natural key in OLTP designs when possible.

    That said, per Kimball I exclusively use identities in my OLAP designs. There is a place for both and a good designer will be able to figure out when each is better than the other.

  • Charles Kincaid (11/20/2016)


    I do have a slight case for denormalization. I want to have an unchangeable representation of something historical. Where did we ship that order and to whom did we bill that. Because a customer moves I don't want that to affect where we did ship their stuff in the past.

    I don't that particular use case qualifies as denormalization. You are describing a normalized design. Consider it this way:

    Before the shipment is made, there is a Customer entity with a Shipping Address attribute. Once the shipment is made, the new Shipment entity has a Shipped To Address attribute. The Shipped To Address is populated with the value of the Customer's Shipping Address at the time of shipment, but it's not the same attribute. The Shipped To Address, as an attribute of Shipment, should be stored separately from the Shipping Address of the Customer, which can continue to vary as time goes on. And each Shipment has its own Shipped To Address, even if multiple Shipments are associated with a single Customer.

  • Jeff Moden (11/20/2016)


    Eirikur Eiriksson (11/20/2016)


    I'm strictly for the full most practical normalisation with one exception, carrying forward a very carefully selected keys such as Invoice-->InvoiceLine-->InvoiceLineDetail(InvoiceID) where, at the cost of a single integer column, one can group and aggregate by only accessing one table. I don't do a blanket application of such de-normalisation, there will have to be solid and tangible justifications. Guess what I'm trying to do here is answering the question raised by Phil and my answer is "normal(ise) as in practical"

    😎

    I don't understand why you think that such a thing isn't normalization.

    Not saying that this isn't normalization, it's a lower/lesser normal form as it duplicates the key values in those tables.

    😎

  • RonKyle (11/22/2016)


    And if you're one of those folks that think you can identify customers or accounts by some form of natural key because "IDENTITY" columns are supposedly evil, then you need to prepare yourself and your peers for a world of hurt because it will certainly hurt a whole lot...

    I agree that sometimes with people an identity is sometimes unavoidable. And not just those. But I hope you're not one of the folks who uses an identity all the time even if there is a natural key available. Use a natural key in OLTP designs when possible.

    That said, per Kimball I exclusively use identities in my OLAP designs. There is a place for both and a good designer will be able to figure out when each is better than the other.

    Heh.... I would tend towards the opposite, an OLTP is much more likely to be a candidate for an identity or some kind of system generated id because an OLTP is much more likely to be the source system for data IE a CRM, eCommerce site or what have you with no natural key at time of entry.

    And while an identity might be useful for joining data in an OLAP I wouldn't rely on it without also trying to identify a legitimate natural key in data from external sources. I've seen a lot of jobs crash and burn because someone in a data warehouse just stuck a system id on records and left it at that since if it loads without throwing errors it must be right, right?

  • Heh.... I would tend towards the opposite, an OLTP is much more likely to be a candidate for an identity or some kind of system generated id because an OLTP is much more likely to be the source system for data IE a CRM, eCommerce site or what have you with no natural key at time of entry.

    My concern is not that identities are sometimes used. A work order number is often an identity generated number. But at that point that value becomes the natural key.

    And while an identity might be useful for joining data in an OLAP I wouldn't rely on it without also trying to identify a legitimate natural key in data from external sources. I've seen a lot of jobs crash and burn because someone in a data warehouse just stuck a system id on records and left it at that since if it loads without throwing errors it must be right, right?

    That is a good point and so to clarify: I will always use an identity column on my dimension tables. Where there is a natural business key and the dimension table is in 1SCD, I will uniquely index that column/those columns to ensure there are no duplicates. Those columns are usually have the clustered index as well.

    I haven't had the experience that I think you're describing at the end of that comment. A good OLAP design should be able to handle unexpected data with the unknown bucket. Given what I just said, duplicate information does not get in.

  • The company I work for was purchased and split into two separate companies at the same address. The names were almost identical, plus each one had subsidiaries with totally unrelated names also at that address.

    Fun for our suppliers!

    ...

    -- FORTRAN manual for Xerox Computers --

  • Stephanie I have a structure for shipped orders. When shipped we instant archive the order there and remove it from the active orders structure. For me this does two things: First the shipped order can't be changed (triggers disable updates and deletes), and the active and planned orders structure remains small. Yes, I am aware of things that can happen between when an order leaves my door and being received. I am speaking in principle here as many of the techniques are proprietary.

    Gary I agree with you almost all of the time. I prefer to not be harsh in criticizing folks. Here in the U.S. we have been through a period of harsh language and I'm about finding gentler ways of driving the point home. Cheers.

    ATBCharles Kincaid

  • Here in the U.S. we have been through a period of harsh language

    We have? Must have been asleep....;-)

  • ZZartin (11/22/2016)


    RonKyle (11/22/2016)


    And if you're one of those folks that think you can identify customers or accounts by some form of natural key because "IDENTITY" columns are supposedly evil, then you need to prepare yourself and your peers for a world of hurt because it will certainly hurt a whole lot...

    I agree that sometimes with people an identity is sometimes unavoidable. And not just those. But I hope you're not one of the folks who uses an identity all the time even if there is a natural key available. Use a natural key in OLTP designs when possible.

    That said, per Kimball I exclusively use identities in my OLAP designs. There is a place for both and a good designer will be able to figure out when each is better than the other.

    Heh.... I would tend towards the opposite, an OLTP is much more likely to be a candidate for an identity or some kind of system generated id because an OLTP is much more likely to be the source system for data IE a CRM, eCommerce site or what have you with no natural key at time of entry.

    And while an identity might be useful for joining data in an OLAP I wouldn't rely on it without also trying to identify a legitimate natural key in data from external sources. I've seen a lot of jobs crash and burn because someone in a data warehouse just stuck a system id on records and left it at that since if it loads without throwing errors it must be right, right?

    There's nothing that says a table can't have more than one unique index on it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (11/22/2016)


    Jeff Moden (11/20/2016)


    Eirikur Eiriksson (11/20/2016)


    I'm strictly for the full most practical normalisation with one exception, carrying forward a very carefully selected keys such as Invoice-->InvoiceLine-->InvoiceLineDetail(InvoiceID) where, at the cost of a single integer column, one can group and aggregate by only accessing one table. I don't do a blanket application of such de-normalisation, there will have to be solid and tangible justifications. Guess what I'm trying to do here is answering the question raised by Phil and my answer is "normal(ise) as in practical"

    😎

    I don't understand why you think that such a thing isn't normalization.

    Not saying that this isn't normalization, it's a lower/lesser normal form as it duplicates the key values in those tables.

    😎

    Ah... understood. Yeah... I agree. No sense in having "bridge tables" on such things.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There's nothing that says a table can't have more than one unique index on it.

    it certainly can. It is no crime but it is a sign that there may be a problem with normalisation to have more than one candidate key, surely

    Best wishes,
    Phil Factor

  • Phil Factor (11/22/2016)


    There's nothing that says a table can't have more than one unique index on it.

    it certainly can. It is no crime but it is a sign that there may be a problem with normalisation to have more than one candidate key, surely

    I think that a lot of folks get confused by the names of things.

    I have started referring to tables as either a clustered table or a non-clustered table. Putting a clustered index on a table changes so many things about the table that I think the distinction is important. Internally the other indexes on a table change when you add a clustered index

    That another index on the same table has the unique attribute makes it, and as I refer to it, a unique constraint. Agreed that it would be rare that a table that is a clustered table might need a unique constraint that is different from the clustering key.

    ATBCharles Kincaid

Viewing 15 posts - 16 through 30 (of 89 total)

You must be logged in to reply to this topic. Login to reply