What is Normal?

  • Comments posted to this topic are about the item What is Normal?

    Best wishes,
    Phil Factor

  • So how does one deal with such ambiguities when creating normalized structures? I'm facing a similar problem in my current project, where we have a large number of unclear structures in our source data. I'm creating "translation" tables to try to squeeze the source data into our own business logic, then flagging occurrences that have no equivalent and going back to the business for corrections. This seems rather cumbersome. Has anyone else implemented similar solutions and stuck with them, or did you manage somehow to resolve this issue where "regular data" seems to be the outlier?
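
    Roughly what I mean, as a minimal sketch; the table and column names (including the StagingOrders table) are made up purely for illustration:

    -- Map raw source codes to our own reference data; NULL means
    -- "no equivalent agreed with the business yet"
    CREATE TABLE dbo.SourceStatusTranslation
    (
        SourceSystem     varchar(30)   NOT NULL,
        SourceCode       varchar(50)   NOT NULL,
        InternalStatusID int           NULL,
        Comment          nvarchar(200) NULL,
        CONSTRAINT PK_SourceStatusTranslation PRIMARY KEY (SourceSystem, SourceCode)
    );

    -- Flag incoming values that still have no equivalent, to send back
    -- to the business for correction
    SELECT s.SourceSystem, s.SourceCode, COUNT(*) AS Occurrences
    FROM dbo.StagingOrders AS s
    LEFT JOIN dbo.SourceStatusTranslation AS t
           ON t.SourceSystem = s.SourceSystem
          AND t.SourceCode   = s.SourceCode
    WHERE t.InternalStatusID IS NULL
    GROUP BY s.SourceSystem, s.SourceCode;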

  • I guess I live in the world of trying to eliminate normal. I'm always shooting to denormalize my data. 😛

  • I've had to work with plenty of databases that have mimicked the old card file system. The old card file had that great advantage that you could write ad hoc notes on them. This doesn't really work with relational databases. I prefer to expect the worst and produce a fully normalised database design, but then provide all the views and functions that anyone would want so as to give the 'flat' view of the data that suits them.
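
    As a rough, hypothetical sketch of that approach (all names invented for illustration), the base tables stay normalised and a view presents the card-file style 'flat' picture:

    CREATE TABLE dbo.Contact
    (
        ContactID   int IDENTITY PRIMARY KEY,
        ContactName nvarchar(100) NOT NULL
    );

    CREATE TABLE dbo.ContactPhone
    (
        ContactPhoneID int IDENTITY PRIMARY KEY,
        ContactID      int NOT NULL REFERENCES dbo.Contact (ContactID),
        PhoneType      varchar(10) NOT NULL,   -- e.g. 'Work', 'Home'
        PhoneNumber    varchar(30) NOT NULL
    );
    GO

    -- The 'flat' view that suits the people who want one row per contact
    CREATE VIEW dbo.ContactFlat
    AS
    SELECT c.ContactID,
           c.ContactName,
           wp.PhoneNumber AS WorkPhone,
           hp.PhoneNumber AS HomePhone
    FROM dbo.Contact AS c
    LEFT JOIN dbo.ContactPhone AS wp
           ON wp.ContactID = c.ContactID AND wp.PhoneType = 'Work'
    LEFT JOIN dbo.ContactPhone AS hp
           ON hp.ContactID = c.ContactID AND hp.PhoneType = 'Home';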

    The problem comes when the people providing you the information about the data don't themselves understand it, or know about all of it. Then it all gets sticky. I've been told that you can fudge a great deal with PostgreSQL and JSON, but I wouldn't want to go down that route myself, due to past experience.

    Best wishes,
    Phil Factor

  • I have also been accused of over-normalizing. In my view, however, those saying that simply weren't applying the rigor necessary to produce a correct database. And I know all too well from first-hand experience the difficulty of getting quality and consistent data from "undernormalized" databases. The edge cases you describe present peculiar difficulties. Fortunately I haven't had to deal with those yet.

  • Oooooo.... don't get me started on the silliness of denormalized data being stored in the database. I'm going through some hell (again) because my predecessors thought it would be a good idea to use a "ClientID" to identify a bank and a "LoanNumber" to identify a given loan for a given client. The reason they did this was to make it so they wouldn't have to make a trip to the actual loan table, which has a surrogate LoanID, the ClientID, and the LoanNumber. Apparently they thought they were safe in the "simplicity" of it all. What a stroke of "genius" to be able to avoid one freakin' join.

    I said "again" above because we are again in the position where two banks merged and, due to some conflicting loan numbers (some are duplicated between the two banks), they changed some of the loan numbers, which are located in dozens of the tables we have across several databases, and not only in our own systems but in DB2 systems as well. "LoanNumber" has also been spelled a bunch of different ways because there was no standard naming convention, AND they've used the name "LoanNumber" to mean other things. Of course, we haven't created any new tables in the last 5 years, right? In a pig's eye!
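
    For illustration only (names made up), the sort of design that keeps the business keys in exactly one place, so a merger that renumbers loans touches one table instead of dozens:

    CREATE TABLE dbo.Loan
    (
        LoanID     int IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Loan PRIMARY KEY,
        ClientID   int         NOT NULL,
        LoanNumber varchar(20) NOT NULL,
        CONSTRAINT UQ_Loan_Client_Number UNIQUE (ClientID, LoanNumber)
    );

    -- Every other table carries only the surrogate key
    CREATE TABLE dbo.LoanPayment
    (
        LoanPaymentID int IDENTITY(1, 1) NOT NULL CONSTRAINT PK_LoanPayment PRIMARY KEY,
        LoanID        int           NOT NULL CONSTRAINT FK_LoanPayment_Loan
                                             REFERENCES dbo.Loan (LoanID),
        PaymentDate   date          NOT NULL,
        Amount        decimal(18,2) NOT NULL
    );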

    I'm with Phil. It's MUCH safer to "over" normalize and help those that don't get it to finally get it. And if you're one of those folks that think you can identify customers or accounts by some form of natural key because "IDENTITY" columns are supposedly evil, then you need to prepare yourself and your peers for a world of hurt because it will certainly hurt a whole lot... again and again!

    That and NEVER store full names. Make sure that things like first name, middle name, last name, title, salutation, etc., are stored in separate columns so that you don't have to do something silly like using leading-wildcard LIKEs or FULL TEXT searches to find someone who calls in and doesn't have their account number.

    And, I agree... addresses and phone numbers should never be stored in the customer table or any contact table. Take the very little time up front to do it right so that you don't have to do it so wrong for the rest of the life of the database.
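
    A minimal, hypothetical sketch of both points (all names invented): name parts in their own columns, contact details in their own table, so a change of address never touches the customer row and finding a caller is a plain, sargable search:

    CREATE TABLE dbo.Customer
    (
        CustomerID int IDENTITY PRIMARY KEY,
        Salutation nvarchar(20) NULL,
        FirstName  nvarchar(50) NOT NULL,
        MiddleName nvarchar(50) NULL,
        LastName   nvarchar(50) NOT NULL
    );

    CREATE TABLE dbo.CustomerAddress
    (
        CustomerAddressID int IDENTITY PRIMARY KEY,
        CustomerID  int NOT NULL REFERENCES dbo.Customer (CustomerID),
        AddressType varchar(10)   NOT NULL,  -- e.g. 'Billing', 'Shipping'
        Street      nvarchar(100) NOT NULL,
        City        nvarchar(50)  NOT NULL,
        PostalCode  varchar(20)   NOT NULL
    );

    -- No leading-wildcard LIKE, no full-text search
    SELECT CustomerID, FirstName, LastName
    FROM dbo.Customer
    WHERE LastName = N'Smith' AND FirstName = N'Jane';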

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah yes. Mergers are a real testing ground, especially if the 'natural' business terms, constraints and even units of measurement of the merging companies are different. This isn't really the fault of the RDBMS. It was even harder before technology. In my experience, a previous denormalisation makes it a lot harder to put that sort of mess right, once the two businesses have finally agreed on a common business culture.

    Best wishes,
    Phil Factor

  • Our last generation of software was denormalized. (Against my advice, but I still have to deal with the issues this architecture caused.)

    Our next gen software is already giving me happiness, it's designed "normally". 😀

  • I'm strictly for the fullest practical normalisation, with one exception: carrying forward very carefully selected keys, such as Invoice-->InvoiceLine-->InvoiceLineDetail(InvoiceID), where, at the cost of a single integer column, one can group and aggregate by accessing only one table. I don't do a blanket application of such de-normalisation; there have to be solid and tangible justifications. I guess what I'm trying to do here is answer the question raised by Phil, and my answer is "normal(ise) as in practical".
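
    A rough sketch of what I mean, with illustrative names and columns only:

    CREATE TABLE dbo.Invoice
    (
        InvoiceID   int IDENTITY PRIMARY KEY,
        InvoiceDate date NOT NULL
    );

    CREATE TABLE dbo.InvoiceLine
    (
        InvoiceLineID int IDENTITY PRIMARY KEY,
        InvoiceID     int NOT NULL REFERENCES dbo.Invoice (InvoiceID)
    );

    CREATE TABLE dbo.InvoiceLineDetail
    (
        InvoiceLineDetailID int IDENTITY PRIMARY KEY,
        InvoiceLineID       int NOT NULL REFERENCES dbo.InvoiceLine (InvoiceLineID),
        InvoiceID           int NOT NULL REFERENCES dbo.Invoice (InvoiceID),  -- the carried-forward key
        Quantity            decimal(18,4) NOT NULL,
        UnitPrice           decimal(18,4) NOT NULL
    );

    -- Invoice-level totals without touching Invoice or InvoiceLine
    SELECT InvoiceID, SUM(Quantity * UnitPrice) AS InvoiceTotal
    FROM dbo.InvoiceLineDetail
    GROUP BY InvoiceID;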

    😎

  • Eirikur Eiriksson (11/20/2016)


    I'm strictly for the fullest practical normalisation, with one exception: carrying forward very carefully selected keys, such as Invoice-->InvoiceLine-->InvoiceLineDetail(InvoiceID), where, at the cost of a single integer column, one can group and aggregate by accessing only one table. I don't do a blanket application of such de-normalisation; there have to be solid and tangible justifications. I guess what I'm trying to do here is answer the question raised by Phil, and my answer is "normal(ise) as in practical".

    😎

    I don't understand why you think that such a thing isn't normalization.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have been accused of over-normalization. "Why all of these tables?" I was asked what the cost of adding tables was. I took two days and wrote a paper showing that the cost of adding a single new table could be near zero.

    We had Sales Orders, Purchase Orders, Manufacturing Orders, and (because we supported distributed warehousing) Transfer Orders. And it does not end there.

    We had the good sense to put a Unit Of Measure ID on every amount column. I can buy something in Metric Tons and sell it in Troy Ounces because we have a unit of measure conversion table.
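
    As a hypothetical sketch of that idea (the names, and the Purchase table in the query, are invented for illustration):

    CREATE TABLE dbo.UnitOfMeasure
    (
        UnitOfMeasureID int PRIMARY KEY,
        UnitName        varchar(30) NOT NULL UNIQUE
    );

    CREATE TABLE dbo.UnitOfMeasureConversion
    (
        FromUnitID int NOT NULL REFERENCES dbo.UnitOfMeasure (UnitOfMeasureID),
        ToUnitID   int NOT NULL REFERENCES dbo.UnitOfMeasure (UnitOfMeasureID),
        Factor     decimal(38, 18) NOT NULL,  -- multiply a FromUnit amount by this to get ToUnit
        CONSTRAINT PK_UnitOfMeasureConversion PRIMARY KEY (FromUnitID, ToUnitID)
    );

    -- e.g. bought in metric tons, reported in troy ounces
    SELECT p.PurchaseID,
           p.Quantity * c.Factor AS QuantityInTroyOunces
    FROM dbo.Purchase AS p
    JOIN dbo.UnitOfMeasureConversion AS c
      ON c.FromUnitID = p.UnitOfMeasureID
     AND c.ToUnitID   = (SELECT UnitOfMeasureID
                         FROM dbo.UnitOfMeasure
                         WHERE UnitName = 'Troy Ounce');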

    I do have a slight case for denormalization: I want to have an unchangeable representation of something historical. Where did we ship that order, and to whom did we bill it? A customer moving shouldn't affect the record of where we shipped their stuff in the past. That is why there is OLTP and data warehousing.
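
    A minimal illustrative sketch (names made up): the order carries a copy of the ship-to details as they were at the time, so a later customer move cannot rewrite history:

    CREATE TABLE dbo.SalesOrder
    (
        SalesOrderID     int IDENTITY PRIMARY KEY,
        CustomerID       int NOT NULL,            -- FK to the live customer row
        OrderDate        date NOT NULL,
        ShipToName       nvarchar(100) NOT NULL,  -- frozen at shipment time
        ShipToStreet     nvarchar(100) NOT NULL,
        ShipToCity       nvarchar(50)  NOT NULL,
        ShipToPostalCode varchar(20)   NOT NULL
    );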

    One case against IDENTITY columns was that there won't be enough numbers to last the lifetime of the system. Wrong by half. Don't start them at zero. "Don't those huge numbers take up more room than the small ones?" No. "Won't the user be confused by the minus sign?" No. The USER will never see the *&%-ing minus sign! It's an ID column that is never exposed. (Thank you, Joe.)
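
    For illustration, a hypothetical example of seeding from the bottom of the int range, which roughly doubles the available numbers compared with starting at 1:

    CREATE TABLE dbo.Widget
    (
        WidgetID   int IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY,  -- never shown to users
        WidgetName nvarchar(100) NOT NULL
    );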

    ATB
    Charles Kincaid

  • It always amazes me how many experienced professionals do not consider the longevity of software and databases. Some must work on too many greenfield projects and never support older systems, or at least older designs, as they never appear to consider supportability.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • I've reached the conclusion that people don't realise what a well-thought-out data model gives them, and they aren't interested in knowing.

    They see SQL Server as a dumb data store and regard constraints as an affront that limits their flexibility. They absolutely don't see that an enforced data model benefits data quality, or that a normalised data model represents a concept in a way that gives a fine-grained understanding of the domain. Neither do they care that the FKs, PKs, and check constraints give the query optimiser a boost.
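
    As a small, hypothetical illustration of what declared constraints give you (names invented): they document and enforce the model, and trusted constraints can also help the optimiser, for instance by allowing join elimination:

    CREATE TABLE dbo.OrderStatus
    (
        OrderStatusID tinyint NOT NULL CONSTRAINT PK_OrderStatus PRIMARY KEY,
        StatusName    varchar(30) NOT NULL CONSTRAINT UQ_OrderStatus_Name UNIQUE
    );

    CREATE TABLE dbo.CustomerOrder
    (
        CustomerOrderID int IDENTITY NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
        OrderStatusID   tinyint NOT NULL
            CONSTRAINT FK_CustomerOrder_OrderStatus
            REFERENCES dbo.OrderStatus (OrderStatusID),
        OrderTotal      decimal(18,2) NOT NULL
            CONSTRAINT CK_CustomerOrder_Total CHECK (OrderTotal >= 0)
    );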

  • chrisn-585491 (11/20/2016)


    Our last generation of software was denormalized. (Against my advice, but I still have to deal with the issues this architecture caused.)

    I think there is a very specific distinction between data which is unnormalized vs denormalized. A number of the things people are talking about in this topic are unnormalized data issues, meaning the schema design did not take into account all the aspects of third normal form. Denormalization would be taking a schema that has already been through normalization during analysis and purposely designing in some small data duplication for purposes of optimization.

    For example, I used to work with a third-party CRM database for a number of years; a base account table might have a current status column, while there was a separate account status history table. That current status was denormalized to handle the large number of queries in that system that depended on status to filter results.
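
    A rough sketch of that pattern, with invented names: the history is normalised, and the frequently filtered 'current' value is deliberately duplicated on the base table:

    CREATE TABLE dbo.Account
    (
        AccountID       int IDENTITY PRIMARY KEY,
        AccountName     nvarchar(100) NOT NULL,
        CurrentStatusID tinyint NOT NULL        -- denormalised copy of the latest status
    );

    CREATE TABLE dbo.AccountStatusHistory
    (
        AccountStatusHistoryID int IDENTITY PRIMARY KEY,
        AccountID  int NOT NULL REFERENCES dbo.Account (AccountID),
        StatusID   tinyint NOT NULL,
        StatusDate datetime2(0) NOT NULL
    );

    -- Most reads can filter on the base table alone
    SELECT AccountID, AccountName
    FROM dbo.Account
    WHERE CurrentStatusID = 3;   -- e.g. 'Active' (illustrative code value)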

  • I think there is a very specific distinction between data which is unnormalized vs denormalized

    I agree with you, and maybe you, I, and some others can make that distinction. But in my view, too many describe as denormalized what is really unnormalized, because they can't or don't want to apply the rigor necessary. I have denormalized due to performance issues, but I started with a normalized design and made very specific changes only when it was clear that there was no other way. Those are rare cases.
