Modeling Design / Approach

  • sqlvogel (3/29/2012)


    Sean Lange (3/29/2012)


    It is actually the very capture of this point in time data that makes it denormalized. The point I think we all made is that this is a perfectly valid time to have denormalized data. It makes sense to store this information from a point in time perspective. It is however denormalized.

    Sorry, but that's nonsense. No type of data capture requirement can possibly make the data denormalized. All data, including the point-in-time information about a customer, is capable of being modelled in a normalized fashion (say, BCNF, 5NF or potentially even 6NF if you find the need to do so).

    My point of disagreement with Lynn is that the need to record the customer's address at the time of the order is no good reason at all to denormalize. The customer data example isn't really very interesting but if you aren't familiar with modelling of temporal data then you should definitely check out some of the books and other resources here fort example: temporaldata.com. The literature on this topic is very extensive and covers plenty of examples like the one mentioned here.

    You know what, I'm tired of arguing with you. We will just have to agree to disagree and move on.

  • David,

    I'm appreciating all answers as I'm learning a lot. Lynn's and Sean's responses made sense to me as did the customer purchase example. But what you've done is state that the temporal data can be normalized (and I don't get the importance of that fact), and more or less (in my understanding) that being temporal is not a reason for denormalizing. I am very interested in your argument but not those hidden in books I'm not going to read. Can you give a simple example (or two) which supports the critical importance of your point?

    Much thanks,

    Gary

  • gyoung 93471 (3/29/2012)


    David,

    I'm appreciating all answers as I'm learning a lot. Lynn's and Sean's responses made sense to me as did the customer purchase example. But what you've done is state that the temporal data can be normalized (and I don't get the importance of that fact), and more or less (in my understanding) that being temporal is not a reason for denormalizing. I am very interested in your argument but not those hidden in books I'm not going to read. Can you give a simple example (or two) which supports the critical importance of your point?

    Much thanks,

    Gary

    Hi Gary,

    To take a simple example of an Order and associated CustomerAddress:

    Order {OrderNum*, t1, CustomerNum, ...}

    CustomerAdddress {CustomerNum*, t2*, Address}

    * = key

    For this generic example, t1,t2 are the temporal elements - they could be a time interval type if your DBMS supports it; or an interval defined by separate start and end timestamp attributes; or a version number. t2 becomes part of the composite key in CustomerAddress because the address changes over time.

    Commonly a customer might have several current addresses so {CustomerNum, t2} alone might not be sufficient to identify the address. For this simplified example I'm assuming the customer only has one address at any point in time. There's nothing to stop you adding further attributes to identify more than one current address if you need to.

    It may also be that the customer can specify a one-off address for an order, so let's extend the model a bit:

    OrderAddress {OrderNum*, Address}

    Order satisfies the dependency {OrderNum} -> {t1, CustomerNum}

    CustomerAdddress satisfies {CustomerNum, t2} -> {Address}

    OrderAddress satisifies {OrderNum}->{Address}

    These are all key dependencies. Order, CustomerAddress, OrderAddress are therefore in 5NF with respect to those dependencies.

    One sensible reason to denormalize a database schema below 5NF is to implement some dependency or business rule that can't be implemented at a higher normal form (often because the DBMS doesn't support doing it any other way). I don't think that applies here though. Normalization is particularly important with temporal data in my experience because it is extremely sensitive to changing requirements and the accurate point-in-time representation of data is easily lost and very hard or impossible to recover. Denormalizing temporal data below 5NF is something you should only do with caution and with good reason because otherwise it is very likely to catch you out later on

    My reason for joining this thread was simply to point out that given any set of requirements and data to work with it is the database designer's choice whether and how to normalize that data. There is nothing implicit in any properly formulated business requirement that forces denormalization upon the database and certainly not denormalization below 3NF (a dependency-preserving 3NF relation schema exists for any given set of FDs and so 3NF is always achievable without sacrificing any information). Software limitations may occasionally force denormalization on us but the requirement to record the address of an order does not.

    Hope this helps.

  • To take a simple example of an Order and associated CustomerAddress:

    Order {OrderNum*, t1, CustomerNum, ...}

    CustomerAdddress {CustomerNum*, t2*, Address}

    * = key

    For this generic example, t1,t2 are the temporal elements - they could be a time interval type if your DBMS supports it; or an interval defined by separate start and end timestamp attributes; or a version number. t2 becomes part of the composite key in CustomerAddress because the address changes over time.

    Commonly a customer might have several current addresses so {CustomerNum, t2} alone might not be sufficient to identify the address. For this simplified example I'm assuming the customer only has one address at any point in time. There's nothing to stop you adding further attributes to identify more than one current address if you need to.

    It may also be that the customer can specify a one-off address for an order, so let's extend the model a bit:

    OrderAddress {OrderNum*, Address}

    Order satisfies the dependency {OrderNum} -> {t1, CustomerNum}

    CustomerAdddress satisfies {CustomerNum, t2} -> {Address}

    OrderAddress satisifies {OrderNum}->{Address}

    These are all key dependencies. Order, CustomerAddress, OrderAddress are therefore in 5NF with respect to those dependencies.

    One sensible reason to denormalize a database schema below 5NF is to implement some dependency or business rule that can't be implemented at a higher normal form (often because the DBMS doesn't support doing it any other way). I don't think that applies here though. Normalization is particularly important with temporal data in my experience because it is extremely sensitive to changing requirements and the accurate point-in-time representation of data is easily lost and very hard or impossible to recover. Denormalizing temporal data below 5NF is something you should only do with caution and with good reason because otherwise it is very likely to catch you out later on

    My reason for joining this thread was simply to point out that given any set of requirements and data to work with it is the database designer's choice whether and how to normalize that data. There is nothing implicit in any properly formulated business requirement that forces denormalization upon the database and certainly not denormalization below 3NF (a dependency-preserving 3NF relation schema exists for any given set of FDs and so 3NF is always achievable without sacrificing any information). Software limitations may occasionally force denormalization on us but the requirement to record the address of an order does not.

    Hope this helps.

    David,

    I appreciate the response. Some of it, like types of normalization and the implications, will take me a while to grasp and I will come back and re-read (even though you provided the simple example I wished for). My first reaction is that perhaps we have a slightly different concept of temporal. I was thinking of temporal data (with respect to the customer purchase example as laid out by Lynn) as being temporal because by definition it was desired to make it non-dependent on past or future database elements and therefore it only exists to describe what happened in that single point in time. It can later be referenced by the order ID so it is fully accessible. And I would choose not to normalize this data because I see more disadvantage to additional tables than advantage. My imagination hasn't yet provided me with potential new requirements on the database that would undermine my chosen design and create a real problem. At the moment I'm focused on concrete examples, in part, because I do not yet understand all the theory. I have been involved in limited SQL design for a few years and while I've participated in design that ended up being lessons, I have a hard time envisioning some of the problems that I'm being told to worry about.

    Gary

Viewing 4 posts - 31 through 33 (of 33 total)

You must be logged in to reply to this topic. Login to reply