How do I model this?

  • Sorry for the lame subject, but here goes.

    I have a basic sales data warehouse. Basically, a salesperson earns a commission when a customer in his territory buys product. The catch is that the territory "borders" can get redrawn every year. So it would be necessary to "freeze" a definition of the territory map to get an accurate idea of what's going on in the business. (What customers in what territories are buying more/less etc).

    The "customers" in my case are companies (hospitals and clinics), so they don't move like people do... How would I model this situation so that I could look at a single customer across time and at a territory with a fixed "definition" across time. (I suppose that would mean a membership that was fixed as of a point in time.... so something like "use the definition of the sales map that was valid on a given day").

    It would seem that it would be something like:

    Territory--(1,M)--Customer--(1,M)--SalesFact

    So how would I specify when the "Customer in Territory" was valid?

    Thanks!

    Pieter

  • Have you read about slowly changing dimensions type 2? It seems to be what you need.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Yes, but I clearly don't understand it.

    Basically, my understanding is that I would create a new Customer record (with a new Primary Key, but with the same Alternate Key)... it would get updated whenever the customer's sales territory changes. Maybe my problem is that I'm too accustomed to OLTP type databases as opposed to Data Warehouses...

    Thanks,

    Pieter

  • In addition to what you have said above, your customer dimension should also contain effective/termination dates and preferable a "current record" indicator.

    The effective/termination dates should be used when populating your fact table, to pick the correct version of your customer when the transaction occurred and using that customer record's surrogate key in the fact record.

    Territory should not be its own dimension table, unless you have a really good reason to do so. It should rather be an attribute in the salesperson dimension. If that is done, a simple group by territory and customer would then give you the breakdown you need. (Also assuming that you have a date dimension that is referenced by your facts).

    Edit: Corrected my statement about the territory attribute...it should be in the salesperson/employee dimension, as that is where you would track it. The fact record would tie territory to customer.

  • I think SCD2 is what you need. As noted by Martin, you need start/end dates, whatever you call them. I'd also note that you might link salespeople to territories as an SCD 2 as well.

    All your queries need to look at dates if they span time. They should look at the begin and end dates, with the end date of null meaning it's currently active.

    Some links here: http://www.sqlservercentral.com/search/?q=slowly+changing+dimension&t=a&sort=relevance

  • Steve Jones - SSC Editor (11/10/2015)


    All your queries need to look at dates if they span time. They should look at the begin and end dates, with the end date of null meaning it's currently active.

    Mr. Jones, although a matter of preference I have to respectfully disagree about the usage of null values for end dates. In my experience, we typically try to avoid null values as much as possible for any dimensional attribute. Using a future-dated default for current record end dates (like 12/31/9999) usually works pretty well.

  • I thought about a future date. Using 2999/12/31 or something makes sense as well.

    I've done it both ways. I guess as I think about it, the end date as null or a fixed future date probably performs the same in terms of indexing and querying, so why not use a future date. I'll cease recommending NULL.

  • Steve Jones - SSC Editor (11/11/2015)


    I thought about a future date. Using 2999/12/31 or something makes sense as well.

    I've done it both ways. I guess as I think about it, the end date as null or a fixed future date probably performs the same in terms of indexing and querying, so why not use a future date. I'll cease recommending NULL.

    It wasn't a "cease and desist" comment, but I'll take it 🙂

    The main problem with using NULL values for effective dates in dimensions, is the amount of effort it takes to work around the NULL values later on.

    For instance, queries that load your fact tables and referencing a type-2 dimension will have to use the "between" operator in order to assign the correct dimension record as reference to the fact. If you have NULL values, you will have to use the ISNULL function in each one of those where clauses to deal with the NULL values... and that's a lot of additional work.

  • I agree with you. As I was writing the first time, I was thinking, what about the ISNULL code that's needed? Then I saw your comment, and I started to wonder why I'd ever done that. Maybe the disdain for a "magic" date?

    I'll happily give the cease and desist. Learning and writing better code over time is what I want to promote here. Happy to follow my own path.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply