How to model a slowly changing dimension

  • I am wondering how to model a slowly changing dimension. It's a basic sales database, pretty much like Contoso or the AdventureWorksDW databases. The part I'm not sure about is this...

    How do you handle the "redistricting" of Sales Territories, so that you can compare one "definition" of the territory to another over time? It looks like this is one option:

    CREATE TABLE SalesTime (

    ZipCode CHAR(5),

    SalesYear TINYINT,

    TerritoryID INT

    PRIMARY KEY (ZipCode, SalesYear, TerritoryID));

    Are there other options that would be more flexible or make summaries easier? And where does this go in a typical star schema? Doesn't seem to have an obvious "place" in a star schema. How do I relate the changing territory assignments and DimCustomer and SalesFact?

    Thanks!

  • Let me see if rephrasing this will help. I'm wondering how to model sales territories when the "borders" move over time.

    Same basic sales database (think Contoso DW... it's just less muddy than AdventureWorks!). The part that's giving me fits is the Geography/Territory dimension. In my "world" that I'm modeling, the company only sells to hospitals and clinics, so if I am looking at only the USA, that's something like 4000 hospitals. The part I'm having trouble with is modeling the fact that a customer (hospital) can be assigned to "territory A" one year and then "territory B" the next year, just because the sales maps got redrawn. How, then, do I compare like/stable territory definitions over time? (Because "territory A" in 2014 may not be exactly the same as it was in 2012.) Do I snowflake a little bit and add another couple of tables? Something along the lines of

    CREATE TABLE CustomerTerritory(

    CustomerID INT,

    FiscalYear INT,

    TerritoryID INT NOT NULL

    CONSTRAINT pkCustomerTerritory (CustomerID, FiscalYear));

    Is there a good reference to read on this? Chris Adamson's "Star Schema: The Complete Reference" (or is that overkill?)

    Thanks!

    Pieter

  • Quick questions, are these assumptions correct?

    😎

    1) Definition: Territory is a Geographical Area, defined by a collection of Zip Codes, each of which represents either a Geographical or a Pseudo location. At any given time, a Client can only be assigned to a single Territory.

    2) Territories are subject to annual changes, new definitions are created for each year, although only a small portion of the definitions are changed.

    3) Zip Codes are subject to infrequent changes by the governing body (USPS), at an irregular intervals.

  • pietlinden (11/29/2014)


    I am wondering how to model a slowly changing dimension. It's a basic sales database, pretty much like Contoso or the AdventureWorksDW databases. The part I'm not sure about is this...

    How do you handle the "redistricting" of Sales Territories, so that you can compare one "definition" of the territory to another over time? It looks like this is one option:

    CREATE TABLE SalesTime (

    ZipCode CHAR(5),

    SalesYear TINYINT,

    TerritoryID INT

    PRIMARY KEY (ZipCode, SalesYear, TerritoryID));

    Are there other options that would be more flexible or make summaries easier? And where does this go in a typical star schema? Doesn't seem to have an obvious "place" in a star schema. How do I relate the changing territory assignments and DimCustomer and SalesFact?

    Thanks!

    I use Type 2 SCDs for such things. The key to it being easy is to use 9999-01-01 ('9999' as an assignment to a DATETIME column) for the end date of the current "record" so that you don't have to test for NULL. See the following article on Type 2 SCDs and add the 9999-01-01 end date to the mix. Why not use 9999-12-31? No headroom for other date calculations and '9999' is a lot easier to type. 😉

    http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Oh, I think I get it now... So each time a customer got assigned to a different territory, then I would have to create a new record for that customer with a new primary key, but the same natural key (or whatever key it is that tells me they're all the same customer). Makes sense now.

    Now to go play with a simple data warehouse... well, after SQL Saturday!

    Thanks!

    Pieter

  • Yes to all three.

    The confusing part for me was how to compare a group of sales from one year to another year when the territory definitions are changing. (So that I'm always comparing apples to apples...) "Freeze" the territory definition, and then do a summary.

    Thanks!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply