New Dimension or Use Existing

  • I'm in the process of adding a new fact table to our Data Warehouse. As part of this I need to create a new dimension and I'm a bit stuck about which route to go down.

    The fact table is to record variations made to an order. Each variation can create multiple rows on a variations table, each with their own GUID. The client has asked for everything on the variations table to be recorded on the fact table. That means that because each row has its own GUID, each row is considered a distinct variation. I think the client is under the impression that each variation creates only one row and only variations of a certain type are on the table. Unfortunately, because of some extremely nasty politics, we're not actually allowed to ask them whether this is the case or not.

    My dilemma is how to record the reason for these variations. As I said, each variation can create multiple rows so there a many rows all with the same reason. The reason is picked from a dropdown and there is a text field where the precise details can be entered, the dropdown is recorded as an int. The text field is duplicated on each row too.

    I think I have two options. Option one is to use an existing dimension to record the free text and create a small (10 row) dimension to cross-reference the int from the variation table with actual reason it represents. The current dimension contains 20 odd million rows already and creating the small dimension seems like a bit of a waste of time. I would also end up creating a row for each line of text and as I said there a duplicates in there, including 7400 full stops. Option two is to create a dimension and include the int reason, the actual reason in words and the free text. This method will mean that I have one row for each combination of int, reason and text so it will be at nearly the same grain as my fact table but it won't have the duplicates in there. This is the method I'm edging towards but does anybody have any advice either way?


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • I'd recommend a dimension for the distinct reasons only (which are selected from the dropdown). This will result in a small dimension but give you all the flexibility when you want to slice the data in a cube (or create calculated members).

    For the free text, I would create a junk dimension as container. It is unlikely that people will want to see the free text fields often, and you wouldn't want to exponentially increase the size of your actual dimension because of this free text attribute. I would also not want to bloat the fact table with something like that, because any scanning-type query will have to return that field as well.

    Hope this helps.

  • Thanks Martin.

    After a bit of testing\trial and error, I went down the small dimension and adding to the existing dimension route. The Comments dimension that's already there is a junk dimension anyway so it seemed like a bit of a waste of effort to create another one to do the same thing.

    ....Does anybody else think that junk and degenerate dimensions sound like something Terry Pratchett would write about?


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    —Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply