Best way to add description values to data mart

  • I am moving relational data to a data warehouse data mart, and I am looking for the most efficient way to combine the descriptive data from the code table into the data mart table. For example, I have a product table with display_id, manufacturer_id, industry_id, pik_id, order_id, sid_id, etc. Then there is a code_value table with all of those ids and a code_set_id. The code_value table contains the display description, an internal product description, the industry description, the description of the pik set it belongs to, and so on. That should give you the concept.

    There are multiple ways to do it.

    1. I can create multiple connections to the code_value table when I do the original pull.

    2. I can just transfer the code_value table and make multiple joins in the data mart (sketched below).

    3. I can run updates after the insert into the data mart table (also sketched below).

    Any experience to recommend one choice over another?
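
    To make the options concrete, here is a rough T-SQL sketch of options 2 and 3. The table and column names (stage.product, stage.code_value, dim_product, display_desc, and so on) are placeholders based on the description above, not the actual schema.

        -- Option 2: stage code_value next to product and resolve the
        -- descriptions with joins in a single INSERT ... SELECT into the mart.
        INSERT INTO dim_product (product_id, display_id, display_desc, industry_id, industry_desc)
        SELECT  p.product_id,
                p.display_id,  d.description,
                p.industry_id, i.description
        FROM    stage.product p
        JOIN    stage.code_value d ON d.code_value_id = p.display_id
        JOIN    stage.code_value i ON i.code_value_id = p.industry_id;

        -- Option 3: after bulk loading just the raw keys, back-fill the
        -- descriptions with UPDATEs against the loaded rows.
        UPDATE  dp
        SET     dp.display_desc = cv.description
        FROM    dim_product dp
        JOIN    stage.code_value cv ON cv.code_value_id = dp.display_id;

    Option 1 would be essentially the same joins, but performed by the extract query (or lookups) against the source system during the original pull instead of against staged copies.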

  • I would most likely put it all in a single table that will become something like your product dimension. Then in SSAS you can configure some or all of your keys to display their descriptive values instead of their key values, if you like it that way (this sometimes creates sorting issues... they are easily overcome, but can be a bit confusing).

  • Yes, ultimately we would like the codes and descriptions and facts all in the same data mart table. The real question is not "is that the best choice for a data mart table"; the real question is "at what point is it most efficient to merge the data into the data mart table?"

  • Without getting into the huge discussion of dimensional modeling, here are the ultra basics. Most data marts and data warehouses are heavily denormalized because end users and many analysts don't really understand relational structures, and cubes are optimized on denormalized structures.

    So all your fact tables should have nothing but facts and foreign keys; pretty much a table filled with numbers: revenue, cost, overhead, that kind of stuff.

    All your dimension tables should have information that gives the data in the fact table meaning. For example, a customer dimension would give meaning to revenue at a customer level (what was the revenue from customer A) and at a customer group or other level (what was the revenue for all retail customers, or for all customers in a specific city, etc.). Adding a product dimension, for example, would provide something like revenue per part or revenue per product category. Combining these dimensions and facts allows you to answer questions like: what was the revenue for this product line for customers in this city, or how many of this part did a specific customer buy and how much money did we make?
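
    To make that concrete, a minimal star schema along those lines might look something like the sketch below; the table and column names are illustrative only, not a prescription.

        -- Fact table: nothing but foreign keys and numeric measures.
        CREATE TABLE fact_sales (
            product_key   INT NOT NULL,     -- FK to dim_product
            customer_key  INT NOT NULL,     -- FK to dim_customer
            order_date    DATE NOT NULL,
            revenue       DECIMAL(18, 2),
            cost          DECIMAL(18, 2)
        );

        -- Denormalized product dimension: the source codes and their
        -- descriptions sit side by side in one wide row.
        CREATE TABLE dim_product (
            product_key   INT IDENTITY(1, 1) PRIMARY KEY,
            product_id    INT,              -- source system key
            display_id    INT,
            display_desc  VARCHAR(100),
            industry_id   INT,
            industry_desc VARCHAR(100),
            pik_id        INT,
            pik_set_desc  VARCHAR(100)
        );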

    So I think the answer to your question is to merge the data together into one table before loading the cube. Perhaps I misunderstand your question, but you might want to spend a bit of time reading up on dimensional modeling.

    In any case, I hope I was helpful. Good luck.

  • I'm sorry, I seem to have confused the issue with too much detail, or maybe it was my choice of forum. We are going to combine the data somewhere. I'm asking about the practical speed of the combination process: not whether to combine, but whether anyone has experience (have you run tests?) with the actual speed of the different ways of combining the data. For example, bulk loads are much faster than updates, but will I lose the bulk-load speed gain in the data pull process?
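
    If nobody has hard numbers, one way to get them for your own volumes is a quick side-by-side timing of the two patterns. A minimal sketch, assuming the stage.product, stage.code_value, and dim_product placeholders from the earlier sketches:

        -- Time pattern A: resolve the description during the load.
        DECLARE @t0 DATETIME2 = SYSDATETIME();

        INSERT INTO dim_product (product_id, display_id, display_desc)
        SELECT  p.product_id, p.display_id, cv.description
        FROM    stage.product p
        JOIN    stage.code_value cv ON cv.code_value_id = p.display_id;

        SELECT DATEDIFF(MILLISECOND, @t0, SYSDATETIME()) AS join_during_load_ms;

        -- Reset the target, then time pattern B: load the raw keys first
        -- and back-fill the description with an UPDATE afterwards.
        TRUNCATE TABLE dim_product;
        DECLARE @t1 DATETIME2 = SYSDATETIME();

        INSERT INTO dim_product (product_id, display_id)
        SELECT  product_id, display_id
        FROM    stage.product;

        UPDATE  dp
        SET     dp.display_desc = cv.description
        FROM    dim_product dp
        JOIN    stage.code_value cv ON cv.code_value_id = dp.display_id;

        SELECT DATEDIFF(MILLISECOND, @t1, SYSDATETIME()) AS load_then_update_ms;

    Whether the second pattern keeps its bulk-load advantage depends heavily on row counts, indexes on the target, and the recovery model, so it is worth running against a realistic slice of the data.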
