Datawarehouse Design Question

  • Hello ,

    While designing the Data warehouse , i am importing data primarily from two transaction systems A and B.

    Now there are tables in A and B which have essentially the functionality, however because of the design there are records (which are unique) in tables of A and B which have the same id.

    Inserting all of this in the same table(for example one customer table) means that two separate customers will have the same id.

    I know i can create a surrogate key and this would help identify uniquely each customer. However to join across multiple tables(or the fact table) would make things slightly more complicated.

    Realistically there is a workaround, by creating columns that identify the system and so on, but would that be good DB design?

  • The standard is to use surrogate keys on every dimension table regardless of needing to or not. I would recommend doing this in nearly every situation.

    It does mean that during the ETL process for your fact data you will need to map all of your ID's, but that is pretty standard practice and not terribly difficult or slow if done correctly.

    Using composite keys for your dimension tables will quickly make your fact table much wider as well as make the joins more complicated. The performance and maintenance issues caused by this could easily be significant.

  • Thanks, yeah what i have mapped out is that each dimension table has a unique key, and these are surrogate keys.

    Then each fact table has those keys in it with data for that particular grain referencing the dimension key.

    Does the primary key in the fact table have to be composite, or can it have a unique id, like the transaction id for a transaction fact table ?

  • I usually include an identity column on my fact tables and make that and some kind of additional field as the PK for the table. The only reason for the second column for me is for partitioning.

    You could just include a single unique column and make it the PK.

    I do not believe analysis services cares if the fact table has a primary key at all. The PK should NOT be a composite key of all of the dimension keys - if you do this, you would never be able to put in a reversing record.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply