Multiple Joins on Fact Table

  • In terms of the Kimball methodology of star-schema modeling, I believe this would be an example of a "role playing dimension".

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (11/17/2016)


    In terms of the Kimball methodology of star-schema modeling, I believe this would be an example of a "role playing dimension".

    From what I understand I dont think it is here. OP mentions that fields in question are mutually exclusive. That indicates to me that the record will originate from one of two sources. That is, CounterpartyAccountKey and GenevaAccountKey are intended to have the same meaning and designed to hold either NULL or one other value only.

    Hence for example >> if a fact record was pulled >

    from sourceA --> CounterpartyAccountKey gets value 23 and GenevaAccountKey is NULL

    from sourceB -->CounterpartyAccountKey is NULL and GenevaAccountKey gets value 5

    A role playing dimension has the same dimension apply to different attributes of the same fact record

    (orderDate, paymentDate, shippedDate, arrivedOnDate, ...)

    ----------------------------------------------------

  • Hi,

    I have updated my design to add a Source table. I have added the SourceKey to the fact table and removed the source reference from the other dimension tables.

    Is this now a better design?

    Thanks!

  • Yes. That appears like a straight forward star schema. One note though is that I would be consistent with the namings. If a column is named sourceKey in the fact table then good to name it the same way in the dimension, for all keys.

    ----------------------------------------------------

  • Upon further inspection, do you still need the following in the fact table: CounterpartyAccountKey and GenevaAccountKey? I would think these are being replaced with the sourceKey field.

    ----------------------------------------------------

  • I still see 2 foreign keys from the fact to Account table, and 2 foreign keys from the fact to Broker table. In a post on 11/16 you said they were mutually exclusive, so it would make sense that you would only need one BrokerKey column and one AccountKey column in the fact table. The model will be much easier for your end users to work with once these relationships are simplified.

    Edit: D'oh, MMartin just beat me to this observation 😉

  • Chris Harshman (11/21/2016)


    I still see 2 foreign keys from the fact to Account table, and 2 foreign keys from the fact to Broker table. In a post on 11/16 you said they were mutually exclusive, so it would make sense that you would only need one BrokerKey column and one AccountKey column in the fact table. The model will be much easier for your end users to work with once these relationships are simplified.

    Edit: D'oh, MMartin just beat me to this observation 😉

    Thanks! Updated design as below...

  • OK, the fact table looks better, but I'd need to understand the business a little better to know why you still have multiple balances in the fact. Will these ever be aggregated together in any way? If they are the same measure then they should be the same column, and your Source information would be used in a report or pivot table to filter out or group the various entities such as Geneva or CounterParty. If they are truly different measures then keep them separate.

    I also have a question in your dimension tables Account and Broker. How are they managed from the original system(s)? If you only have an ID and Name column, either you're using the ID from your OLTP system(s) or assuming that you'll always be able to match by Name, which seems unlikely.

  • Chris Harshman (11/22/2016)


    OK, the fact table looks better, but I'd need to understand the business a little better to know why you still have multiple balances in the fact. Will these ever be aggregated together in any way? If they are the same measure then they should be the same column, and your Source information would be used in a report or pivot table to filter out or group the various entities such as Geneva or CounterParty. If they are truly different measures then keep them separate.

    Apologies. I should only have had one balance column. Design updated as below.

    I also have a question in your dimension tables Account and Broker. How are they managed from the original system(s)? If you only have an ID and Name column, either you're using the ID from your OLTP system(s) or assuming that you'll always be able to match by Name, which seems unlikely.

    The Account and Broker name values should be identical to the source system. So account name of 'EEERT0450' is unique and will be on source and DW. So when I populate the DW, the process would check if that account already exists in the Dimension tbl. Do I need a better process around this?

    Thanks!

  • . So account name of 'EEERT0450' is unique and will be on source and DW. So when I populate the DW, the process would check if that account already exists in the Dimension tbl. Do I need a better process around this?

    I would be hesitant to check using name as misspelling that occurred before can then be corrected in the source system and your process would think you have a new account here. So I would stick with using the ID fields for this (for as the name implies), that is what they are for.

    I am sure you've heard of Kimball and the book "The data warehouse toolkit?" It is worth the read, even just the first few chapters to get you started.

    ----------------------------------------------------

  • Thanks! Currently reading it.

    Should I always have an identifier key in each dimension that points back to the source/staging table?

  • I think it is a good idea to keep the business ID along with your surrogate key definitely. As far as the source, that sounds to me more for auditing suspect data correct? I would keep that in your Operational Data Store (if you have one). Otherwise as mentioned you can have that in a Source dimension (similar to a Supplier dimension).

    ----------------------------------------------------

  • points back to the source/staging table

    Just does not come across to me as a good idea. That should be defined and referred to in your ETL process.

    ----------------------------------------------------

  • I think it is a good idea to keep the business ID along with your surrogate key definitely.

    Strongly agree with it in the fact table. Unsure what you meant about it pointing back to the staging table. Normally that is truncated on each load, unless you mean as an immediate indicator if something goes wrong.

  • MMartin1 (12/9/2016)


    I think it is a good idea to keep the business ID along with your surrogate key definitely. As far as the source, that sounds to me more for auditing suspect data correct? I would keep that in your Operational Data Store (if you have one). Otherwise as mentioned you can have that in a Source dimension (similar to a Supplier dimension).

    OK. So I have added a business id to the Broker, Account, and Currency dimensions. This business id represents the OLTP data. I have also cleaned up the fact table.

    Is there anything else I need to add?

    Thanks for all the help! I'm learning a lot!

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply