Multiple Joins on Fact Table

  • Hi,

    I am building a DW which will load data from 2 separate sources.

    Each source will populate the Broker Dim table and the Account Dim table.

    To identify which source the Broker/Account relates to, I added BrokerSource column and AccountSource Column.

    In the Fact tbl, I have FKs to each of these Dims. CounterpartyBrokerKey and GenevaBrokerKey relate to the Broker Dim.

    CounterpartyAccountKey and GenevaAccountKey relate to the Account Dim.

    The problem I now have is that there are multiple FKs in the Fact table pointing to the one Dim table.

    Is this ok? Or is there a better, more efficient way of designing this?

    Thanks!

  • kevin.obrien 66193 (11/9/2016)


    The problem I now have is that there are multiple FKs in the Fact table pointing to the one Dim table.

    Is this ok? Or is there a better, more efficient way of designing this?

    This is fine and absolutely normal, think of i.e. date dimensions for order date, delivery date, payment date and expiry date, you wouldn't want to have multiple physical instances of the date dimension, rather instantiate the same date dimension multiple times.

    😎

    Word of caution, a single instance of a dimension table used many times in a model will most likely result in an empty result set.

  • kevin.obrien 66193 (11/9/2016)


    Hi,

    I am building a DW which will load data from 2 separate sources.

    Each source will populate the Broker Dim table and the Account Dim table.

    To identify which source the Broker/Account relates to, I added BrokerSource column and AccountSource Column.

    In the Fact tbl, I have FKs to each of these Dims. CounterpartyBrokerKey and GenevaBrokerKey relate to the Broker Dim.

    CounterpartyAccountKey and GenevaAccountKey relate to the Account Dim.

    The problem I now have is that there are multiple FKs in the Fact table pointing to the one Dim table.

    Is this ok? Or is there a better, more efficient way of designing this?

    Thanks!

    The question to ask is: does a row in your fact table really have both a CounterpartyBroker and a GenevaBroker, both a Counterparty Account and Geneva Account? If a row of your fact table can have both attributes, then yes it makes sense to have both columns and both foreign keys.

    If this is related to the two sources and each fact row really only has one or the other then just have one BrokerKey and one AccountKey column in fact table and one foreign key to each dimension. Let the dimension describe the source instead of forcing the use of alternate foreign keys which will make queries more difficult.

  • Word of caution, a single instance of a dimension table used many times in a model will most likely result in an empty result set.

    Hi, any reason why this might be?

    Thanks!

  • kevin.obrien 66193 (11/9/2016)


    Word of caution, a single instance of a dimension table used many times in a model will most likely result in an empty result set.

    Hi, any reason why this might be?

    Thanks!

    Think of an order with an order data and a delivery date, most likely those will not be the same. If you use a single instance of a date dimension and link both dates to the that then obviously all rows will be filtered out.

    😎

  • Think of an order with an order data and a delivery date, most likely those will not be the same. If you use a single instance of a date dimension and link both dates to the that then obviously all rows will be filtered out.

    😎

    Ah yes. That makes sense. Thanks!

  • kevin.obrien 66193 (11/9/2016)


    Think of an order with an order data and a delivery date, most likely those will not be the same. If you use a single instance of a date dimension and link both dates to the that then obviously all rows will be filtered out.

    😎

    Ah yes. That makes sense. Thanks!

    You are very welcome, just ping back if you need any further assistance

    😎

  • Word of caution, a single instance of a dimension table used many times in a model will most likely result in an empty result set.

    Not sure if I'm understanding this correctly, but this could be misleading. As you add a dimension to a cube, it won't be possible to hook the same dimension in by two dates. Assuming you are using the same date dimension, it would hook in by the second date and becomes a role playing dimension. Let's say you have a dimension Date. It's hooked into the cube by the one date and then ideally renamed to the Geneva date (or whatever it is, I can't go back while I'm typing this reply). The dimension Date is then added again, hooking into the second date. It will also be renamed. There are some downsides to this, but generally role playing dimensions are the way to go. The main advantage is that any changes made to the Date dimension will automatically be a part of both dimensions in this case.

  • RonKyle (11/9/2016)


    Word of caution, a single instance of a dimension table used many times in a model will most likely result in an empty result set.

    Not sure if I'm understanding this correctly, but this could be misleading. As you add a dimension to a cube, it won't be possible to hook the same dimension in by two dates. Assuming you are using the same date dimension, it would hook in by the second date and becomes a role playing dimension. Let's say you have a dimension Date. It's hooked into the cube by the one date and then ideally renamed to the Geneva date (or whatever it is, I can't go back while I'm typing this reply). The dimension Date is then added again, hooking into the second date. It will also be renamed. There are some downsides to this, but generally role playing dimensions are the way to go. The main advantage is that any changes made to the Date dimension will automatically be a part of both dimensions in this case.

    True for some but not for all, the cube design interface will instantiate the dimension automatically

    😎

  • ...CounterpartyAccountKey and GenevaAccountKey ..

    If these do represent the source, think of the maintenance nightmare when the source changes, or another is added. This is not the way to design IMHO.

    Unless you know that an account can and will always be tied to one source, for all accounts... ( I would not think this is likely) then I would just use a separate source dimension instead of including the source in the account dimension. There is a gray area here as you could consider the source a slowly changing attribute in the account dimension, but I just dont think it strictly relates to the key despite the correlation. Hence why I would advocate a separate source dimension here.

    ----------------------------------------------------

  • MMartin1 (11/10/2016)


    ...CounterpartyAccountKey and GenevaAccountKey ..

    If these do represent the source, think of the maintenance nightmare when the source changes, or another is added. This is not the way to design IMHO.

    Unless you know that an account can and will always be tied to one source, for all accounts... ( I would not think this is likely) then I would just use a separate source dimension instead of including the source in the account dimension. There is a gray area here as you could consider the source a slowly changing attribute in the account dimension, but I just dont think it strictly relates to the key despite the correlation. Hence why I would advocate a separate source dimension here.

    In the original question & diagram, the column defining the source is AccountSource. Those two keys are probably poorly named though, if for example the GenevaAccount is no longer in Geneva. Might be better to call it MainAccount or something that implies the opposite of Counterparty.

    ...

    I didn't see any mentions of creating cubes so the auto creation of role playing dimensions might not help here.

    The design is good though, and the effect of having the role playing dimension still applies in a straight SQL query. To query the data based on the GenevaAccount and the Counterparty account, you have to join the fact table to the account dimension table twice. It's no worse than having two account dimension tables, and better because there's only one table to maintain, and potentially some benefit from the pages of that table already being in memory for the second join - maybe. The page life/ caching is not guaranteed and not why you would design it like that, but might be a nice side effect.

  • davoscollective (11/11/2016)


    MMartin1 (11/10/2016)


    ...CounterpartyAccountKey and GenevaAccountKey ..

    If these do represent the source, think of the maintenance nightmare when the source changes, or another is added. This is not the way to design IMHO.

    Unless you know that an account can and will always be tied to one source, for all accounts... ( I would not think this is likely) then I would just use a separate source dimension instead of including the source in the account dimension. There is a gray area here as you could consider the source a slowly changing attribute in the account dimension, but I just dont think it strictly relates to the key despite the correlation. Hence why I would advocate a separate source dimension here.

    In the original question & diagram, the column defining the source is AccountSource. Those two keys are probably poorly named though, if for example the GenevaAccount is no longer in Geneva. Might be better to call it MainAccount or something that implies the opposite of Counterparty.

    ...

    I didn't see any mentions of creating cubes so the auto creation of role playing dimensions might not help here.

    The design is good though, and the effect of having the role playing dimension still applies in a straight SQL query. To query the data based on the GenevaAccount and the Counterparty account, you have to join the fact table to the account dimension table twice. It's no worse than having two account dimension tables, and better because there's only one table to maintain, and potentially some benefit from the pages of that table already being in memory for the second join - maybe. The page life/ caching is not guaranteed and not why you would design it like that, but might be a nice side effect.

    The question to the OP is ... are CounterpartyAccountKey and GenevaAccountKey mutualy exclusive? It appears to me from the reading that they may very well be (normally a record only comes from one source). Hence only one will join to the account source in the dimension and the other will be NULL. If so this is not a good design for the reason described above.

    ----------------------------------------------------

  • The question to the OP is ... are CounterpartyAccountKey and GenevaAccountKey mutualy exclusive? It appears to me from the reading that they may very well be (normally a record only comes from one source). Hence only one will join to the account source in the dimension and the other will be NULL. If so this is not a good design for the reason described above.

    Hi,

    Yes the keys are exclusive. So the Fact table may contain the below data...

    The SQL query to display data would use a separate 'mappings' table which stores relations between GenevaAccount and BrokerAccount, so in effect it would try to roll the above 2 rows into 1.

    So the query to display data may look like this:

    SELECT f.*

    FROM CashBalanceFact f

    INNER JOIN

    Broker b

    ON b.BrokerID = f.CounterpartyBrokerKey

    OR b.BrokerID = f.GenevaBrokerKey

    INNER JOIN

    Account a

    ON a.AccountID = f.CounterpartyAccountKey

    OR a.AccountID = f.GenevaAccountKey

    -- mappings

    LEFT JOIN

    MappingTbl m

    ON m.GenevaBroker = b.BrokerName

    AND m.GenevaAcc = a.AccountName

    AND m.CounterpartyBroker = b.BrokerName

    AND m.CounterpartyAcc = a.AccountName

    -- end mapping

    Do I need to change my design for this?

    Thanks!

  • kevin.obrien 66193 (11/16/2016)


    The question to the OP is ... are CounterpartyAccountKey and GenevaAccountKey mutualy exclusive? It appears to me from the reading that they may very well be (normally a record only comes from one source). Hence only one will join to the account source in the dimension and the other will be NULL. If so this is not a good design for the reason described above.

    Hi,

    Yes the keys are exclusive. So the Fact table may contain the below data...

    The SQL query to display data would use a separate 'mappings' table which stores relations between GenevaAccount and BrokerAccount, so in effect it would try to roll the above 2 rows into 1.

    So the query to display data may look like this:

    SELECT f.*

    FROM CashBalanceFact f

    INNER JOIN

    Broker b

    ON b.BrokerID = f.CounterpartyBrokerKey

    OR b.BrokerID = f.GenevaBrokerKey

    INNER JOIN

    Account a

    ON a.AccountID = f.CounterpartyAccountKey

    OR a.AccountID = f.GenevaAccountKey

    -- mappings

    LEFT JOIN

    MappingTbl m

    ON m.GenevaBroker = b.BrokerName

    AND m.GenevaAcc = a.AccountName

    AND m.CounterpartyBroker = b.BrokerName

    AND m.CounterpartyAcc = a.AccountName

    -- end mapping

    Do I need to change my design for this?

    Thanks!

    This would suggest that there is a problem with the design that you would need to join from your fact to your dimensions conditionally using OR. How is this mappings table used? In your query here you don't seem to be selecting from it or have any WHERE clause, and it feels a bit awkward that it's being joined by 4 name columns. Given that b.BrokerName and a.AccountName will only match one or the other of their equivalent columns in the mapping table I'd suspect that the LEFT JOIN here will never match a record in MappingTbl.

  • kevin.obrien 66193 (11/16/2016)


    The question to the OP is ... are CounterpartyAccountKey and GenevaAccountKey mutualy exclusive? It appears to me from the reading that they may very well be (normally a record only comes from one source). Hence only one will join to the account source in the dimension and the other will be NULL. If so this is not a good design for the reason described above.

    Hi,

    Yes the keys are exclusive. So the Fact table may contain the below data...

    The SQL query to display data would use a separate 'mappings' table which stores relations between GenevaAccount and BrokerAccount, so in effect it would try to roll the above 2 rows into 1.

    So the query to display data may look like this:

    SELECT f.*

    FROM CashBalanceFact f

    INNER JOIN

    Broker b

    ON b.BrokerID = f.CounterpartyBrokerKey

    OR b.BrokerID = f.GenevaBrokerKey

    INNER JOIN

    Account a

    ON a.AccountID = f.CounterpartyAccountKey

    OR a.AccountID = f.GenevaAccountKey

    -- mappings

    LEFT JOIN

    MappingTbl m

    ON m.GenevaBroker = b.BrokerName

    AND m.GenevaAcc = a.AccountName

    AND m.CounterpartyBroker = b.BrokerName

    AND m.CounterpartyAcc = a.AccountName

    -- end mapping

    Do I need to change my design for this?

    Thanks!

    When a fact occurs, it represents an event. You capture what has happened and not what has not. So by stating that this record originated from sourceA implies it did not come from anywhere else. Have a key that maps to something like a sourceSystem dimension much like a sale of a product would map to a supplier dimension.

    Another strategy that comes to mind is leaving the audit field(s) in any existing operational data store and join from dimensional database to ODS database (via a business ID) whenever you need to investigate issues.

    As you have it now, think of the changes to your design (not even the ETL portion) that may be required if a new source of data comes into play. And here we are assuming that a records in a fact all comes from one source, which through ETL transformation may not necessarily be the case. I prefer leaving audit information aside and linking with business keys.

    ----------------------------------------------------

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply