foreign key relationship between dimensions and the facts

  • Hi All,

    Is that mandatory to create foreign key relationship between dimensions and the facts in date warehosue ?

    Thanks & Regards

    Deepak.A

  • Why would you not want to do so?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • deepak.a (2/23/2012)


    Is that mandatory to create foreign key relationship between dimensions and the facts in date warehosue ?

    Well... no but, how do you think optimizer learns about relationship between tables? You have to tell and the way to do it is by creating such FKs.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Hi,

    we have built the date warehouse with snowflake schema and also it is difficult to process the incrematal load . so any suggestions on this could be more use full

    Thanks & Regards

    Deepak

  • Difficult in that you get FK Violations?

    I so have you considered changing the order in which you load the tables?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Or you could just drop all of the Keys as part of your load and then re-build them once the tables have loaded - this way you can load the tables in any order you wish.

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • deepak.a (2/23/2012)


    we have built the date warehouse with snowflake schema and also it is difficult to process the incrematal load . so any suggestions on this could be more use full

    If the issue is related to FK violations please take into consideration that it is usual to have FK disabled on a Data Warehouse - in most dimensional designs FKs are there to tell optimizer about relationship between tables rather to enforce referential integrity.

    In this kind of scenario, ETL process takes care of enforcing Referential Integrity.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • You could also disable and then enable the FK's but if you get a record(s) that violates the FK constraint exception that violates the FK Constraint then you will not know about it until after the load completes

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (2/23/2012)


    You could also disable and then enable the FK's but if you get a record(s) that violates the FK constraint exception that violates the FK Constraint then you will not know about it until after the load completes.

    This is true however its swings and roundabouts really - you either have to investigate why your FK's have been violated hence why you can't re-add them (which the error message would tell you) or investigate the row that violated the keys in question and fix that (and if you have an error table that captures the row and the error description this will give you the info) 🙂

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • PaulB-TheOneAndOnly (2/23/2012)


    deepak.a (2/23/2012)


    we have built the date warehouse with snowflake schema and also it is difficult to process the incrematal load . so any suggestions on this could be more use full

    If the issue is related to FK violations please take into consideration that it is usual to have FK disabled on a Data Warehouse - in most dimensional designs FKs are there to tell optimizer about relationship between tables rather to enforce referential integrity.

    In this kind of scenario, ETL process takes care of enforcing Referential Integrity.

    This is a good idea.

  • The join is no more "mandatory" than by a standard OLTP design. But that doesn't mean it's not a good idea.

    Why did you opt for a snowflake schema? There are uses for it, but they are not common. A star schema is almost always the better choice.

    Normally loading the dimensions first will take care of most of the problems. Each of my dimensions also has an "Unknown" bucket, and I use a LEFT JOIN during the load to send any unknowns to that bucket.

  • RonKyle (2/27/2012)


    The join is no more "mandatory" than by a standard OLTP design. But that doesn't mean it's not a good idea.

    Why did you opt for a snowflake schema? There are uses for it, but they are not common. A star schema is almost always the better choice.

    Normally loading the dimensions first will take care of most of the problems. Each of my dimensions also has an "Unknown" bucket, and I use a LEFT JOIN during the load to send any unknowns to that bucket.

    + 1

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • RonKyle (2/27/2012)


    The join is no more "mandatory" than by a standard OLTP design. But that doesn't mean it's not a good idea.

    Why did you opt for a snowflake schema? There are uses for it, but they are not common. A star schema is almost always the better choice.

    Normally loading the dimensions first will take care of most of the problems. Each of my dimensions also has an "Unknown" bucket, and I use a LEFT JOIN during the load to send any unknowns to that bucket.

    Hi Ron,

    I just got assigned as an ETL Developer to a Data Warehouse Project.

    Could you please share with me your SQL Statement and/or a little more information?

    Thanks!

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (2/27/2012)


    RonKyle (2/27/2012)


    The join is no more "mandatory" than by a standard OLTP design. But that doesn't mean it's not a good idea.

    Why did you opt for a snowflake schema? There are uses for it, but they are not common. A star schema is almost always the better choice.

    Normally loading the dimensions first will take care of most of the problems. Each of my dimensions also has an "Unknown" bucket, and I use a LEFT JOIN during the load to send any unknowns to that bucket.

    Hi Ron,

    I just got assigned as an ETL Developer to a Data Warehouse Project.

    Could you please share with me your SQL Statement and/or a little more information?

    I'm not Ron, but I'll give it a shot 🙂

    When you load your fact table, you need to do lookups to your dimensions to replace the dimension's business key with its surrogate key. You can do this with a LEFT OUTER JOIN. All the matches will give a valid surrogate key. However, if you have rows with no matches (early arriving facts), the surrogate key will be NULL. You can put a simple ISNULL(dimension.surrogateKey,0), with 0 representing the surrogate key for the UNKNOWN member.

    For example:

    SELECT f.myFact, ISNULL(d.surrogateKey,0)

    FROM myFactTable f

    LEFT OUTER JOIN

    myDimension d ON f.BusinessKey = d.BusinessKey

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • There is no substitute for doing proper data quality checks during the ETL. While there is a provision to convert to unknown member during processing of the cube and even the options of defaulting it within the database by assigning a default value as the surrogate key. They are not going to help you clean up after you have inserted the data into the cube.

    Once a fact assigns a unknown member to its key there is no way to differentiate it from previous unknown members , this means that once you do get the correct business key for the dimension you cant go back historically to rectify the issue.

    I might be better to no insert these facts into the table

    Jayanth Kurup[/url]

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply