SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Foreign key constraints in data warehouses


Foreign key constraints in data warehouses

Author
Message
Chuck Bevitt
Chuck Bevitt
SSC Rookie
SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)SSC Rookie (45 reputation)

Group: General Forum Members
Points: 45 Visits: 109
I am just learning about data warehousing and business intelligence, and reading Ralph Kimball’s excellent book “The Data Warehouse Toolkit, Second Edition”.

In dimensional modeling, many of the fact table fields are foreign keys that link to the primary keys of dimension tables. My question is, when the data warehouse database is implemented in SQL Server 2005, is there any point to adding SQL Server foreign key constraints between the fact and the dimension tables? The data should have already been cleansed in the loading process, and adding foreign key constraints doesn’t automatically index the foreign key fields – the fields have to be indexed separately. It seems that foreign key constraints might add complexity, slow processing, and provide no benefit.

Any opinions?
Loner
Loner
Hall of Fame
Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)Hall of Fame (3.5K reputation)

Group: General Forum Members
Points: 3540 Visits: 3350
The cleansing routine is done during the ETL process. After the data loaded into the dimension table, it is not guaranteed that no one would delete it.
To put a foreign key in the fact table is used for data integrity.
RonKyle
RonKyle
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2703 Visits: 3497
I understand your point about the data cleansing. Why should it be necessary? One, it will help analysis services when you import the tables because it will see the relationships. Two, just in case something is wrong somewhere, the FKs might show it. I recently had a case where having FKs showed an issue that would not have shown up otherwise.



Hendra Eka Putra
Hendra Eka Putra
SSC-Enthusiastic
SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)

Group: General Forum Members
Points: 175 Visits: 77
I agree that you should always add a FK in your data warehouse.

My other consideration is that you can always use the benefits of index between FK column in your fact table and PK in your dimension table if you want to drill through the data
fayez.khan
fayez.khan
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 32
During ETL, loading Facts we always do a lookup with the dimensions to ensure that data gets filtered. In other words fact tables will contain only that data whose reference is present in dimension tables. Adding primary key foreign key relationship will decrease the response time
RBarryYoung
RBarryYoung
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14676 Visits: 9518
Chuck Bevitt (2/7/2008)
My question is, when the data warehouse database is implemented in SQL Server 2005, is there any point to adding SQL Server foreign key constraints between the fact and the dimension tables?


I wouldn't do it. Foreign Keys are really antithetical to the purpose and philosophy of OLAP and data warehouses.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RonKyle
RonKyle
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2703 Visits: 3497
During ETL, loading Facts we always do a lookup with the dimensions to ensure that data gets filtered. In other words fact tables will contain only that data whose reference is present in dimension tables. Adding primary key foreign key relationship will decrease the response time


I suppose that for some downloads there would be no change to the dimension tables, but I haven't had that experience. There are always new dimension facts to download too, such as new technicians or new zip codes where the work is being done. There is the chance that something will go wrong with those downloads, in which case, you wouldn't want the fact table to also succeed. FKs prevent that second download from succeeded.

There could be a case where the download time performance is absolutely crucial, but most downloads occur in the middle of the night, and 10 or 20 extra seconds isn't going to matter.



RBarryYoung
RBarryYoung
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14676 Visits: 9518
RonKyle (3/10/2008)
[quote]I suppose that for some downloads there would be no change to the dimension tables, but I haven't had that experience. There are always new dimension facts to download too, such as new technicians or new zip codes where the work is being done. There is the chance that something will go wrong with those downloads, in which case, you wouldn't want the fact table to also succeed. FKs prevent that second download from succeeded.

There could be a case where the download time performance is absolutely crucial, but most downloads occur in the middle of the night, and 10 or 20 extra seconds isn't going to matter.


In general, I think that it is better to implement ETL features in the ETL itself and keep them out of the OLAP. And that is a large part of the whole philosophy of having an OLAP database that is separate from your OLTP database.

Also, I have been in many environments where my customer was pushing the limits of their 4 to 6 hour ETL window and there have not been many cases where one FK would have added only 10 to 20 seconds, let alone one or more for every dimension.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14676 Visits: 9518
Loner (2/7/2008)
The cleansing routine is done during the ETL process. After the data loaded into the dimension table, it is not guaranteed that no one would delete it.
To put a foreign key in the fact table is used for data integrity.


Users and Apps should not be deleting records or making any other permanent modifications to an OLAP database. Only the ETL should be doing that.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Hendra Eka Putra
Hendra Eka Putra
SSC-Enthusiastic
SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)SSC-Enthusiastic (175 reputation)

Group: General Forum Members
Points: 175 Visits: 77
Since data warehouse is not a mission critical, I completely agree with Ron. Added a FK would not decrease a process at all, because we do it in a batch. No users involved in this process

Also imagine when you had to drillthrough the data. FK will speed up the process.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search