Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

Foreign key constraints in data warehouses Expand / Collapse
Author
Message
Posted Sunday, March 30, 2008 9:36 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 18, 2012 12:47 PM
Points: 19, Visits: 59
After reading through some of the "suggestions" on this thread, and going through years of getting my butt handed to me by my own code, I'm curios; if you don't declare and enforce RI, how would you know you're ETLs are scrubbing the data properly? I can see unit testing, but when edge cases are taken into account and there's no RI all hell generally breaks loose.

After cleaning up messes made by some of the published authors in this field, I can assert from personal experience that RI is not optional. First, it'll save you time to release as your ETL bugs will be immediately visible (no testing required, the code just fails) AND it will save your administrator hours of sleep as an RI violation in production should trigger a batch rollback/reload so no human intervention is required to clean the errant data.

Last, and probably most importantly, it'll save your users from making huge business mistakes. If a part is purchased from supplier A and the reporting system is asserting that it is purchased from supplier B, a bad decision is going to be made. Depending on the decision, it could be terribly detrimental to the company as a whole. If the company doesn't make money, neither do you :)

There have been occasions where I've dropped RI due to physical limitations of a machine, but I'm also aware I do so at my peril. I'd suggest this is always the case.

ROb



Post #476767
Posted Monday, March 31, 2008 9:51 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, March 6, 2014 6:13 PM
Points: 318, Visits: 1,183
100% agree with ironpants. You should start with RI enforced always, and only ever consider removing it in special cases (usually for performance issues that can't be solved any other way) and when doing so make the business fully aware of the risks. How many of us are absolutely certain that all processes that touch the DB are perfectly written and respect our non-RI enforced relationships?

Regards,

Jacob
Post #477393
Posted Tuesday, April 1, 2008 4:59 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Friday, August 29, 2014 2:41 PM
Points: 815, Visits: 2,020
Had a case just yesterday where having the FKs in place led us to find that a customer hadn't provided information on 5 zip codes for a process that had been added after the initial fielding. Without restraints, we would not have discovered it was an issue. I'm on the side that you take a huge risk without them. You just never know what can happen, especially if you make changes.


Post #477505
Posted Thursday, July 8, 2010 7:09 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, November 25, 2010 8:26 AM
Points: 4, Visits: 28
Shouldn't those 5 missing zip codes be shown up by the RI in the OLTP database?
If the OLTP database lacks RI it won't exist in the DW either!
I think that the valid logical 'foreign key relation' is important in the DW though it need not be enforced by an 'actual' foreign key.
I use a lookup function on loading at the ETL level to assign fact table surrogate key values from dimension tables and if a value is not found (ie. where RI is broken at a logical level I assign a default value so as to avoid a null: in practice the default value in the fact table might flag a foreign key relation problem presumably in the ETL region). If you need enforced foreign keys in the DW then the data loading is defective.
It could be possible to apply RI during development and testing to ensure that data is coming in to the DW without RI problems and to remove it when the DW loading is significant. I can imagine that a fact table of perhaps 30 foreign keys to large dimension tables and millions of rows is going to be very slow on inserts and heavy on resources. To speed up the querying on the DW it is possible to use suitable indexing which can be switched off during loading.

Having to enforce foreign key relations on all keys of all fact tables must surely overload performance.
If the input data from the OLTP data is not 'clean' there is a problem 'outside' the DW itself.
Post #949265
Posted Thursday, July 8, 2010 10:38 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, August 21, 2014 5:18 PM
Points: 406, Visits: 965
We've completely separated the dimension tables from the fact tables. They could be on separate servers if I wanted.
This was a bit nerve racking at first, I will admit. Adding triggers to the dim tables to prevent deletes helped.
Also access to the dim data is only through the SSIS package or the SSAS account.
Post #949445
Posted Thursday, July 8, 2010 10:49 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, November 25, 2010 8:26 AM
Points: 4, Visits: 28
Do you mean separate databases for the fact and dimension tables?
I am curious to know if this slows down joins between fact and dimension tables due to their being in separate DBs.
It would also be possible to use separate disks (RAM disks?) for different parts of the DW (staging tables or lookup tables as well as dimensions or facts or even indices) but remaining within the same database.
Is your advantage in having separate servers for each part of the DW?
Post #949449
Posted Thursday, July 8, 2010 9:29 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 18, 2012 12:47 PM
Points: 19, Visits: 59
You know how I know you have no idea of relational integrity? ;)

Do this at your own risk. After about 20 years I've come to the conclusion that my employment is boud to the extent of my constraints. You just removed a bunch of yours. After it all falls apart, what do you think your job security will look like?



Post #949706
Posted Friday, July 9, 2010 4:42 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, November 25, 2010 8:26 AM
Points: 4, Visits: 28
I agree with extensive use of foreign keys in an OLTP database (I worked on one recently which had NONE and it was a very scary experience). Personally I would not use FKs at all in an OLAP/DW.
If you use FKs rigorously in a DW (that is in virtually ALL fields of a fact table) these tables are likely to be so slow on block inserts or updates that the DW is likely to unusable as the fact tables grow with time.
I think it is bad practice to use RI in the DW to detect problems in the ETL layer. It is true that the FK will show up an RI problem in the ETL, but I would suggest that this be handled another way (lookups).
If a lookup surrogate key value is not found, this implies broken RI, and it can be logged as an error if this is critical, the lookups can be added (as with the 5 missing zipcodes) either manually or by using code to upate lookup tables.

This is however an ETL/staging problem. It should not be allowed to dictate constraints on the DW.

The whole point of the DW is its speed of response both to data loading and to data querying. Orthodox use of FKs (as for OLTPs) works directly against this. Partial use of FKs seems pointless!
To use the fact tables as a means of detecting ETL errors is getting the argument backwards.
RI errors in the ETL should be managed BEFORE the data gets into the fact tables or dimensions. I would suggest writing good ETL procedures that handle missing lookup surrogate key values or which automatically update new values (eg zipcodes as above) in the ETL feed data into lookup tables. This way a value CANNOT be inserted into a fact table or dimension without the lookup value since the lookup value itself has been checked before hand and MUST be there.
I am concerned that good practice OLTP techniques when applied to OLAP become a recipe for an overloaded and finally dysfunctional DW. (I would also avoid the use of triggers and cascades in a DW star scheme)
Post #949838
Posted Friday, July 9, 2010 6:07 AM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Thursday, August 21, 2014 5:18 PM
Points: 406, Visits: 965
Yes separate databases. All the fact data lives in separate databases as well. All these fact DBs share the common dimension db. I don't worry about joins since join speed is not related to where the data is located, it's indexes, besides almost all data access is through cubes. Integrity is an issue that is why you have proper backups. The ability to recover is what saves your job.
Post #949878
Posted Friday, July 9, 2010 6:57 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, November 25, 2010 8:26 AM
Points: 4, Visits: 28
I suppose separate DBs means easier maintenance: if you were to use disk and table partitioning and separate tablespaces wouldn't the result be similar? I can see that the modular approach that you suggest might be simpler from an admin point of view. I suppose it would also be possible to get perfomance gains during ETL if the dimensions are on one machine and the facts on another, though there could be a problem synchronising the two.
What were the factors that pushed you to try this solution?
Post #949929
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse