The Pitfalls of Foreign Keys

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dpoole/thepitfallsofforeignkeys.asp

    When working on deleting audit data from several very large tables without indexes, I borrowed a trick from warehousing: I created appropriate indexes, ran the deletes, then dropped the indexes. (Worth mentioning: in one case I had to use a query hint to force the use of the index, the only time I've ever had to do that.)
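
    In case it helps anyone, the pattern looked roughly like this (table, column, and index names here are invented, not the real ones):

    ```sql
    -- Purge old audit rows from a large, otherwise unindexed table.
    DECLARE @PurgeBefore datetime;
    SET @PurgeBefore = '20060101';

    -- 1. Create a temporary "purge" index so the delete can seek.
    CREATE INDEX IX_AuditDetail_AuditDate ON dbo.AuditDetail (AuditDate);

    -- 2. Run the delete. The hint mirrors the one case where the
    --    optimizer had to be forced onto the new index.
    DELETE ad
    FROM dbo.AuditDetail AS ad WITH (INDEX(IX_AuditDetail_AuditDate))
    WHERE ad.AuditDate < @PurgeBefore;

    -- 3. Drop the index once the purge is done.
    DROP INDEX dbo.AuditDetail.IX_AuditDetail_AuditDate;
    ```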

    Philip

  • Using DBMS features like foreign keys, check constraints, defaults, unique keys, etc. as some sort of "last line of defence" is, in my opinion, crucial when designing or maintaining a database system, especially in "evolving systems" that are not out-of-the-box.

    Of course you could do all those things with "well tested SPs", but how can you ever be sure that all 1000+ procedures in a larger system are up to date, especially when the underlying DB table structure has to change?

    The cost of such features is absolutely outweighed by the assurance that you've done everything possible to ensure data integrity.
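
    A minimal sketch of that "last line of defence" (all table and constraint names are invented, and the parent table dbo.Orders is assumed to exist):

    ```sql
    -- Each declarative constraint is enforced by the engine itself, no
    -- matter which of the 1000+ procedures or ad hoc scripts touches
    -- the table.
    CREATE TABLE dbo.OrderLine
    (
        OrderLineID int      NOT NULL CONSTRAINT PK_OrderLine PRIMARY KEY,
        OrderID     int      NOT NULL
            CONSTRAINT FK_OrderLine_Orders REFERENCES dbo.Orders (OrderID),
        LineNumber  int      NOT NULL,
        Quantity    int      NOT NULL
            CONSTRAINT CK_OrderLine_Quantity CHECK (Quantity > 0),
        CreatedAt   datetime NOT NULL
            CONSTRAINT DF_OrderLine_CreatedAt DEFAULT (GETDATE()),
        CONSTRAINT UQ_OrderLine_Order_Line UNIQUE (OrderID, LineNumber)
    );
    ```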

  • "scarcely necessary if creating, updating and deleting are done through well-tested stored procedures".

    Where is it more likely that bugs will arise? In a "well-tested" stored proc, or in a foreign key declaration?

    With the no-foreignkeys approach, you end up writing all sorts of repair routines to search for & clean up orphaned data. Of course, this orphaned data should never exist, because these "well-tested" stored-procs are entirely 100% bug-free and never allow duff data into the database. Perhaps in fantasy-land that is true, but in the real world ... And of course, every application always uses these stored-procs, no application ever bypasses them and updates the data directly. That would never happen. No, of course not - that would just be daft, and lead to duff data.

    If declarative integrity checks are used, you avoid wasting time on such cleanup activities. You state to the DBMS what things must be true, and let the DBMS worry about it. Make it work for its living - that's what it's there for! OK, there is some (minimal) overhead, but I'd rather pay that cost and have good data than save a sliver of performance and end up with crap data that needs sorting out later.
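
    For what it's worth, this is the shape of the "repair routine" the no-foreign-keys approach forces on you, versus the one-off declaration that makes it unnecessary (table names are hypothetical):

    ```sql
    -- Cleanup query: find child rows whose parent row has vanished.
    SELECT c.ChildID
    FROM dbo.Child AS c
    LEFT JOIN dbo.Parent AS p ON p.ParentID = c.ParentID
    WHERE p.ParentID IS NULL;

    -- With the declarative version, the orphan can never be created.
    ALTER TABLE dbo.Child
        ADD CONSTRAINT FK_Child_Parent
        FOREIGN KEY (ParentID) REFERENCES dbo.Parent (ParentID);
    ```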

  • Good article David.

    But the first thing I thought was: Oh no, not another discussion on DRI...

    Sounds a bit defeatist but, as with so many things, it's impossible to get everything that you want. If someone creates a method to enforce DRI without the overheads involved, then I'm sure we'll all be pleased to make them very rich. In the meantime, however, we're going to have to work around the constraints.

    Enforcing DRI programmatically using stored procedures sounds like it's got some potential for disaster. Once the database ownership has been handed over, you can't rely on the initial protocol. I've seen people go to extraordinary lengths to remove and create their constraints and DRI programmatically; not something I would recommend to the faint of heart.

    I would be in favour of being able to use TRUNCATE if there's no associated child data though. Does anyone know why this has been "blanket-enforced"?
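
    My guess is that it's been blanket-enforced because TRUNCATE deallocates whole data pages without touching individual rows, so the engine never gets a per-row chance to check the constraint; rather than first proving the child tables are empty, it refuses outright. In any case, the usual workaround looks something like this (invented names):

    ```sql
    -- TRUNCATE TABLE dbo.Parent fails here even if dbo.Child is empty
    -- (Msg 4712: "Cannot truncate table ... because it is being
    -- referenced by a FOREIGN KEY constraint").
    -- Workaround: drop the constraint, truncate, re-create it.
    ALTER TABLE dbo.Child DROP CONSTRAINT FK_Child_Parent;

    TRUNCATE TABLE dbo.Parent;

    ALTER TABLE dbo.Child
        ADD CONSTRAINT FK_Child_Parent
        FOREIGN KEY (ParentID) REFERENCES dbo.Parent (ParentID);
    ```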

    Max

  • Nice article 🙂

    Particularly interesting for me, because in my spare time I am working on building a .net CMS system using SQL 2000 for the backend.  There have been a couple of occasions where I've been unsure whether to include DRI or not, but for now my approach is to always include it.

    Good stuff.

  • Interesting consideration, but of course with no DRI your database is not a relational database, just a convenient collection of tables written to by applications.  (You could potentially gain even more performance by using a CSV file for each table and bypassing SQL Server altogether.)

  • Some additional thoughts:

    Re: Deletes are expensive

    1. How often is a delete from a code table actually performed? (Hint: if there are <100, or even <1,000, entries, how often can it be?) There may be no need for optimization at all.

    2. If you are doing a lot of deletes, as in a DW feed, you can disable and re-enable the constraints (NOCHECK/CHECK) to improve performance. TRUNCATE then also becomes an option.
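
    The disable/re-enable pattern looks roughly like this (table and constraint names are hypothetical):

    ```sql
    -- Disable the constraint, do the bulk delete, then re-enable it.
    ALTER TABLE dbo.FactSales NOCHECK CONSTRAINT FK_FactSales_DimDate;

    DELETE FROM dbo.FactSales WHERE DateKey < 20050101;

    -- WITH CHECK revalidates the existing rows, so you know the bulk
    -- operation left no orphans behind.
    ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT FK_FactSales_DimDate;
    ```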

    Re: DRI not necessary

    1. At least create the DRI WITH NOCHECK, so that tools which read the metadata can still understand the joins. There is no performance argument against a disabled constraint, and most DB tools nowadays use this information.
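
    Something like this (invented names): the first statement skips validation of the existing rows, and the second disables enforcement entirely, leaving the constraint as pure metadata:

    ```sql
    -- Add the FK without validating existing data...
    ALTER TABLE dbo.OrderLine WITH NOCHECK
        ADD CONSTRAINT FK_OrderLine_Product
        FOREIGN KEY (ProductID) REFERENCES dbo.Product (ProductID);

    -- ...then disable enforcement, so it costs nothing at run time but
    -- still shows up to diagramming and query tools.
    ALTER TABLE dbo.OrderLine NOCHECK CONSTRAINT FK_OrderLine_Product;
    ```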

    2. At some point in development and testing, you should turn the appropriate DRI on to keep people honest. In PRODUCTION, I would then advocate switching to NOCHECK only those constraints that you feel cause performance concerns.

    3. If developers instinctively know about integrity, why do some have no clue when they hit an RI constraint violation? I frequently use triggers instead of DRI to give a more precise error message, including the value of the FK column that caused the problem and the name of the PK attribute. Some developers STILL email the message/stack trace to me and ask what it means; this way I can answer the question without doing any research, accomplishing a training goal in the same breath. A sketch of the idea follows.
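
    A minimal version of such a trigger (all names are invented):

    ```sql
    -- Mimic the FK in a trigger, but report exactly which value failed
    -- and against which parent key.
    CREATE TRIGGER trg_OrderLine_CheckProduct
    ON dbo.OrderLine
    FOR INSERT, UPDATE
    AS
    BEGIN
        DECLARE @BadProductID int;

        -- Find an inserted row whose ProductID has no match in the
        -- parent (NULLs pass, matching FOREIGN KEY semantics).
        SELECT TOP 1 @BadProductID = i.ProductID
        FROM inserted AS i
        LEFT JOIN dbo.Product AS p ON p.ProductID = i.ProductID
        WHERE p.ProductID IS NULL AND i.ProductID IS NOT NULL;

        IF @BadProductID IS NOT NULL
        BEGIN
            RAISERROR('OrderLine rejected: ProductID %d does not exist in dbo.Product (PK: ProductID).', 16, 1, @BadProductID);
            ROLLBACK TRANSACTION;
        END
    END
    ```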

    Recommendations:

    1. Choose only between creating a disabled or an enabled constraint, even if you use RI triggers or SPs.

    2. Tune performance by eliminating constraints the same way you add indexes: on an as-needed basis. Wait until you have a problem, then disable the constraint/trigger.

    3. Consider temporarily disabling constraints for certain high-volume activities. Ditto for special-use indexes, as another poster mentions.

  • That was an outstanding article. Thanks.

    We're currently in a bit of a struggle with a development team that has become completely enamored with generated code. Problem is, with the type of code being generated, it's a lot of extra work to determine whether a record has been edited. Therefore, they want to simply delete complete record sets and re-insert them. We've been arguing against it at length, and you've just given us some more ammunition. Thanks again.

    BTW, our own tests, while not focusing on DRI, bear out most of your results. For those concerned that the test was done on a code table, which is unlikely to change: our developers are deleting from an active table with about eight child tables, so the paradigm holds up.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Good article. Quick question: has anyone ever experienced a case where the foreign keys ARE indexed, but a table scan still appears to take place on referential deletes?

  • I don't think Mr. Poole used a fair example.  A "well tested stored procedure" would also delete records from the referencing tables, to avoid leaving orphaned records all over the database.  The execution plans for all of these tests and deletions should be included for a more accurate comparison.

    This is clearly a maintenance issue.  Consider what happens when the database is changed.  Once there are hundreds or thousands of relations to manage, I would much rather trust a built-in mechanism such as DRI than modify and re-test dozens of "well tested stored procedures".

  • Bummer. The author shoots himself in the foot. In the non-DRI stored proc he should be deleting all the foreign-key children of the CommonObject table, not just the PageObject record. The SP as written would leave us with orphans.
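
    Roughly what the corrected procedure would have to look like (a sketch only, showing the single PageObject child; the real thing would need a DELETE per child table, and the column name CommonObjectID is my assumption):

    ```sql
    CREATE PROCEDURE dbo.usp_DeleteCommonObject
        @CommonObjectID int
    AS
    BEGIN TRANSACTION;
        -- Children first, then the parent, so no orphans survive.
        DELETE FROM dbo.PageObject   WHERE CommonObjectID = @CommonObjectID;
        DELETE FROM dbo.CommonObject WHERE CommonObjectID = @CommonObjectID;
    COMMIT TRANSACTION;
    ```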

  • I have avoided posting on this hotly debated topic in the past, but there is a major point that both DBAs and developers overlook: whether to use RI in production is a BUSINESS decision, not a technical one. It is up to the customer whether they want a minimal performance improvement at the expense of data integrity. It is very possible that the data could be recovered quickly and easily and that the performance improvement is justified. In a transactional environment, though, the cost of lost data usually outweighs the cost of a beefier server.

    The "well tested stored procedures" argument holds no weight because the assumption is that some twit with access to the tables is going to use the stored procedures just because they are tested and his SQL script is not. By the same token we hope that developers don't assume that users will always use the interface they created and not hit the web server with a carefully crafted SQL injection attack; we hope they will put the proper validation code on the server side and not just rely on client script to catch these types of things.

    The only way to truly keep the data safe is with RI, period. It is the only thing that users can't get past; well, unless they have permission to edit the schema, but I am not even going to go there.

    Bryant E. Byrd, BSSE MCDBA MCAD
    Business Intelligence Administrator
    MSBI Administration Blog

  • My thoughts exactly.  It isn't a fair comparison. 

    I have a major problem with even the premise of the question, though.  If you are just concerned with throughput, then don't use a DBMS at all; you'll find much better performance using binary files to store data.  I don't care how fast your system is: it is totally worthless if it is not logically correct.  Using the relational model (declaring constraints, etc.) vastly simplifies the task, and thereby increases the odds of getting it right.

    Of course there is overhead in checking constraints, but there is a cost to ignoring them as well, and in the vast majority of cases the cost of the latter far outweighs the cost of the former.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • >> Whether to use RI in production is a BUSINESS decision and not a technical one. <<

    Absolute rubbish.  It's part of the relational model... how is that a business decision?

    >> I have a major problem with even the premise of the question, though.  If you are just concerned with throughput, then don't use a DBMS at all; you'll find much better performance using binary files to store data.  I don't care how fast your system is: it is totally worthless if it is not logically correct.  Using the relational model (declaring constraints, etc.) vastly simplifies the task, and thereby increases the odds of getting it right. <<

    Nail hit on head... completely agree with that.  Do it properly!

     

    As someone has said though... this runs the risk of turning into another thread on DRI!
