The Pitfalls of Foreign Keys

  • DRI does not impact performance of SELECTs directly.

    Indirectly its absence might because you are trawling through more (orphaned) records to achieve a stated goal.

    If the query uses INNER JOINS it won't bring back orphans.

    By definition an OUTER JOIN is expecting records in one table but not another.

  • So, to display only right orders you need to include Customers into all select statements querying Orders. Are you sure it will not affect performance?

    And if you allow to delete Customer record without deleting or related Order records you may end up in... Databasa restore at morning?

    _____________
    Code for TallyGenerator

  • If you were going to match up Customer to Orders anyway then DRI should have no affect.

    If all you wanted were Orders without the customer information then having to join on Customers will have a performance overhead but this isn't DRI, it is the result of not having DRI.

  • Are you sure?

    To get statistics for number or total amount of last week orders you don't need to match up those orders to customers, unless you are not sure some of those orders not suppose to be deleted.

    So for each select statement for Order Line having Total Amount to be aggregated you need to add JOIN to Order Header table, and than to Customer table, and that to whatever you have on top of this hierarchy.

    And if you need to know if somebody ordered this particular part last week you need to join Order Lines containing this part to the top of your hierarchy again.

     

    _____________
    Code for TallyGenerator

  • That's one example of a possible performance hit that would be necessary in the absence of DRI. 

    The bigger problem is that you have orders for products that are not tied to a customer.  So either you write your application to ensure that that situation never occurs (likely imposing a larger performance penalty than DRI) or you allow that situation to occur.  If you opt for the second then you have to code your application or "well written stored procedures" to account for it, simultaneously increasing the likelihood of bugs and logical errors, reducing the understandability of the data model driving up support and maintenance costs, and destroying data independence.

    Not the kind of tradeoff that I'd call a bargain...

     

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Yes but as I said, the performance hit is as a result of not having DRI in the database and this resulting in orphaned records.

    DRI does not affect SELECT statements directly.

    The bit I am not sure about is the borderline between a good normalised design and a design that is bad because it follows an object orientated paradigm.

  • Your approach prescribes to create orphaned records.

    It's only point of not having DRI described in the article.

    DRI allows to create clear SELECT statements without extra check if this record is orphaned.

    DRI does not affect performance, but absence of DRI really does.

    _____________
    Code for TallyGenerator

  • Its hard to believe that SQL6.5 didnt have foreign keys..personally I think if you lock down the database as much as possible your helping the programmer write good quality code. If the programs been tested as much as possible and your happy then all constraints could potentially be removed but in my experience end users have been known to dts data into tables etc. causing all sorts of issues. I know constraints have prevented me make lots of mistakes! Also if anyones generating SQL DRI is essential. Put it in place and the sps can be generated for each release, without them you have to hand code deleting child records etc. introducing errors. Also cascade deletes havent been mentioned as a handy feature of foreign keys. Lastly communication is the biggest problem in any project by encoding the (DRI) structure in the database you're ensuring everyone is able to understand the database?


    Phil Nicholas

  • No abscence of DRI does not hurt performance it merely makes it possible to create a situation where you have orphaned records. It does not automatically mean that orphaned records will exist.

    I've just thrown away my old SQL6.5 Admin course and test prep books but I could have sworn that it had DRI!

    I know that MySQL 4 didn't.

    As I've said before I tend to use DRI unless I have a very good reason not to and on occassion I have had good reason.

    The thing with well tested stored procedures as in Phil Factor's article is a bit of a swings and round-a-bouts argument.

    In simple applications it is perfectly possible that the do-it-all in stored procedures approach can be managed because the scale of testing and the application of QA procedures can cope. The thing is that in a simple application a performance boost won't be realised.

    In a larger and more complex application where there could be a realistic chance of a performance boost the testing load and QA procedures are too excessive to keep the risk to the data to acceptable levels.

    My position as a DBA is that pessimism and paranoia should be the default approach when it comes to data integrity. In other words use DRI, CHECK constraints etc.

    The other factor is that developing in a large team is a completely different experience to developing in a small team or even as an individual. If ever your have had your software tested by a professional tester then it is a humbling experience, even if you take great pains over the quality of your work.

  • Yes it was 4.2, not sure whether it supported nullable fk's though (ie with nocheck) as erwin produced triggers in this situation.

    DRI does have a performance hit. Agree with everything else, but maybe I just dont know the superhuman programmers that make no mistakes that the no dri people know?? Used to be an arguement about MySQL vs SQL server that Mysql didnt have dri but I'm pretty sure the latest release does.


    Phil Nicholas

  • The latest release has Triggers, stored procs, DRI, support for clustering etc.

    Well worth a look even if it is just as the icing on top of a CV.

  • The original premise – referential integrity constraints can be replaced by logic in stored procedures – is valid in that you can put logic anywhere as long as you have it, you know where you put it, and you can change it if necessary. Along the same lines you could put that logic in subroutines within your application code (hey, it’s an option). There are two issues to consider. First, what makes the most sense from an architectural perspective (probably the constraints)? Second, what is most efficient? If you are going to use a relational database, then you should use it as God intended. If that isn’t fast enough, get a bigger machine. And if it still isn’t fast enough, don’t use a relational database. That may sound crazy, but how much of your time is spent attempting to tune a slow database verses the timesavings it affords you. If you have large application and you are considering NOT using referential integrity to make it run just a little bit faster, consider blowing off the relational database altogether. When is the last time you compared performance between a relational database and some of the alternatives? Hey, it’s an option.

  • Comment on the original article:

    When deciding to enforce relations using FKs or not, this should be compared with using application logic . not with simple deletes or updates.

  • In my experience, testing is the last thing a customer wants to pay for. They want the product done as fast as possible and the least expensive as possible. Its hard to tell a customer, "we'll have to add anothier 50% to the cost, so that we can make sure our 1000 or so SP's don't let the data get outa whack".

    Not that we don't test our procedures, but I would rather spend that little amount of time that the customer will pay for testing, to test the business logic, not making sure that the database is consistant at all times. I would rather hand that testing off to the product that the customer is paying $1000's to use for storing data.

    Anyway, you can't get around looking up child records before deleting parents, or deleting children before deleting parents. You either look it up yourself (and incur the same performance hit that DRI incurs), or you let SQL do it for you. I would bet that SQL has some internal optimizations specifically for DRI, and its lookups would be somewhat more efficient then yours will be.

    I aggree, if you are afraid of the performance hit of making sure that your data is consistant, then you probably do not want any RDMS, just make you own storage system. Either way, I bet the difference is pretty minor.

  • You enforce referential integrity using FKs, not relations.  Relations are tables.

Viewing 15 posts - 31 through 45 (of 46 total)

You must be logged in to reply to this topic. Login to reply