The Pitfalls of Foreign Keys

  • I have designed and built a number of large, high transaction rate OLTP systems and have played this both ways. However, my own performance tuning and testing mirrors what David Poole discovered: FK overhead is measurable, in some cases significant, and in rare cases a deal breaker. That is, when every other option for wringing maximum performance out of a system (including hardware upgrades) has been exhausted, the foreign keys are the last place left to negotiate.

    Without a doubt, the data integrity concerns voiced by numerous writers are real and must be considered, and when push comes to shove I will ensure that my database developers actually do write "perfect" code and do not create orphans. We never allow ad hoc access to production systems, so someone bypassing the coded RI isn't a concern, and frankly it is fairly easy to create all of the basic sprocs for a system and get the RI correct (the sketch below gives the general idea). Good planning and execution win when absolute performance is required.
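
    As a rough illustration of what "RI in the sprocs" means (table and procedure names here are hypothetical, not taken from any system discussed above), every write goes through a procedure that verifies the parent row inside the same transaction:

    CREATE PROCEDURE dbo.usp_InsertOrder
        @CustomerId INT,
        @OrderDate  DATETIME
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        -- Verify (and lock) the parent row before inserting the child,
        -- so a concurrent delete cannot orphan the new record.
        IF NOT EXISTS (SELECT 1 FROM dbo.Customers WITH (UPDLOCK, HOLDLOCK)
                       WHERE CustomerId = @CustomerId)
        BEGIN
            ROLLBACK TRANSACTION;
            RAISERROR('Customer %d does not exist.', 16, 1, @CustomerId);
            RETURN;
        END

        INSERT INTO dbo.Orders (CustomerId, OrderDate)
        VALUES (@CustomerId, @OrderDate);

        COMMIT TRANSACTION;
    END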

  • I didn't argue for well tested stored procedures replacing DRI. That was the opinion of Phil Factor.

    I strongly believe that whether to use DRI is a design issue and that there are few absolutes in SQL. If you have thousands of procedures then DRI becomes essential, for the reasons that you have posted.

    The example I gave was based on a commercial product and adding DRI into the product DID drag the thing through the floor when deleting records from the database.

    Having a surviving brain donor going into the database tables of a 3rd party application would totally invalidate any support contract.

    DRI stops orphaned data, but to claim it as a cure-all is daft. As the snippet below illustrates, it doesn't protect you against someone doing an UPDATE without a WHERE clause, or against a flawed formula in the calculation of your tax returns.
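
    For example (hypothetical tables, purely to make the point concrete), the constraint below rejects orphans but happily accepts a WHERE-less UPDATE, as long as the new value still points at a real parent row:

    ALTER TABLE dbo.OrderLine
        ADD CONSTRAINT FK_OrderLine_Order
        FOREIGN KEY (OrderId) REFERENCES dbo.[Order] (OrderId);

    -- Rejected: order 999999 does not exist, so no orphan can be created.
    INSERT INTO dbo.OrderLine (OrderId, Quantity) VALUES (999999, 1);

    -- Accepted: every line now points at order 1, which does exist,
    -- even though the data is almost certainly wrong.
    UPDATE dbo.OrderLine SET OrderId = 1;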

  • Somehow the idea of trusting my data to a bunch of (supposedly) well crafted stored procedures scares me. One of the databases I dealt with in the past took this approach (mostly because the schema was designed for a really old RDBMS and never updated). It was a nightmare. Orphaned items often outnumbered real items in certain tables. The applications would show ghost items, which were impossible to get rid of without mucking about in the database directly, and it would cause weird and seemingly random errors in the applications. It was because of this database and the problems associated with it that I first had to learn SQL. It was a great learning tool; not so good as a real system, though that is what it was.

  • I have built systems in SQL Server that processed more than 2.5 million transactions per hour with DRI.  That was on 32 bit hardware and SQL 2000, and the system was not "maxed out."  I'd love to see how far you could push 2005 on 64 bit hardware...

    The point is that DRI is very seldom the issue; it is usually poor database/application design.  A quick glance at the db diagram in the article leads me to believe that whoever built that thing didn't know what they were doing.  I understand that David used a commercial software package as the model, but in my experience, the level of database design in commercial software packages generally ranges anywhere from poor to pathetic.

    The only time I've ever seen DRI become a real problem is in a few warehousing systems that did large (millions of records) batch operations.  In those cases, since the data was coming from systems in which the RI was already enforced, and the warehouse was strictly read-only, there was really no need to enforce those constraints again; see the sketch below for the usual compromise.
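
    A common middle ground, if you still want the constraints declared but don't want to pay for them during the load (table and constraint names here are hypothetical), is to disable the checks around the batch and re-validate afterwards:

    -- Disable FK checking on the fact table for the duration of the load.
    ALTER TABLE dbo.FactSales NOCHECK CONSTRAINT FK_FactSales_DimCustomer;

    -- ... run the bulk insert / batch operation here ...

    -- Re-enable and re-validate, so the optimizer can trust the constraint again.
    ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT FK_FactSales_DimCustomer;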

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • >>I have built systems in SQL Server that processed more than 2.5 million transactions per hour with DRI. That was on 32 bit hardware and SQL 2000, and the system was not "maxed out." I'd love to see how far you could push 2005 on 64 bit hardware...<<

    Just for reference, on our current (32 bit) system, we are pushing more than 2.4 million transactions PER MINUTE on a sustained basis, and yes, that is a bit taxing to our db servers but still within their capability, which is surprising given the legacy db and code designs in place right now. I have redesigned the backend for SQL 2005, the middle tier has been redesigned in .NET 2.0 64 bit, and we are placing this new system onto new 64 bit Opteron servers, with the db server having lots of memory. In that situation, DRI overhead becomes invisible, and our ultimate throughput has yet to be reached (we are still in testing). I would estimate at this time that it will be close to 4-6 million transactions per minute, sustained, on a QUAD. We are using HP DL585s connected via dual 2Gb HBAs to a SAN for the dbs.

    The new 64 bit Opteron servers, for db work, are shockingly fast and adept at handling workloads that would force their 32 bit cousins to fall over. For SQL 2005 64 bit, it's a remarkable combination.

  • Now I'd love to see that!  Especially since the highest TPC benchmark ever achieved by SQL Server is 1.2 million transactions per minute, and that was using a 64-CPU Itanium 2 with 1 TB of RAM that cost almost $6 million...

    Now, the TPC-C benchmark doesn't count all transactions, only new orders, but not to put too fine a point on it, there is no way that you got that kind of performance out of a quad 32 bit system... sorry, but I think you are either measuring incorrectly or exaggerating...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • >> It is worth remembering that if a table is referenced by a foreign key constraint you cannot use the TRUNCATE TABLE statement on that table.

    If you really want to truncate and don't need your data, you can use scripts to drop the FKs, truncate the tables, and re-create the FKs, along the lines of the sketch below.
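
    A minimal sketch, assuming a single hypothetical child table and constraint (a production script would discover and rebuild every referencing FK):

    -- 1. Drop the foreign key(s) referencing the table to be truncated.
    ALTER TABLE dbo.OrderLine DROP CONSTRAINT FK_OrderLine_Order;

    -- 2. Truncate; this is now allowed because nothing references the table.
    TRUNCATE TABLE dbo.OrderLine;
    TRUNCATE TABLE dbo.[Order];

    -- 3. Re-create the constraint; WITH CHECK re-validates any existing rows.
    ALTER TABLE dbo.OrderLine WITH CHECK
        ADD CONSTRAINT FK_OrderLine_Order
        FOREIGN KEY (OrderId) REFERENCES dbo.[Order] (OrderId);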

  • One of the biggest arguments for using DRI is the nice data flow and relationship diagrams one can hand to new developers, or for that matter use to help design new functionality.  If you use a third party data modelling tool (ERwin, Enterprise Architect, etc.) to model or reverse engineer, there is no question that you should use DRI.

    Let the DB maintain the RI; that is what it was designed to do.

    In my opinion you will be shooting yourself in the foot if you rely on developers to maintain RI.  This also makes for long code reviews and QA cycles.

     

  • Our current system is using 4 Quads, and we're not doing TPC-type transactions. There is a huge difference between simple inserts and "select/deletes" and the TPC queries. The app I'm referencing is a real-time messaging app - very high speed inserts and async selects, but largely sequential in nature and all transactions are either atomic/single record or moderate unlinked batches of a few hundred records.

    The TPC-C benchmark is defined as "... a mix of five concurrent transactions of different types and complexity either executed on-line or queued for deferred execution." Concurrent and linked transaction activity changes the timing model profoundly, with locks and latches being held longer.

    Even so, at the maximum transaction rate (2.4M/minute), we rarely max out any of the servers. I did ask one of our engineers if he was sure about the max rate and he confirmed the number. I have no need to exaggerate it; this is just a friendly discussion about performance and limits.

  • One of the great disillusionments of my career is seeing the backend databases for applications that are making money hand over fist.

    I would agree that many of the database designs are poor.

  • With just one exception, I have not seen the db's for that type of app since I've generally only worked on new product design. However, if they mirror what I have encountered at essentially every client company I've worked at, then I cringe with you, and wonder how the heck those guys ever got paid for their work. It's clear that at least for the SQL Server world, the ease of use has created more hackers than crafters, with normal design skills abundantly...absent.

    It is almost as if those db engineers were saying, "oh, they'll fix it in the next release" on every release... 🙂

  • Just as a matter of interest, what design would you put in place instead of the example given?

  • I won't attempt to provide a "better" design since I know almost nothing about the requirements for the system.  However, just looking at that particular diagram, several questions/objections spring to mind.

    It appears that whoever came up with this design was an OO developer since everything is related in a type/sub-type relationship.  OO folks can't seem to get trees out of their heads... 

    Just because entities share some common attributes does not automatically mean that they should be represented as sub-types of some overarching class.  This kind of design is obviously driven by confusing a relation (table) with a "class."  It is a very common mistake made by those who insist that the database is merely a storage mechanism.

    The blanket use of surrogate keys (IDENTITY columns, I'm assuming) is also a signature of the OO mentality; the sketch below caricatures the pattern.
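
    Something along these lines, I suspect (every name here is hypothetical): one over-arching "class" table with an IDENTITY key, and every concrete entity hanging off it as a sub-type:

    -- The super-type table that everything descends from.
    CREATE TABLE dbo.CommonObject (
        ObjectId   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        ObjectType VARCHAR(30) NOT NULL
        -- ... dozens of "common" attributes ...
    );

    -- Each concrete entity is just a sub-type row pointing back at the super-type.
    CREATE TABLE dbo.Document (
        DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        ObjectId   INT NOT NULL REFERENCES dbo.CommonObject (ObjectId),
        Title      VARCHAR(200) NOT NULL
    );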

     

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Spot on, DC Peterson!

    The design of the real package is very similar to my example.

    A CommonObject record held about 50 fields that were common to every specific table, no matter what the type.

    A CommonObject record was in a one-to-many relationship with the specific object tables.

    For example, if there was a web page CommonObject record, then there would be a WebPage specific table that could hold many entries, depending on the language required, the medium used to browse the page, and the go-live date for the object.

    The CommonObject table slotted into a specific SiteTree table designed to determine the structure of the web site.

    I was thinking that a better idea would be to have a specific-object parent table for each object type and a specific-object child table to handle the multiple instances (roughly as sketched below).
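
    A rough DDL sketch of that alternative as I read it (all names are illustrative, not from the real product): each object type gets its own parent table keyed into the site tree, with a child table for the per-language/per-medium instances:

    CREATE TABLE dbo.SiteTree (
        SiteTreeId INT NOT NULL PRIMARY KEY,
        ParentId   INT NULL REFERENCES dbo.SiteTree (SiteTreeId)
    );

    -- One parent table per object type, slotted directly into the site structure.
    CREATE TABLE dbo.WebPage (
        WebPageId  INT NOT NULL PRIMARY KEY,
        SiteTreeId INT NOT NULL REFERENCES dbo.SiteTree (SiteTreeId),
        Name       VARCHAR(100) NOT NULL
    );

    -- One child table per object type for the multiple instances
    -- (language, browsing medium, go-live date, and so on).
    CREATE TABLE dbo.WebPageInstance (
        WebPageInstanceId INT NOT NULL PRIMARY KEY,
        WebPageId         INT NOT NULL REFERENCES dbo.WebPage (WebPageId),
        LanguageCode      CHAR(2) NOT NULL,
        Medium            VARCHAR(20) NOT NULL,
        GoLiveDate        DATETIME NOT NULL
    );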

  • Everybody talks about DELETEs.

    But what about SELECTs?

    Should "well tested procedures" allow to display orphaned objects?

    I believe not.

    How can you ensure it?

    SELECT ....

    FROM ...

    Where ObjectId not in (select ObjectId from dbo.Object)

    Or a LEFT JOIN with a check for dbo.Object.ObjectId IS NULL - it does not really matter.

    Can you evaluate the overhead of such queries?

    And most of the time you do SELECT, not DELETE.
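
    To make the overhead concrete (dbo.ObjectDetail here is a hypothetical child table referencing dbo.Object): without trustworthy DRI every read path has to carry an existence check, whereas with a trusted foreign key the plain select is enough:

    -- Without DRI the child rows cannot be trusted, so every SELECT
    -- must join back to the parent to filter out possible orphans.
    SELECT d.ObjectId, d.Detail
    FROM dbo.ObjectDetail AS d
    INNER JOIN dbo.Object AS o
        ON o.ObjectId = d.ObjectId;

    -- With a trusted FOREIGN KEY in place orphans cannot exist,
    -- so the read path needs no extra check at all.
    SELECT d.ObjectId, d.Detail
    FROM dbo.ObjectDetail AS d;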

    _____________
    Code for TallyGenerator
