Foreign keys: good or bad practice?

  • This is really a matter of what the database was created for. If there is only 1 front end application doing all of the inserts and deletes, there is no reason to force FK type constraints. The application itself, if well written, should handle all of the relational integrity.

    Remember that long before databases there were COBOL programs that handled parent/child (header/detail) type relationships in their files and never had a problem with orphans.

    FK type constraints will slow down inserts, updates and deletes.

    I do, however, recommend Delete triggers to handle validating whether or not a delete can be done at all, and to cascade the deletes from parent to child tables. The application can handle inserts and updates.
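    As a rough illustration, a delete trigger of the kind described above might look something like this (the Order/OrderDetail tables and column names are hypothetical, not from any particular schema):

    CREATE TRIGGER dbo.Order_DeleteCheck
    ON dbo.[Order]
    INSTEAD OF DELETE
    AS
    BEGIN
        SET NOCOUNT ON

        -- Validate: refuse the delete entirely if child rows exist...
        IF EXISTS (SELECT 1
                     FROM dbo.OrderDetail od
                     JOIN deleted d ON d.OrderID = od.OrderID)
        BEGIN
            RAISERROR('Order has detail rows; delete not allowed.', 16, 1)
            RETURN
        END

        -- ...or, if the rule is to cascade instead, delete the children first:
        -- DELETE od FROM dbo.OrderDetail od JOIN deleted d ON d.OrderID = od.OrderID

        -- Finally, perform the delete of the parent rows
        DELETE o FROM dbo.[Order] o JOIN deleted d ON d.OrderID = o.OrderID
    END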

    If, however, there are multiple applications inserting, updating or deleting records in the database, then I would implement Foreign Key constraints. There's no guarantee, with multiple applications, that any kind of referential integrity will be maintained in the database. You'll have orphans before you can count to 3.

  • If there is only 1 front end application doing all of the inserts and deletes, there is no reason to force FK type constraints.

    Actually, I'll disagree with you on this. Appropriate FK constraints are a good idea even when you have only one application servicing the database. Again, harping on my "consider the real world" mantra - what happens to this application over time? The original programmers move on. Perhaps the original specs get lost. No one clearly remembers *why* things were written the way they are, if they remember at all. New (vastly inexperienced) blood takes over maintenance programming. And, in the background your FK relationships and constraints keep their silent watch over the database, ensuring data integrity despite all the flux in the development team.

    Steve G.

  • tfifield (11/21/2007)


    If there is only 1 front end application doing all of the inserts and deletes, there is no reason to force FK type constraints.

    I just have to disagree on this one. It doesn't really matter if it's one app or 50. The RI checks are a safety mechanism to help protect the data. There are extreme exceptions, where the performance requirement is so severe that saving the few milliseconds spent on RI checks counts, or where the data is completely unstructured, but the exceptions are, and should remain, exceptional. Not maintaining and ensuring integrity in a referential system is just, flat out, a design flaw.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • tfifield (11/21/2007)


    This is really a matter of what the database was created for. If there is only 1 front end application doing all of the inserts and deletes, there is no reason to force FK type constraints. The application itself, if well written, should handle all of the relational integrity.

    Remember that long before databases there were COBOL programs that handled parent/child (header/detail) type relationships in their files and never had a problem with orphans.

    FK type constraints will slow down inserts, updates and deletes.

    I do, however, recommend Delete triggers to handle validating whether or not a delete can be done at all, and to cascade the deletes from parent to child tables. The application can handle inserts and updates.

    If, however, there are multiple applications inserting, updating or deleting records in the database, then I would implement Foreign Key constraints. There's no guarantee, with multiple applications, that any kind of referential integrity will be maintained in the database. You'll have orphans before you can count to 3.

    I totally agree... with Grant, that is. What difference does it make whether you have one application or fifty hitting your DB? Aren't they all written by developers? The days you speak of, with COBOL, files, records and fields, no longer apply in the world of RDBMSs. Relational databases represent a paradigm shift from the days of old, when tables were nothing more than 'files', rows were 'records' and columns were 'fields'. That change in thinking brought a new type of specialist to the IT realm: the DBA.

    Back in the COBOL/mainframe days, the hardware was the major investment for companies. Now it is the information on the hardware. The data is the most important piece of any application, so why not ensure its integrity by hiring a qualified specialist (a DBA)? I find it irresponsible for a DBA not to want to keep control over the data at the database level. If you are the one they will come crying to when there is a problem, why would you want that control in the hands of the developers? When a problem does arise, you will be the one explaining why you did nothing to ensure the integrity of the data. As for the performance hit, the same argument can be made about an index. Do you rely on the front end to handle indexing as well?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • tfifield (11/21/2007)


    This is really a matter of what the database was created for. If there is only 1 front end application doing all of the inserts and deletes, there is no reason to force FK type constraints. The application itself, if well written, should handle all of the relational integrity.

    Remember that long before databases there were COBOL programs that handled parent/child (header/detail) type relationships in their files and never had a problem with orphans.

    FK type constraints will slow down inserts, updates and deletes.

    I do, however, recommend Delete triggers to handle validating whether or not a delete can be done at all, and to cascade the deletes from parent to child tables. The application can handle inserts and updates.

    If, however, there are multiple applications inserting, updating or deleting records in the database, then I would implement Foreign Key constraints. There's no guarantee, with multiple applications, that any kind of referential integrity will be maintained in the database. You'll have orphans before you can count to 3.

    Looking at some of our queries, I'll take the minor performance hit on inserts from having FKs any day. I don't think they are appropriate in all cases, but we have queries joining 20 tables just to verify integrity; if we had FKs we could write a simple select instead.
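    For illustration only (the table names here are made up, not from the poster's system), this is the kind of orphan check a declared FK would make unnecessary:

    -- Without an FK, finding detail rows whose parent has gone missing takes a query like this
    SELECT d.OrderDetailID
      FROM dbo.OrderDetail d
      LEFT JOIN dbo.[Order] o ON o.OrderID = d.OrderID
     WHERE o.OrderID IS NULL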

  • I find it quite amusing to observe the passions evoked by this subject. 😛

    In database design, as in life, there are few absolutes. Do I believe that DRI is a best practice worth following? Absolutely. Have I had experiences with databases where the performance hit of an FK was prohibitive on writes? You bet.

    So, for the benefit of some of our colleagues who are newer to the profession, perhaps it would be valuable to provide some guidelines on when to use certain approaches.

    If we're talking about referencing "lookup" or "type" tables, where the number of records is in the tens or even hundreds, DRI should be a no-brainer. On the other end of the spectrum, using DRI on a 10 million record table that references a 100 million record table may not perform very well. In that case, I have relied on the business logic encoded in stored procedures to maintain data quality instead of using DRI.
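    A minimal sketch of that approach, using hypothetical Item/ItemDetail names (with the usual caveat: without a declared FK, a concurrent delete of the parent between the check and the insert can still slip an orphan in):

    CREATE PROCEDURE dbo.ItemDetail_Insert
        @ItemID    INT,
        @Quantity  INT
    AS
    BEGIN
        SET NOCOUNT ON

        -- The "business logic" integrity check that stands in for the FK constraint
        IF NOT EXISTS (SELECT 1 FROM dbo.Item WHERE ItemID = @ItemID)
        BEGIN
            RAISERROR('ItemID %d does not exist.', 16, 1, @ItemID)
            RETURN
        END

        INSERT INTO dbo.ItemDetail (ItemID, Quantity)
        VALUES (@ItemID, @Quantity)
    END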

    We should remember that just as DRI is a best practice, so is restricting data access to stored procedures. In my shop, DBAs are responsible for writing all stored procedures. If a skilled developer helps out by writing his own stored procedures, that code still must be validated and pushed to the database by a DBA.

    This practice addresses the issue of 1 application versus 50 applications accessing the database, in that it shouldn't matter. All data access is through stored procedures that have been vetted to ensure they promote data quality.

    In summary, the proper architectural approach depends on the application, and often a hybrid solution is the right answer. Academic certitudes need to give ground to real world pragmatism.

  • alain_smithers (11/27/2007)


    If we're talking about referencing "lookup" or "type" tables, where the number of records is in the tens or even hundreds, DRI should be a no-brainer. On the other end of the spectrum, using DRI on a 10 million record table that references a 100 million record table may not perform very well. In that case, I have relied on the business logic encoded in stored procedures to maintain data quality instead of using DRI.

    I'm curious how you maintain relational integrity without accessing the two tables. Besides - a unique index on the primary and foreign keys allows you to scan for the existence of a value within hundreds of millions of records in milliseconds. Is there an overhead? Sure. Is there another option to ENSURE/GUARANTEE DRI? Not really.
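    To make the point concrete, here is a bare-bones sketch (Item/ItemDetail are stand-in names): the parent's primary key is the unique index the FK check seeks against, and an index on the referencing column keeps the reverse lookups cheap as well.

    CREATE TABLE dbo.Item
        (ItemID INT NOT NULL PRIMARY KEY CLUSTERED)

    CREATE TABLE dbo.ItemDetail
        (ItemDetailID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
         ItemID       INT NOT NULL
                      CONSTRAINT FK_ItemDetail_Item REFERENCES dbo.Item (ItemID))

    -- Index the FK column so deletes/updates on dbo.Item don't have to scan the child table
    CREATE INDEX IX_ItemDetail_ItemID ON dbo.ItemDetail (ItemID)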

    Don't confuse data quality with DRI. There's a rather large difference, since you can improve "data quality" if you HAVE DRI, but if you don't have DRI, you're pretty much smoked, since you then have to "interpret"/"massage"/"make assumptions about" relations that have been lost, if you even bother to do that.

    Maybe I am being simplistic - but if DRI is important, you implement it in a way that will continue to function. Business logic might help, but it's way too easy to just bypass. Sorry - it's just not reliable enough.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Wow! I never thought my first post on this forum would evoke so many passionate responses.

    I still think my points hold water: the type and number of applications accessing the database, what they access it for, and the fact that applications before databases seemed to manage just fine. I use DRI whenever it makes sense to do so. Your point on 10 million records referencing 100 million is a very good point.

    Another point is when a database needs to be loaded from imported data. Foreign keys slow it down tremendously and make it a real pain to make sure that the load is done in sequence. I've written a couple of stored procedures that put all foreign keys into a table and then drop them in order to do a database load. Then another to restore all FK's. This is another pain.
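    For what it's worth, a lighter-weight variation on the same idea is to disable the constraints rather than drop them, then re-enable them WITH CHECK after the load so the existing rows are re-validated and the constraints stay trusted (the table name here is hypothetical):

    -- Disable all FK and CHECK constraints on the table before the bulk load
    ALTER TABLE dbo.OrderDetail NOCHECK CONSTRAINT ALL

    -- ... perform the bulk load here ...

    -- Re-enable them, re-checking existing rows so the constraints remain trusted
    ALTER TABLE dbo.OrderDetail WITH CHECK CHECK CONSTRAINT ALL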

    The only point I was trying to make was that they are very useful in maintaining the integrity of the database, but can become onerous in some cases.

    TFifield

  • tfifield (11/28/2007)


    I use DRI whenever it makes sense to do so.

    Actually, I would put it the other way 'round. Use DRI as a default. If a rational case can be made to skip it, then do so. The situations where it makes more sense to skip DRI will be far fewer in number than where it makes sense to use DRI. Design first for the general case, then worry about exceptions.

    Your point on 10 million records referencing 100 million is a very good point.

    Not really. One of the points that I and others have made is that a FK relationship places the performance load on inserts and updates while relieving the load from selects -- which typically are executed far more often. A 10 million row table referencing a 100 million row table is where this performance is most needed. If you have to join the two together, FKs will not help or hinder. However, you can join them knowing that every Item Detail is correctly associated with an existing Item.

    On the other hand, multi-million row tables typically hold audit and history information and are only loosely related to information in other tables. Many don't even require a Primary Key. Being as they are on the low end of the data referential structure, data integrity is not such a big issue here. So the "problem" could well be too contrived to be meaningful.

    Another point is when a database needs to be loaded from imported data. Foreign keys slow it down tremendously and make it a real pain to make sure that the load is done in sequence. I've written a couple of stored procedures that put all foreign keys into a table and then drop them in order to do a database load. Then another to restore all FK's. This is another pain.

    Oh, boo-hoo! So your suggestion is to place a higher burden on everyday operations in order to make your once-in-a-blue-moon operations easier? Get real. We work to make the data more reliable, not to make our jobs easier. We absolutely do not, do not, do not sacrifice the former for the sake of the latter. How comfortable would you be driving across a bridge if you had heard the bridge designer comment on how easy his job had become once he had decided to "skip all that integrity nonsense"?

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Megha Yadav (11/13/2007)


    Hi Guys,

    I have come across a DB where there are no foreign key relationships...

    Is this a wise thing to do? The data is being inserted through an ASP.NET application, which makes sure to maintain the relationships and data integrity... but then is the overhead involved in this approach greater?

    Can there be a scenario where foreign keys are an overhead to maintain and best avoided?

    Do foreign keys help query processing at all, or are they just constraints to maintain referential integrity?

    Please give me pointers on these questions. Thanks.

    My answer is that they are a BEST PRACTICE. Foreign keys provide data integrity, which is your goal: can you vouch for the validity of your data or not? I will say, though, that I have been lax with DRI in the past, but of all the reasons I have been given, I have never seen a valid reason not to have FKs, even when the reason is a performance issue (those generally point back to design issues).

    I've seriously thought about letting this go, because it benefits me not to argue with someone I don't know or particularly care about. However, I think I should clarify the point I was trying to make in my previous post.

    I believe I clearly cited the case where one might consider not using DRI: when it impacts write performance. In my example of the 100 million row table referencing the 10 million row table, how do you imagine that many records got there? I am referring to massive imports of data that occur many times a day, often with 500K records in one batch. In this case, multiple reads for each record (traversing the b-tree) add up to be very costly. If I have a stored procedure loading my data, I can be assured that I have referential integrity because I am looking up a surrogate key in the 10 million record table to insert data into the 100 million record table. Since I am using set logic, the whole batch either succeeds or fails, again ensuring referential integrity.
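    Roughly, the pattern looks like this (the staging/parent/detail names are invented for the example): the join to the parent table resolves the surrogate key, so only rows with a valid parent ever make it into the big table, and the whole statement succeeds or fails as one set.

    INSERT INTO dbo.BigDetail (ItemID, Quantity, LoadDate)
    SELECT i.ItemID,           -- surrogate key looked up from the parent table
           s.Quantity,
           GETDATE()
      FROM dbo.StagingImport s
      JOIN dbo.Item i
        ON i.ItemNaturalKey = s.ItemNaturalKey   -- inner join: rows with no parent are excluded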

    Unless you have had to support an application with massive volumes, I think platitudes about doing it "by the book" are so much rhetoric. I agree wholeheartedly with the post suggesting that DRI should be the default design. I believe my original post accurately illustrated a case where some out-of-the-box thinking/design is warranted.

    In order to be useful to the reader who has no axe to grind, perhaps we can agree that the answer may depend on the type of application you have. For straightforward OLTP applications, I can think of no reason why you wouldn't want to use DRI. For applications that resemble data warehousing, there may be occasions where you are forced to consider alternate methods of ensuring referential integrity. Fair enough?

  • Interesting. Can you post the code that you used to perform your test?

    Now, you sound like me 😀 and, dang it... ya beat me to it 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yep, and I'm still waiting on that code to test......

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • alain_smithers (11/28/2007)


    In order to be useful to the reader who has no axe to grind, perhaps we can agree that the answer may depend on the type of application you have. For straightforward OLTP applications, I can think of no reason why you wouldn't want to use DRI. For applications that resemble data warehousing, there may be occasions where you are forced to consider alternate methods of ensuring referential integrity. Fair enough?

    I'm not sure anyone here has an axe to grind. A fair number of us have a LOT of experience across a lot of apps, so I think it's fair to say that some opinions are strongly worded and based on long track records. Perhaps you are taking the strong opinions as a personal attack: don't - they're there to elicit dialogue, not put-downs or inferences about someone's purported experience or lack thereof.

    I'd have to say - the 100 million records probably got there about the same way the 75 Billion detail records in my system got there: lots of work. I've just never resorted to dropping my DRI in order to get data in. Perhaps I've been lucky, perhaps I'm just too mule-headed to see the value in getting data in that might get orphaned, but the doctors I work with would rather wait a little while longer for the data than for there to be a risk of missing/corrupting DRI.

    Inserts are the least of my worries. Given the nature of emergent medicine, my "public" identifier is likely to change or get updated, get merged or get deleted, based on the vagaries of trying to identify the patient correctly (especially one who might be unconscious or burned beyond recognition). And yet - DRI is still the MOST important part of the equation, so I "wait" for it if I need to.

    If anything - to me it's been the opposite: the bigger the system, the MORE fundamentally important DRI has become. If anything - my data gets more normalized and more stringently related the bigger it gets. Why - because the cost and the effort involved in fixing said conflicts get bigger with every row that gets added.

    If our systems were perfect and we were fully on top of everything - we'd be doing nothing BUT platitudes as you say.

    That being said - you manage your data any which way you like - perhaps your data can live with some cleanup on the relations after the fact. It's a matter of paying now or paying later. I just can't afford the "pay later", since that usually means someone got hurt or harmed because of it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • A wise friend of mine once said, "A single test is worth a thousand calculations and speculations". Another wise friend of mine said, "A Developer must not guess... a Developer must KNOW."

    Folks will tell you, I'm for writing the fastest code possible... almost everything I do has to do with scalability and maximizing performance. I believe in the need for speed!...

    ... just not at the expense of possibly allowing data to become corrupted. 😉 Guess you might say I'm with Matt 100% on this one.

    So, without grinding any axes, or calling anyone a dummy, let's do a test. The following code builds 3 tables... admittedly, they are not on the scale that some of you speak of... it may have taken years for that much data to be deposited in the database and I'm thinking a max of a ten minute test, here.

    I've an "import" table to simulate an external data source, called jbmImport, and it contains 500,000 rows of data to simulate a small to medium size import. We'll use a simple Insert/Select to copy its data into another table called jbmTargetTable. I used Insert/Select for this particular experiment because its speed is comparable to that of a well-formed Bulk Insert (actually, Bulk Insert can be nearly twice as fast, but close enough).

    The third table is jbmFKTab and it contains a million rows of DRI data. The test code should be run as a unit, just so we don't have to explain anything other than the difference that DRI makes. The only difference between a non-DRI run and a DRI run is whether or not you comment out the FK code near the last part of the script.

    Here we go...

    --===== Use a "safe" database that we know everyone has
    USE TempDB

    --===== Suppress rowcounts for appearance
    SET NOCOUNT ON

    --===== If the test tables already exist, drop them (for repetitive runs)
    IF OBJECT_ID('dbo.jbmTargetTable','U') IS NOT NULL
        DROP TABLE dbo.jbmTargetTable
    IF OBJECT_ID('dbo.jbmImport','U') IS NOT NULL
        DROP TABLE dbo.jbmImport
    IF OBJECT_ID('dbo.jbmFKTab','U') IS NOT NULL
        DROP TABLE dbo.jbmFKTab

    --===== Create and populate a 500,000 row "import table" (simulates external data source).
    SELECT TOP 500000
           SomeIntFK    = ABS(CHECKSUM(NEWID()))%1000000+1,
           SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                        + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
           SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
           SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
           SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
           SomeHex12    = RIGHT(NEWID(),12)
      INTO dbo.jbmImport
      FROM Master.dbo.SysColumns t1,
           Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== Create and populate a 1,000,000 row test "FK" reference table.
    SELECT TOP 1000000
           FK       = IDENTITY(INT,1,1),
           SomeDesc = 'Some reasonable size description of the row here',
           SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
      INTO dbo.jbmFKTab
      FROM Master.dbo.SysColumns t1,
           Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.jbmFKTab
        ADD PRIMARY KEY CLUSTERED (FK)

    --===== Create and populate a 1,000,000 row "target table".
    SELECT TOP 1000000
           PK           = IDENTITY(INT,1,1),
           SomeIntFK    = ABS(CHECKSUM(NEWID()))%1000000+1,
           SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                        + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
           SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
           SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
           SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
           SomeHex12    = RIGHT(NEWID(),12)
      INTO dbo.jbmTargetTable
      FROM Master.dbo.SysColumns t1,
           Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.jbmTargetTable
        ADD PRIMARY KEY CLUSTERED (PK)

    --====================================================================================================================
    --      Do the test...
    --====================================================================================================================
    --===== Declare a duration timer variable
    DECLARE @StartTime DATETIME

    --===== This creates the DRI in question.  May comment out or not...
    -- ALTER TABLE dbo.jbmTargetTable
    --     ADD CONSTRAINT FK_jbmTargetTable_jbmFKTab
    --     FOREIGN KEY (SomeIntFK) REFERENCES dbo.jbmFKTab (FK)

    --===== Copy the data from the import table to the target table
    SET @StartTime = GETDATE()

    INSERT INTO dbo.jbmTargetTable
           (SomeIntFK,SomeLetters2,SomeCSV,SomeMoney,SomeDate,SomeHex12)
    SELECT SomeIntFK,SomeLetters2,SomeCSV,SomeMoney,SomeDate,SomeHex12
      FROM dbo.jbmImport

    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    Do notice that the only time we're measuring is the time to do the insert either in the presence of DRI or without...

    On my poor little 5-year-old 1.8 GHz desktop computer, here's what I get for results...

    Without DRI: 00:00:09:907 Duration (hh:mi:ss:mmm)
    With DRI:    00:00:19:157 Duration (hh:mi:ss:mmm)

    With DRI, the code takes nearly twice as long... but let's get real here... 500,000 rows take virtually no time to import and the time difference between the two is less than 10 seconds.

    Now, do what you want... say what you want... tell others that DRI interferes with performance because it certainly does. But, the first rule of a DBA and folks that are concerned with data should be "Above all else, either by action or inaction, let no harm befall the data". If you value your data, DRI should be an integral part of every table that's not a definition table that supports DRI.

    If the performance differential is too big a hit on your code, rewrite your code to perform better. I know a pretty good forum for that 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
