DRI or No DRI?

  • There is a great debate going on in the forums about using PKs/FKs to enforce RI, in other words, declarative referential integrity (DRI) as opposed to using code logic to ensure that proper relationships are maintained in a schema.

    The opening post is a DBA stunned that developers in his new company do not use DRI. The reason given is performance is degraded, and that's one that I can buy. There is additional overhead for checks with DRI, but it's minimal and actually David Poole has a great article about this topic coming soon.

    It's interesting that one of the questions is about how many 3rd party applications do not use DRI in their databases. Who knows why, though my suspicion is that they can easily "fix" issues with the applications with backend updates rather than maintaining good DB design practices. Or that they can easily alter the application to meet changing needs at various customers' sites.

    I think it's mostly the latter reason. That and laziness. I think the vast majority of developers are lazy by nature and dealing with DRI constraints when building an application is a pain. It's a pain for me, but I still think it's a good idea. Just like stored procedures add some overhead, so does DRI and many developers I have seen, both building shrink wrap and corporate software, don't want to deal with the overhead.

    Personally I like having DRI implemented for the same reason that developers use objects, libraries, functions, and other consolidation techniques. It centralizes the "rules" about your application and ensures that they are always used. I agree there is overhead, but it's a minimal amount and if you are seeing this on most servers then you are probably underpowered.

    My big concern is that often multiple applications or even multiple modules of a single application often need the same business rules: like no orders unless we have a valid customer number. If you depend on the application to enforce this, then you are gambling that every developer will do it correctly and the same way. In small, tightly controlled and managed environments, this works great. But as you grow your development teams, then it is easy for someone to forget to implement some RI rule or implement it differently than others.

    And those bugs are hard to find.

    I don't think that every application needs DRI setup in it, but if you choose to not implement it, be sure you understand the consequences of your actions. And your boss does as well.

    Steve Jones

  • I have to agree.

    At the end of the day one could say that IT is all about making information more accessible, or, in other words, making our lives easier. Now when it comes to developing an application, and I (as a developer) need to start building modules, what easier way to remind me of business rules than by the database telling me what I'm trying to do is violating a constraint?

    Moreover - what point is having data that you can't be 100% sure is 100% correct. Let any developer, (and as stated above, I am one) say here that they have never found some bug after initial testing, when a project is sitting in some stage of QA or later - I would be amazed and impressed.

    Imagine not noticing an error, because it only made a difference of 0.02 cents in some report (let's say some "required" field was not being properly inserted or updated, or not all rows properly matched for a join). How embarrassing when that app goes into production, and a year down the line all your clients start telling you reports are wrong. THEN... go find the error.

    I'm sure there are some applications that don't need DRI applied (I personally - always feel nervous when updating or cleaning up an app that is built without it applied), but I suppose, each to his own. I will always use it though - it's like an extra safety net, that in the long run WILL save you, your app and your company time and money.

  • Sorry Steve - haven't got time to write a proper reply.  But - just in case everyone's going along with your view - let me say, as a long-time developer, I would only code RI or any business/data logic within SPs or UDFs - never as part of the structure of a database (and it's bad enough having to use SPs/UDFs!).

    (Never, ever, ever - got to go & lie down now!).

  • Assuming this last post is not just a wind-up then its a pity you havn't got time to write a proper reply because I would interested in hearing your justfication for "never" using RI in any database design you would ever produce.

  • This is the same debatecontext like for NULLs.

    OK, DRI is a pain in the but for developers, but it guarantees some data related business rules/needs.

    After all , it's "the business" we all are working for.

    It's always a blade that cuts on both sides !

    For the moment, one of the companies that I suffer a db from, implements everything nullable, with only one column not nullable i.e. the primary key.

    No DRI, no spc, no nothing...

    Most of the time this PK is not clustered ! (in the other situation, they forgot to alter it . )

    They claim to master it all in their applications.

    They'd better, because I'll not be able to help them out in case of casualties.

    Offcourse, this company works on a per-hour-base, and each and every "bug" means $$$.

    The fact that DRI will support performance, because one should by default also implement the DRI-indexes, is mostly disregarded ... Untill there are issues.

    It becomes very interesting, when you try to figure out how to join these data in your datawarehouse(s).

    Then it's datawarehouse that costs a lot, because they will have to investigate everything from 0 on.  

    "Best practises, are only there for the nerds" is a common attitude ! 

    Until companies get into consolidation/merger/takeover-situations, then the one with the best structure/documentation will survive, even if not in a 50-50 relation.

    In a 24/7 context you cannot make up time !

    99.9 or 99.999  ?

    Let's hope managers will have their eyes opened for midterm/long term solutions and insist for solid systems and data

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Your editorial repeatedly 'accepts' that DRI can only have a negative impact on performance. I certainly see that it can have a marginal negative impact in some cases and a significantly negative impact in other cases.

    However I thought (and I am certainly no expert here) that in some circumstances DRI often had a very significant benefit on performance in that the optimiser can use that information to produce much more efficicent query plans. For example, if a BIGINT is a FK to another table, but that other table only has 5 rows the optimiser can infer the distribution of values in the source column.

    Am I missing something here and have I been giving the optimiser too much credit?


    Regards,

    Mark

  • Sadly, not many descisions makers I've encountered fully support complete implemetation of DRI. In most cases they seem intimidated by it and will come up with some lame excuse not to implement it fully as part of the database development protocol.

    There is always the argument that the RI can be handled code-side but then again, which manager that doesn't see the benefits of DRI will also explicity employ the use of failsafe methods code-side?

    The dba\developer, however,will always be responsibile for the integrity and quality of data in the db and there is practically no other better method to insure this than DRI. There are cases when RI can be a pain but I'd rather find work arounds when the situation occurrs.

     

    Max

  • The optimizer mainly looks at the indexes, but you're right there can be positive performance as many db developers use the unique identifier as their PK\FK combinations and the PK usually has a clustered index due to the default way SQL assigns the clustered index. When the table size increases though this structure can become a hindrance.

    Max

  • I am a Coldfusion Developer for a University in Melbourne, Australia.

    I always start at the database side and implement DRI. I find that it actually assists my coding - not because I can get lazy in the application coding, but it acts as documentation (of sorts) to code against.

    I would love to have the time available to me that would see me develop the appropriate documentation prior to coding a single line (of either SQL or CFML/HTML/XML) - but in the real world that never happens - eveythig is always wanted yesterday and is a "must have", "make this your top priority" requirement. DRI ensures that if I DO miss some business logic, at least the back-end will have half an idea that something isn't quite right.

    Surely, the oldie - but goodie : Crap in - Crap out becomes more evident if you don't use DRI.

    As a coder first and DBA resting somewhere down the list at number 6/7 I certainly see it from the devlopers point of view too. The code is what's most important - the database is (normally) someone elses drama to deal with.


    Gavin Baumanis

    Smith and Wesson. The original point and click device.

  • I'm starting to wonder if some of the people in the "DRI is stupid" camp are mis-understanding what is meant by referential integrity. Some posters in the original thread seemed to think it was cascading RI that was being discussed, rather than PK/FK constraints.

    If not, then I'd also love to hear Davids justification!

  • About the only thing in the editorial today I can object to is the offhand remark that developers are lazy. As a developer, I would suggest that not using DRI actually shows those who don't must be damn energetic, since it tends to create more work over time. As someone who uses DRI in everything I can to make my data safer, I would prefer not to have these energetic folks lumped in with me and the other hard working lazy people.

    That silliness aside, with more than 2 decades of development under my hat, whenever I see a lack of DRI in a system I always find myself grumbling. The problem with such systems is that they tend to be hit and miss as to whether the data sets are actually reliable over time, and imposing DRI on such a system prior to a rebuild of its access modules is frequently agonizing. Presupposing that the integrity of the system will somehow magically survive years of code maintenance is foolishness, because seldom is the same mastermind around when the application needs a facelift or an extension.

    But I can see why some developers dislike DRI. It tends to make the data access layer somewhat boring, and even restrictive in cases where you need to insert in a specific order, etc. Every application I have ever built since I could have stored procedures in the background uses them, and I'm told that it makes the surface data access code frustratingly boring to use. Without the seat-of-the-pants excitement...well, it's just less fun. Don't we all want to risk billions of bits of data value regularly? (Yes, I am being sarcastic.)

    As for DRI's inefficiencies, that claim is problematic for me. It may be less efficient on a per transaction basis, but once you roll maintenance time and other factors in, it tends to be far more efficient in terms of overall resource use. 90% of all issues I'm dealing with in my mature applications go away with a poke or two at the stored procedures, UDFs, etc. When that can be done, efficiency is being embraced, because downtime is minimized.

    DRI is really about enforcing behaviour that maintains data value, and that has to be respected as a key part of our jobs.

  • And so, your code is flawless, always. And the next guy who writes some code against your system, he too doesn't need data integrity checks because his code is perfect. And the lady down the hall writing the DTS load package, she walks on water in between coding sessions, so her packages load the data with no flaws. And of course, your DBA, when presented with emergency hot fixes or data fixes that have to go into Production, now, Odin D--n it, now, will write, on the fly, flawless TSQL... That's pretty cool. Meanwhile, back in the real world...

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Both sollutions have good and bad things in it. I'll leave it that way. The line where you express your big concern is a point of discussion, however. If you write about proper database design and in the next line write about programmers not knowing or not willing to use a bit of code, than you talk about poor program design. When a application is design properly, it has only one entrance leading back to one database action. So, on that point, you raise an error to make clear the programmer forgets to check something.

  • I think data integrity is very important so I always put in primary key and foreign key.  It is part of the database design.  I also put in constraints in certain fields to make sure the correct data get put in. 

    It is all in database design.  Sometimes I think some SQL developers do not put in data integrity is because they don't have a good concept of database design.   Is it DBA's job to check the database design to ensure the integrity of the data?

  • Thank you, Frank.  I've worked on too many systems with both poor DBAs and Developers.  I think the same developers who dismiss DRI likely also design code with everything public, global and "If x=y then a=b".  Here's some pseudo code:

    If DRI Then 

       Developer = (Senior Or Junior)

    Else

       Developer = HiringError

    End If

    - Jeff -

Viewing 15 posts - 1 through 15 (of 59 total)

You must be logged in to reply to this topic. Login to reply