DRI or No DRI?

  • Comments posted to this topic are about the item DRI or No DRI?

  • Sometimes I use DRI but most of the time I do not.

    When DRI is used, it makes truncating tables difficult: you have to do them in a very specific order, and a table that is referenced by a foreign key cannot be truncated at all. Also, when adding test data or importing data from other sources, everything has to be loaded in a specific order (a quick sketch of the problem is at the end of this post).

    I cannot tell you how many times I get data from an SSIS package and the data at the source is not up to par. If I turn on DRI, the entire package will fail and none of the tables will load. We at least need to get the data that is available, and we can live with the small number of errors in our reports because they represent only a small sample.
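
    A minimal sketch of the truncation problem, using a hypothetical Parent/Child pair (the names are made up; any FK relationship behaves this way):

    ```sql
    CREATE TABLE dbo.Parent (ParentID int PRIMARY KEY);
    CREATE TABLE dbo.Child
    (
        ChildID  int PRIMARY KEY,
        ParentID int NOT NULL
            CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (ParentID)
    );

    -- Fails with error 4712 even if dbo.Child is empty:
    TRUNCATE TABLE dbo.Parent;

    -- The FK forces child-first ordering (or dropping/disabling the constraint):
    DELETE dbo.Child;
    DELETE dbo.Parent;
    ```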

  • I would argue that DRI is important when you have bad data.

    Packages ought to be designed to work with DRI, and if there are source problems, move error rows to staging tables to be dealt with. I know it's not always possible, but that ought to be the way you think about building software.

    In terms of clearing tables, the ordering should matter. If you were to remove a parent first, you might end up with orphans if you removed the wrong rows (a sketch follows).
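
    A minimal sketch of that child-first ordering, using a hypothetical Parent/Child pair (the @ParentID value is made up):

    ```sql
    DECLARE @ParentID int = 42;  -- hypothetical row to remove

    BEGIN TRANSACTION;

        -- Children go first, so no orphan ever exists, even mid-operation...
        DELETE FROM dbo.Child  WHERE ParentID = @ParentID;

        -- ...then the parent row itself.
        DELETE FROM dbo.Parent WHERE ParentID = @ParentID;

    COMMIT TRANSACTION;
    ```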

  • Steve Jones - SSC Editor (7/14/2011)


    I would argue that DRI is important when you have bad data.

    Packages ought to be designed to work with DRI, and if there are source problems, move error rows to staging tables to be dealt with. I know it's not always possible, but that ought to be the way you think about building software.

    In terms of clearing tables, the ordering should matter. If you were to remove a parent first, you might end up with orphans if you removed the wrong rows.

    I repopulate tables on a daily basis. First, they get truncated, then they get populated. So it doesn't matter in the end which ones get truncated first because they will ALL eventually get truncated.

    When populating the data, we don't know which records are bad until we get the entire data set. There are other problems unrelated to DRI that occur such as duplicate records.

    That can happen due to the frequency with which the source data is updated. Perhaps the detail table gets updated more frequently than the other tables and therefore has additional records that do not match the parent -- yet. It will match, but possibly not until a couple of days from now. Imagine a company where you have HR data (employee IDs and names) and a second system that contains training information. You could have a new employee taking training whose ID has not yet been entered into the HR dataset. Should those records be removed? Not in our case -- we still need to report that information. And when an employee is terminated and removed from the HR tables, should all the tracking associated with that person be removed too? The answer in our case is no.

    The reporting environment is critical and the SSIS package takes 10 hours to complete. If that fails, we will miss an entire day of productivity, and that is unacceptable. I'm not going to risk that day on a single record that didn't match something in the parent table.
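
    For what it's worth, if DRI were still wanted in a scenario like this, T-SQL does allow a middle ground: disable the constraint for the load and audit the orphans instead of failing the package. A hedged sketch, with hypothetical Employee/Training table and constraint names:

    ```sql
    -- Let detail rows load even when their parents haven't arrived yet.
    ALTER TABLE dbo.Training NOCHECK CONSTRAINT FK_Training_Employee;

    -- ... the nightly SSIS load runs here ...

    -- Audit the orphans instead of failing the load:
    SELECT t.EmployeeID, COUNT(*) AS OrphanedRows
    FROM dbo.Training AS t
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Employee AS e
                      WHERE e.EmployeeID = t.EmployeeID)
    GROUP BY t.EmployeeID;

    -- Once HR catches up, re-validate; WITH CHECK fails if orphans remain,
    -- so run the audit first. This also makes the constraint trusted again.
    ALTER TABLE dbo.Training WITH CHECK CHECK CONSTRAINT FK_Training_Employee;
    ```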

  • I'm with cengland0 here. For ETL populating a data warehouse, I do not use DRI. I believe the ETL itself is responsible for the RI, and using foreign keys just slows the ETL down. I would use them, though, on normalized databases.

    Regarding Steve's comment about moving error rows to staging tables, I would rather use the Lookup transformation's No Match output to redirect rows to an error table. This means that all the data that eventually does get inserted is already "checked", so why would I use an FK to check it again?
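
    Outside SSIS, the same match/no-match split can be sketched in plain T-SQL (the staging, dimension, and error table names here are all made up):

    ```sql
    -- Matched rows go to the destination...
    INSERT INTO dbo.FactSales (OrderID, CustomerKey, Amount)
    SELECT s.OrderID, c.CustomerKey, s.Amount
    FROM staging.Sales AS s
    JOIN dbo.DimCustomer AS c
        ON c.CustomerCode = s.CustomerCode;

    -- ...and the "no match" rows are diverted to an error table for review.
    INSERT INTO etl.SalesErrors (OrderID, CustomerCode, Amount)
    SELECT s.OrderID, s.CustomerCode, s.Amount
    FROM staging.Sales AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS c
                      WHERE c.CustomerCode = s.CustomerCode);
    ```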

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I write a lot of code, including SQL code, and I'm not a DBA, although I've done much of what DBAs do from time to time. I am a developer.

    I like to have DRI and also checks in the code. Suck on that one 😉

    My reason for this is that it does not take all that much time if you know what you're doing, and the quality of the product is assured to a higher standard, which also means less time needed to support and maintain it.
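
    A sketch of that belt-and-braces approach, with hypothetical Orders/OrderLines names: the code check returns a friendly error, and the FK still backstops any path that bypasses the procedure.

    ```sql
    CREATE PROCEDURE dbo.AddOrderLine
        @OrderID int, @ProductID int, @Qty int
    AS
    BEGIN
        -- The code check: fail fast with a readable message...
        IF NOT EXISTS (SELECT 1 FROM dbo.Orders WHERE OrderID = @OrderID)
        BEGIN
            RAISERROR('Order %d does not exist.', 16, 1, @OrderID);
            RETURN;
        END;

        -- ...while the FK on dbo.OrderLines.OrderID (the DRI) still catches
        -- anything that skips this procedure entirely.
        INSERT INTO dbo.OrderLines (OrderID, ProductID, Qty)
        VALUES (@OrderID, @ProductID, @Qty);
    END;
    ```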

  • Not using DRI and instead relying on applications to ensure business rules feels to me like not wearing underwear: you might be able to get away with it most of the time but it's extra embarrassing when you are caught with your pants down. As a DBA I wouldn't trust the developer(s) with my stewardship responsibility (or reputation). As a Developer I'd hope that the DBA protects us from a possibly too-narrow view of today's problem. When we have to wear both hats, I suggest defensive planning at every tier.

    In the same way security considerations need to be built in from the ground up, data integrity is (should be) everyone's concern. A user should be constrained to potentially damage only their own record. Data entry operators could potentially damage hundreds of records, but the application should be recording those actions so there is some isolation/containment option. Application developers have the potential of wrecking thousands of records in an instant by deploying apps that have logic problems (runtime errors) or do not properly constrain operator actions. DBAs have the potential to destroy whole systems of data in less time than it takes to say "oops." DRI is an available tool to help prevent the inevitable human error from corrupting the data (a small demonstration follows below).

    * This analogy provides some insight and context for the term "CYA" 🙂
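
    To make that containment concrete, a tiny sketch (a hypothetical Customers table, assuming some child table holds an FK reference to it):

    ```sql
    -- A fat-fingered delete of a referenced parent is stopped cold:
    DELETE FROM dbo.Customers WHERE CustomerID = 42;
    -- Msg 547: The DELETE statement conflicted with the REFERENCE constraint...
    -- Nothing is removed; the "oops" becomes an error message, not orphans.
    ```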

  • I'll try hard not to take the slam at developers personally. 🙁

    Regardless of one's take on DRI, there's no cause for personally alienating an entire group of I.T. professionals. There are lazy people in every aspect of life. Most of the developers I know personally work very hard and keep up with technology on a regular basis. The truly lazy ones don't last very long.

  • The last application I worked on had some DRI, but a lot of references that weren't declared. Personally, between dealing with the headaches that came from those undeclared references (and watching other people document them and have legal review the wording describing the repercussions), far more time got wasted because the references weren't declared than was ever saved. While performance was a big concern with our product, it was more laziness that caused references not to be declared. And there were tables where a column could reference several different tables depending on the value in another column (more on that pattern below).
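
    For that last pattern, DRI isn't necessarily off the table either. One common workaround (a sketch with made-up Note/Customer/Supplier tables) is one nullable FK column per possible target, plus a CHECK constraint forcing exactly one of them to be set:

    ```sql
    CREATE TABLE dbo.Note
    (
        NoteID     int PRIMARY KEY,
        NoteText   nvarchar(max) NOT NULL,
        -- One nullable FK column per possible target table...
        CustomerID int NULL REFERENCES dbo.Customer (CustomerID),
        SupplierID int NULL REFERENCES dbo.Supplier (SupplierID),
        -- ...and exactly one of them must be populated.
        CONSTRAINT CK_Note_OneOwner CHECK (
            (CASE WHEN CustomerID IS NOT NULL THEN 1 ELSE 0 END
           + CASE WHEN SupplierID IS NOT NULL THEN 1 ELSE 0 END) = 1)
    );
    ```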

  • DRI should be the default, with very rare exceptions that need to be proven as exceptional 😉

    I've been burned too many times, when the sh!t hits the fan and someone in upper management is screaming because their data is garbage. They aren't talking to the developers; they are talking to the DATAbase Admins.

    If your data is important to you, performance issues due to DRI are easily solved. If the data is not important enough to care about it being mangled, well, that's what Access is for 😉 The devs can cry all they want, but I've proven to management the value of a real DBA.
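
    One of those easy fixes, sketched with hypothetical OrderLines/OrderID names: SQL Server does not index foreign key columns automatically, so the single biggest DRI performance win is usually to index them yourself.

    ```sql
    -- Without this index, every DELETE on the parent table scans the child
    -- table to check for referencing rows; with it, the check is a cheap seek.
    CREATE NONCLUSTERED INDEX IX_OrderLines_OrderID
        ON dbo.OrderLines (OrderID);
    ```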

  • In over 10 years I have never found a reason to use DRI, and I probably never will. I have never had a single problem without using it. However, I have had problems with it, but only related to replication. For my projects I learned it was a waste of time and effort, and I eventually concluded that it didn't make much sense to put what I consider to be logic in the database when it truly belongs in the application. IMHO

  • I'm glad you've never been bitten by that. Having worked on a dev project where updates and inserts for a table came from multiple locations and there were probably 20-30 or more developers working on the application, it's easy for things to slip through the cracks if they're not done in a centralized location.

  • I'm working on a large data warehouse project so data quality issues are my world at present.

    Data quality in any shape or form is everyone's responsibility. It doesn't matter if you are a DBA, developer, data provider, or data consumer. If you can't trust the data then you don't have a data asset.

    Hats off to the guy who said he had DRI checks in his code AND the DB.

    http://www.sqlservercentral.com/articles/Advanced/thepitfallsofforeignkeys/2290/

    Where you have a system with complex user interactions, doing without DRI is suicidal. If you have a defined and well-controlled path through your data, then after serious heavyweight testing you can choose whether to risk doing without DRI.

    My data warehouse project is trying to integrate 60+ lines of business that function almost as separate companies into a unified whole (or hole). The referential integrity, domain integrity, and sheer amount of bad data beggar belief. Anyone who seriously believes that all data quality should be handled by the front-end application is delusional; I have the proof.
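
    The audit behind a claim like that can be as simple as counting orphaned detail rows per source system; a sketch with made-up staging table names:

    ```sql
    -- Quantify the damage per source before deciding what the load does with it.
    SELECT d.SourceSystem, COUNT(*) AS OrphanedDetailRows
    FROM staging.PolicyDetail AS d
    WHERE NOT EXISTS (SELECT 1 FROM staging.PolicyHeader AS h
                      WHERE h.PolicyNumber = d.PolicyNumber
                        AND h.SourceSystem = d.SourceSystem)
    GROUP BY d.SourceSystem
    ORDER BY OrphanedDetailRows DESC;
    ```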

  • We call them Relational Database Management Systems for a reason.

    I have yet to see the M$ SQL environment where related tables joined faster with no defined PK/FK relationships. 😛 (A sketch of what defined keys buy you is at the end of this post.)

    Why do so many third party apps not have them?

    Most likely the same reason they have no Indexes and no PK set on the tables.

    They were written by someone who is used to DataSet objects automagically creating Row or Record_Id's for them. 😎

    Shhh... don't let every developer on the planet know that good application performance starts with good DB design. If they can't read Codd et al. or take a class and listen, then they deserve to pay someone like me when their application takes an hour to update a single object record after being used for a year. :smooooth:
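
    On the join-speed point above: a trusted FK doesn't just police joins, it can remove them outright. A sketch with hypothetical Orders/OrderLines tables; when the FK column is NOT NULL and the constraint is trusted, the optimizer can eliminate the join for queries that never touch the parent's columns:

    ```sql
    CREATE VIEW dbo.vOrderLines
    AS
    SELECT ol.OrderLineID, ol.OrderID, ol.Qty, o.OrderDate
    FROM dbo.OrderLines AS ol
    JOIN dbo.Orders AS o ON o.OrderID = ol.OrderID;
    GO

    -- This query's plan never touches dbo.Orders at all: the trusted FK
    -- guarantees exactly one matching order per line, so the join is
    -- eliminated. An untrusted (WITH NOCHECK) constraint loses this.
    SELECT OrderLineID, OrderID, Qty
    FROM dbo.vOrderLines;
    ```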

  • I came to the DBA party a bit late, and I'm not sure that I would really be considered a DBA. I think of myself first and foremost as a database developer. I design the database structures, I write the stored procedures and views, the reports, the data access layers, the business rule layers, and the various user and scripting interfaces. I'm also in charge of performance, backups, and the various and sundry maintenance tasks that keep the server actually serving up data.

    When I started studying relational databases, I was blown away. That rule that says no detail records without a master record? 20 seconds or less to implement and I was done. Forever. Sure, I still had to have proper error handling bubbling up to the application layer, but that had to be done no matter how I handled referential integrity. And nobody can ever break the RI rules, not even me (well, I know how to turn them off or bypass them, but my own rules prohibit such actions).

    I've had to import all kinds of data from all kinds of sources, and I've never yet considered the DRI as anything other than beneficial. And I don't mean in the "it's for your own good" way. I mean that it simplifies and streamlines the import of valid data and the actual business processes that generate new data. If nobody can make sense of the data before it goes into the database, they'll certainly never make sense of it after.

    I've never been slowed down by the fact that I have to import a master record before a detail record, and my database diagrams make it abundantly clear which order has to be followed. If people can't give me a complete set of data, then it doesn't go into the database, and that's exactly the way management wants it. The data has to be cleaned up and validated at some point, and my experience is that it's far easier to do that before it hits the database. I find that once everybody understands what it takes to get the data ready for import, they build processes to support that, which makes everybody's life easier by streamlining the whole system.
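
    And that "20 seconds or less" from above really is a single statement (hypothetical master/detail names):

    ```sql
    -- The whole "no detail records without a master record" rule:
    ALTER TABLE dbo.OrderDetail
        ADD CONSTRAINT FK_OrderDetail_OrderMaster
        FOREIGN KEY (OrderID) REFERENCES dbo.OrderMaster (OrderID);
    ```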
