• sqlvogel (1/4/2012)


    From a SQL developer's perspective it is tempting to assume that data integrity should always be enforced by the database. Unfortunately SQL Server (like most SQL DBMSs) is only really effective for enforcing a few very simple data integrity rules - for example those that can be expressed as FOREIGN KEY constraints or "one row at a time" CHECK constraints. Other types of rules either aren't supported (e.g. mandatory cardinality "at least N" constraints) or require procedural code rather than being part of the data model (inclusion dependency and join dependency constraints).

    Relying on procedural Transact-SQL to enforce complex rules, or even simple ones, has significant disadvantages. It's a basic tenet of business rules management that rules should be specified declaratively - something that just isn't achievable in SQL. Additionally: T-SQL doesn't integrate easily with other components outside the database; it doesn't support code reuse to the extent of languages like C# or Java; it is weakly typed; it can't easily be scaled out across multiple servers; and it lacks support from developer tools (e.g. for modelling, static analysis, testing and instrumentation).

    To simplify development, maintenance and support it is therefore quite common for business rules to be centralised in a domain layer or in rules engines and data integration / data quality tools outside the database. That way rules can easily be shared, maintained and inspected by applications, tools and the people who support them. So the DBMS isn't always the preferred place to implement data integrity - but it's certainly a better place than embedding integrity rules separately into every application.
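
    To make the distinction in that first paragraph concrete, here's a rough sketch of a rule SQL Server enforces declaratively versus one it can't. The table names are hypothetical stand-ins, not from sqlvogel's post:

        -- Declarative, "one row at a time" rules: fully supported.
        CREATE TABLE dbo.Orders
        (
            OrderID    int           NOT NULL PRIMARY KEY,
            CustomerID int           NOT NULL
                REFERENCES dbo.Customers (CustomerID),  -- FOREIGN KEY (assumes dbo.Customers exists)
            Amount     decimal(10,2) NOT NULL
                CHECK (Amount >= 0)                     -- simple single-row CHECK
        );

        -- But a mandatory-cardinality rule such as "every order must have
        -- at least one order line" has no declarative equivalent here; it
        -- forces you into triggers or application code.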

    The problem with that philosophy is that I've never yet seen it work. It seems to be in the category of "sounds great on paper, but it doesn't actually fly".

    My question on philosophies like that is always, "why bother with a fire extinguisher in the kitchen when you can call 911 and get a truck there to put the fire out?" In other words, have a data rules layer, on top of or integrated with the data access layer, but also enforce the rules at other levels, including the database.

    Yes, there are rules that the database can't readily enforce. Constraints across tables are among the more obvious ones, where the only answer (triggers) is often worse than the problem it's meant to prevent. And there are rules you want to enforce WAY before the data ever gets sent to a table. You really don't want an SQL error because someone formatted a date incorrectly in a form on a website. That should be handled much earlier in the process, even if only to avoid round trips to the database for performance reasons.
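
    To show why the trigger "answer" is often worse than the problem, here's a minimal sketch of one, using hypothetical Orders and OrderLines tables (the names are mine, not from the discussion):

        -- Cross-table rule: an order must always keep at least one line.
        CREATE TRIGGER dbo.trg_OrderLines_AtLeastOne
        ON dbo.OrderLines
        AFTER DELETE
        AS
        BEGIN
            -- Block any delete that would leave a surviving order with zero lines.
            IF EXISTS (SELECT 1
                       FROM deleted d
                       JOIN dbo.Orders o ON o.OrderID = d.OrderID
                       WHERE NOT EXISTS (SELECT 1
                                         FROM dbo.OrderLines ol
                                         WHERE ol.OrderID = d.OrderID))
            BEGIN
                RAISERROR('An order must have at least one order line.', 16, 1);
                ROLLBACK TRANSACTION;
            END
        END;

    And that's only the DELETE side; you'd still need to cover UPDATEs that move lines between orders, plus the insert of the order header itself, which is exactly how the cure ends up worse than the disease.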

    But there are also rules that really do need to be enforced in the database layer, even if "just in case", like a fire extinguisher. If you don't have a fire extinguisher in your kitchen, you've basically decided that your home burning down because of a grease spill on the stove is an acceptable outcome. Not having data protection in the database is the same kind of decision.

    An egregious example I ran into a couple of years ago was a company with a firm policy of never deleting or updating any data. Each table had a bit column named "Active", which went along with a pseudo-ID. For each pseudo-ID, there was only ever supposed to be one row marked active, and all the rest were to have a 0 in that column. For example, customer 123 might have 20 or 30 rows in the Customers table, but only one of those rows would have a 1 in the Active column; the rest would have 0. Deletions would mark all the associated rows with a 0, and updates would insert a new row with the new data, mark it active, and mark the old row inactive. Sounds great on paper. They had all the rules for it built into a business rules layer, to make enforcement more efficient.

    After a year of it being in use, they were totally certain it was working perfectly - until I ran a simple Select ... Where Active = 1 Group By pseudo-ID Having Count(*) > 1 and found that some IDs had hundreds of "active" rows, a couple even had thousands, and the error rate was simply huge. Because of this, management had been basing key decisions on seriously flawed data, and nobody could figure out why reality never matched the reports very well at all.
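
    Spelled out, that audit query was essentially the following (the table and column names here are my stand-ins):

        -- Find pseudo-IDs that violate the "exactly one active row" rule.
        SELECT  CustomerPseudoID,
                COUNT(*) AS ActiveRowCount
        FROM    dbo.Customers
        WHERE   Active = 1
        GROUP BY CustomerPseudoID
        HAVING  COUNT(*) > 1
        ORDER BY ActiveRowCount DESC;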

    Had the data been protected at every level instead of just one, this scenario almost certainly would have been prevented. It would have been easy to prevent, but they saw that as unnecessary because the business rules layer would take care of it. Kind of like the Titanic carrying too few lifeboats because the watertight compartments would "take care of it".
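
    In this case, the database-level fire extinguisher is a one-liner. On SQL Server 2008 and later, a filtered unique index enforces the rule declaratively (again, with my hypothetical names):

        -- The engine itself guarantees at most one Active = 1 row per pseudo-ID.
        CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_OneActivePerPseudoID
        ON dbo.Customers (CustomerPseudoID)
        WHERE Active = 1;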

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon