
Voluntary Constraint?
Posted Tuesday, January 3, 2012 12:44 PM


SSChampion


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
I've never yet seen an application correctly validate all the data being fed in through it. Most? Sure. All? No way.

Just ran into a situation where "the data format is controlled by the front end" was the policy. But someone using a browser with JavaScript turned off managed to bypass all the data validation and insert junk data that violated almost every business rule in the place.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1229521
Posted Tuesday, January 3, 2012 9:59 PM
SSC Rookie


Group: General Forum Members
Last Login: Friday, December 21, 2012 1:46 AM
Points: 28, Visits: 59
Just noticed your by-line ...for "Row-By-Agonizing-Row"
Don't you just "curse OR" rant when that happens?
Post #1229655
Posted Wednesday, January 4, 2012 6:16 AM
SSC-Addicted


Group: General Forum Members
Last Login: Yesterday @ 1:02 AM
Points: 450, Visits: 3,413
From a SQL developer's perspective it is tempting to assume that data integrity should always be enforced by the database. Unfortunately SQL Server (like most SQL DBMSs) is only really effective for enforcing a few very simple data integrity rules - for example those that can be expressed as FOREIGN KEY constraints or "one row at a time" CHECK constraints. Other types of rules either aren't supported (e.g. mandatory cardinality "at least N" constraints) or require procedural code rather than being part of the data model (inclusion dependency and join dependency constraints).

Relying on procedural Transact-SQL to enforce complex rules, or even simple ones, has significant disadvantages. It's a basic tenet of business rules management that rules should be specified declaratively - something that just isn't achievable in SQL. Additionally: T-SQL doesn't integrate easily with other components outside the database; it doesn't support code reuse to the extent of other languages like C# or Java; it is weakly typed; it can't easily be scaled out across multiple servers; and it lacks support from developer tools (e.g. for modelling, static analysis, testing and instrumentation).

To simplify development, maintenance and support it is therefore quite common for business rules to be centralised in a domain layer or in rules engines and data integration / data quality tools outside the database. That way rules can easily be shared, maintained and inspected by applications, tools and the people who support them. So the DBMS isn't always the preferred place to implement data integrity - but it's certainly a better place than embedding integrity rules separately into every application.
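To make the distinction concrete, here is a minimal sketch (all table and column names are invented for illustration) of the kind of rule SQL Server can declare versus the kind it can't:

```sql
-- Hypothetical schema: these are rules SQL Server CAN enforce declaratively.
CREATE TABLE Customers (
    CustomerID int NOT NULL PRIMARY KEY
);

CREATE TABLE Orders (
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        REFERENCES Customers (CustomerID),                  -- foreign key
    Quantity   int NOT NULL
        CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0)  -- one-row-at-a-time CHECK
);

-- By contrast, a mandatory-cardinality rule such as "every customer must
-- have AT LEAST one order" has no declarative form here; it would need
-- triggers or procedural code outside the data model.
```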


David
Post #1229863
Posted Wednesday, January 4, 2012 7:38 AM


SSChampion


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
sqlvogel (1/4/2012)
From a SQL developer's perspective it is tempting to assume that data integrity should always be enforced by the database. Unfortunately SQL Server (like most SQL DBMSs) is only really effective for enforcing a few very simple data integrity rules - for example those that can be expressed as FOREIGN KEY constraints or "one row at a time" CHECK constraints. Other types of rules either aren't supported (e.g. mandatory cardinality "at least N" constraints) or require procedural code rather than being part of the data model (inclusion dependency and join dependency constraints).

Relying on procedural Transact-SQL to enforce complex rules, or even simple ones, has significant disadvantages. It's a basic tenet of business rules management that rules should be specified declaratively - something that just isn't achievable in SQL. Additionally: T-SQL doesn't integrate easily with other components outside the database; it doesn't support code reuse to the extent of other languages like C# or Java; it is weakly typed; it can't easily be scaled out across multiple servers; and it lacks support from developer tools (e.g. for modelling, static analysis, testing and instrumentation).

To simplify development, maintenance and support it is therefore quite common for business rules to be centralised in a domain layer or in rules engines and data integration / data quality tools outside the database. That way rules can easily be shared, maintained and inspected by applications, tools and the people who support them. So the DBMS isn't always the preferred place to implement data integrity - but it's certainly a better place than embedding integrity rules separately into every application.


The problem with that is that I've never yet seen it work. It seems to be in the category of "sounds great on paper, but it doesn't actually fly".

My question on philosophies like that is always, "why bother with a fire extinguisher in the kitchen when you can call 911 and get a truck there to put the fire out?" In other words, have a data rules layer, on top of or integrated with the data access layer, but also enforce the rules at other levels, including the database.

Yes, there are rules that the database can't readily enforce. Constraints across tables are one of the more obvious ones, where the only answer (triggers) is often worse than the problem it's meant to prevent. And there are rules you want to enforce WAY before the data ever gets sent to a table. You really don't want an SQL error if someone formats a date incorrectly in a form on a website. That should be handled much earlier in the process, even if only to avoid round trips to the database for performance reasons.

But there are also rules that really do need to be enforced in the database layer, even if "just in case", like a fire extinguisher. If you don't have a fire extinguisher in your kitchen, you've basically decided that your home burning down because of a grease spill on the stove is an acceptable outcome. Not having data protection in the database is the same kind of thing.

An egregious example I ran into a couple of years ago was a company with a firm policy of never deleting or updating any data. Each table had a bit column named "Active", which went along with a pseudo-ID. For each pseudo-ID there was only ever supposed to be one row marked active; all the rest were to have a 0 in that column. For example, customer 123 might have 20 or 30 rows in the Customers table, but only one of those rows would have a 1 in the Active column. Deletions would mark all the associated rows with a 0, and updates would insert a new row with the new data, mark it active, and mark the old row inactive.

Sounds great on paper. They had all the rules for it built into a business rules layer, to make enforcement more efficient. After a year of it being in use, they were totally certain it was working perfectly, until I ran a simple Select ... Where Active = 1 Group By pseudo-ID Having Count(*) > 1 and found that some IDs had hundreds of "active" rows, a couple even had thousands; the percentage of error was simply huge. Because of this, management for that company had been basing key decisions on seriously flawed data, and nobody could figure out why reality never matched the reports very well at all.
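The check described above, sketched as runnable T-SQL (table and column names are invented for illustration):

```sql
-- Find pseudo-IDs that break the "at most one active row" rule.
SELECT PseudoID,
       COUNT(*) AS ActiveRows
FROM   Customers
WHERE  Active = 1
GROUP  BY PseudoID
HAVING COUNT(*) > 1
ORDER  BY COUNT(*) DESC;   -- worst offenders first
```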

Had the data been protected at every level, instead of one, it's highly probable that this scenario could have been prevented. It would have been easy to prevent, but they saw that as unnecessary because the business rule layer would take care of it. Kind of like the Titanic not having lifeboats, a double-hull, or sealed compartments, because the barrier walls would "take care of it".


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1229931
Posted Wednesday, January 4, 2012 7:51 AM
SSC-Addicted


Group: General Forum Members
Last Login: Yesterday @ 1:02 AM
Points: 450, Visits: 3,413
GSquared (1/4/2012)
For each pseudo-ID, there was only ever supposed to be one row marked active, and all the rest were to have a 0 in that column.


How would you enforce that in SQL Server? I'm pretty sure it can't be done reliably with CHECK constraints. Indexed views and filtered unique indexes won't stop you updating all rows to be Active=0. Would you use triggers?


David
Post #1229938
Posted Wednesday, January 4, 2012 1:41 PM


SSC Eights!


Group: General Forum Members
Last Login: Saturday, October 11, 2014 8:18 PM
Points: 831, Visits: 1,588
it is tempting to assume that data integrity should always be enforced by the database


Pretty sure that's not the argument. I wouldn't use the word "always". Merely "where possible". If column_A in table_B should only sensibly take values between 1 and 10, but within that should only be a 1 when certain combinations of data in other tables exist, should you throw out the constraint on column_A altogether because it's not sophisticated enough?
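As a sketch (hypothetical names, matching the example above), the simple half of such a rule still declares cleanly even when the cross-table half can't:

```sql
-- Enforce the range rule declaratively...
ALTER TABLE table_B
ADD CONSTRAINT CK_tableB_columnA
    CHECK (column_A BETWEEN 1 AND 10);

-- ...while the "only 1 when certain combinations exist in other tables"
-- part would need a trigger or application logic. That gap is no reason
-- to throw out the CHECK above.
```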

[Changing tack]
Of course, if I'm a really clever developer, I'll employ super-flexible "multi-purpose" tables where a lot of columns are set to varchar because they need to accommodate a variety of data types. Then I have a really strong argument to throw out constraints... don't I? I mean... it's the constraints that are the problem, not my design... surely.




One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.
Bertrand Russell
Post #1230297
Posted Wednesday, January 4, 2012 2:02 PM


SSChampion


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
sqlvogel (1/4/2012)
GSquared (1/4/2012)
For each pseudo-ID, there was only ever supposed to be one row marked active, and all the rest were to have a 0 in that column.


How would you enforce that in SQL Server? I'm pretty sure it can't be done reliably with CHECK constraints. Indexed views and filtered unique indexes won't stop you updating all rows to be Active=0. Would you use triggers?


All rows set to 0 = row deleted, in this kind of schema. Nothing wrong with that. Sorry for the ambiguity in the statement. Should have been:

For each pseudo-ID, there was only ever supposed to be one row marked active, and all the rest were to have a 0 in that column, or all rows marked 0 for "deleted" data.


That's kind of mentioned later in my post, but not clearly enough.

Filtered unique index was the solution I recommended to them.
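For anyone who hasn't used one, a filtered unique index for this rule might look like the following (a sketch; table and column names are invented, and filtered indexes require SQL Server 2008 or later):

```sql
-- At most one Active = 1 row per PseudoID; Active = 0 rows are ignored
-- by the filter, so "all rows inactive" (a deleted record) is still allowed.
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_OneActive
ON Customers (PseudoID)
WHERE Active = 1;   -- filtered indexes must be nonclustered
```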


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1230320
Posted Wednesday, January 4, 2012 11:40 PM
SSC Rookie


Group: General Forum Members
Last Login: Friday, December 21, 2012 1:46 AM
Points: 28, Visits: 59
Thanks, a blindingly obvious solution, but only if you know about it. I didn't know you could filter an index; the last time I really looked at indexes you couldn't, and I haven't encountered one with a WHERE clause since.

Since their cleanup process was obviously flawed, did they implement the suggestion, and did they run into problems with the process if they did? Since the data isn't unique to begin with, how would you handle creating the index? Create a new table with the index and copy into it, then replace the table? Or determine the unique rows, set everything else to 0, and when the index creation fails, try again?

I looked something up: "The Database Engine does not allow creating a unique index on columns that already include duplicate values, whether or not IGNORE_DUP_KEY is set to ON. If this is tried, the Database Engine displays an error message. Duplicate values must be removed before a unique index can be created on the column or columns."

A filtered index can't be clustered, can it?
Post #1230501
Posted Thursday, January 5, 2012 1:34 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Yesterday @ 10:22 AM
Points: 588, Visits: 2,555
@GSquared That's a really good example of where constraints would have saved a catastrophe. The basic design wasn't at fault, I reckon, but the implementation certainly was! Thinking of your description...
they were totally certain it was working perfectly until I did a simple Select ... Where Active = 1 Group By pseudo-ID Having Count(*) > 1,

... I put into every database I write an 'impossible' stored procedure that runs on a schedule (daily) and at startup, double-checking all these rules across the database and emailing me if anything fails a check. As well as making sure I sleep at night, the code is a very handy reminder of some of the more arcane domain rules in the database. Has it ever emailed me? Er... yes. The impossible always happens in IT.
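A minimal sketch of such a procedure (all names invented; a real one would assert each database's own domain rules and send any results via Database Mail from the scheduled job):

```sql
CREATE PROCEDURE dbo.CheckImpossibleRules
AS
BEGIN
    SET NOCOUNT ON;

    -- Each query asserts an invariant; any row returned is "impossible"
    -- data that needs investigating.

    -- Invariant: at most one active row per pseudo-ID.
    SELECT 'DuplicateActive' AS Violation, PseudoID
    FROM   Customers
    WHERE  Active = 1
    GROUP  BY PseudoID
    HAVING COUNT(*) > 1;

    -- Invariant: no order without a matching customer.
    SELECT 'OrphanOrder' AS Violation, o.OrderID
    FROM   Orders AS o
    WHERE  NOT EXISTS (SELECT 1 FROM Customers AS c
                       WHERE c.CustomerID = o.CustomerID);
END;
```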

@Ken There was a great article by Alex Kuznetsov here, Yet Another SQL Strategy for Versioned Data, that explains this technique using filtered indexes, which can be applied to all sorts of temporal databases.



Best wishes,

Phil Factor
Simple Talk
Post #1230535
Posted Thursday, January 5, 2012 6:26 AM


SSChampion


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
Ken Lee-263418 (1/4/2012)
Thanks, a blindingly obvious solution, but only if you know about it. I didn't know you could filter an index; the last time I really looked at indexes you couldn't, and I haven't encountered one with a WHERE clause since.

Since their cleanup process was obviously flawed, did they implement the suggestion, and did they run into problems with the process if they did? Since the data isn't unique to begin with, how would you handle creating the index? Create a new table with the index and copy into it, then replace the table? Or determine the unique rows, set everything else to 0, and when the index creation fails, try again?

I looked something up: "The Database Engine does not allow creating a unique index on columns that already include duplicate values, whether or not IGNORE_DUP_KEY is set to ON. If this is tried, the Database Engine displays an error message. Duplicate values must be removed before a unique index can be created on the column or columns."

A filtered index can't be clustered, can it?


Clean up of the existing data would have been easy enough. For each ID with multiple actives, keep the one with the latest data (there were easy ways to determine that based on audit trails), and set the rest to 0. Then add the filtered unique index to prevent future issues.
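That cleanup can be sketched as a single set-based pass (names invented; LastModified stands in for whatever the audit trail actually provided to identify the latest row):

```sql
-- Among the rows currently marked active, keep only the most recent
-- per PseudoID and deactivate the rest.
WITH Ranked AS (
    SELECT Active,
           ROW_NUMBER() OVER (PARTITION BY PseudoID
                              ORDER BY LastModified DESC) AS rn
    FROM   Customers
    WHERE  Active = 1
)
UPDATE Ranked
SET    Active = 0
WHERE  rn > 1;
-- Once this succeeds, the filtered unique index can be created
-- without hitting duplicate-key errors.
```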

It would cause whatever flawed code was creating the duplicates in the first place to start throwing errors. But that's a good thing long-term, because they hadn't the faintest clue which code was causing the violations, so errors in production were the only way they'd ever find it.

I don't know if they implemented my recommendations or not, since I left that place before anything was done on that.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1230692