Is Data Integrity really important or not?

  • Interesting thoughts in Chapter 6

    http://www.red-gate.com/specials/Ebooks/Defensive_Database_Programming.pdf

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • doobya (7/24/2010)


    because the proc had to select the user account to check the balance already, the foreign key constraint was unnecessary

    a single check constraint is quick, but that isn't a common scenario

    checking many check and foreign key constraints which will never return false is a waste of resources - if you are running servers at 80%, that extra load could push you over into scaling up - in that case you are looking at five-figure sums coming out of my pocket!

    set @var = 1
    if @var <> 1 raiserror('no point in this line existing however cheap it is', 16, 1)

    You must have better coders and more time than the devs I keep running into. I've always implemented constraints to keep me, my devs and data analysts from getting sloppy. That's always been the thought with front-end apps: one slip-up and poof - there goes data integrity, and often enough, with no way to get it back.

    And no - it's not really flexible (it's not supposed to be). You either require a relation to be present, or - you don't.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
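
    The unchecked pattern described above - where the procedure's own balance check is argued to make a FOREIGN KEY redundant - might look something like this sketch (table, column and procedure names are invented for illustration):

    ```sql
    -- Hypothetical schema: dbo.UserAccount (AccountID PK, Balance)
    -- and dbo.Payment (PaymentID, AccountID, Amount).
    CREATE PROCEDURE dbo.MakePayment
        @AccountID INT,
        @Amount MONEY
    AS
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- The proc must read the account row to check the balance anyway,
    -- which already proves the row exists.
    DECLARE @Balance MONEY;
    SELECT @Balance = Balance
    FROM dbo.UserAccount WITH (UPDLOCK)
    WHERE AccountID = @AccountID;

    IF @Balance IS NULL OR @Balance < @Amount
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('account missing or insufficient funds', 16, 1);
        RETURN;
    END;

    -- A FOREIGN KEY from Payment.AccountID to UserAccount would re-verify
    -- a fact this code has just established - that is the claimed waste.
    INSERT INTO dbo.Payment (AccountID, Amount)
    VALUES (@AccountID, @Amount);

    UPDATE dbo.UserAccount
    SET Balance = Balance - @Amount
    WHERE AccountID = @AccountID;

    COMMIT TRANSACTION;
    ```

    The rest of the thread argues the other side: a declared constraint also guards every code path that is not this one procedure.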

  • I've always implemented constraints to keep me, my devs and data analysts from getting sloppy

    yes - especially when you are evolving the application (and doubly so when making online updates)

    anyway servers are much cheaper now and I haven't used that unchecked approach in years

    i think there is a lot more to data integrity than db level stuff - sanity checking, fraud control, the kind of checks that look at the bigger picture

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    doobya (7/24/2010)

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    :w00t: interesting approach.

    Shall I understand you are proposing to do sloppy data modeling and sloppy physical implementation just because you assume - probably correctly - that developers are going to do a sloppy job at programming time?

    If this is the path of the industry in the years to come it is better for me to find a nice place to retire - the sooner the better :doze:

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • doobya (7/24/2010)


    I've always implemented constraints to keep me, my devs and data analysts from getting sloppy

    yes - especially when you are evolving the application (and doubly so when making online updates)

    anyway servers are much cheaper now and I haven't used that unchecked approach in years

    i think there is a lot more to data integrity than db level stuff - sanity checking, fraud control, the kind of checks that look at the bigger picture

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    I completely agree that there are more levels of data integrity than what you implement in the database layer, but the database is the only place where some types of validation can be enforced reliably. It's just too easy to bypass relational integrity in a front-end app, or mis-implement it. One WTF and you have orphans you can't even reconstitute.

    I for one am a big fan of multiple levels of checks, implemented in all layers.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
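
    A database-level backstop of the kind described here is cheap to declare. A minimal sketch, with an invented table and column name:

    ```sql
    -- Even if a front-end bug slips through, the database refuses the
    -- bad row, so one WTF cannot silently corrupt the data.
    ALTER TABLE dbo.UserAccount
    ADD CONSTRAINT CK_UserAccount_Balance CHECK (Balance >= 0);
    ```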

  • I had a look at that ebook on defensive programming and the first thing I read was shocking:

    ALTER PROCEDURE dbo.SetEmployeeManager
        @EmployeeID INT ,
        @ManagerID INT
    AS
    SET NOCOUNT ON ;

    UPDATE dbo.Employee
    SET ManagerID = @ManagerID
    WHERE EmployeeID = @EmployeeID ;

    followed by "As long as EmployeeID is the primary key on the dbo.Employee table, this procedure will work correctly."

    I would shoot myself if I wrote that!

    What if @EmployeeID does not exist? No error!

    if you expect a single row to be updated you have to check:

    if @@rowcount <> 1 raiserror('oh no', 16, 1)

    doesn't everybody do that (sort of thing) ?

    [edit] he was illustrating another point - not a good example - he has a section on @@rowcount later
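
    The check the post is asking for would look something like this - the procedure body is from the ebook, with the @@ROWCOUNT guard as the suggested addition:

    ```sql
    ALTER PROCEDURE dbo.SetEmployeeManager
        @EmployeeID INT,
        @ManagerID INT
    AS
    SET NOCOUNT ON;

    UPDATE dbo.Employee
    SET ManagerID = @ManagerID
    WHERE EmployeeID = @EmployeeID;

    -- @@ROWCOUNT must be read immediately after the UPDATE; almost any
    -- intervening statement resets it.
    IF @@ROWCOUNT <> 1
        RAISERROR('SetEmployeeManager: expected to update exactly one row', 16, 1);
    ```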

    I think doobya's approach can improve performance if you do app-side checking, since it's more scalable. The DB is a single, shared resource, and anytime you can move cycles off that machine to the application server(s), it makes more sense. If you are doing this in stored procs, I don't see any performance benefit, and I might argue that you ought to be validating/submitting the info needed from the app anyway.

    However, this only works if you have extremely tight control over the application interfaces, AND users cannot use something like Excel/Access to hit the database. If they can, you have to keep RI in the db.

    I'd also say that if your application evolves to use something like Excel, it is going to be a pain to retrofit RI back in, though it shouldn't matter. If it does, then you didn't have all the checks you thought you did.

  • Steve Jones - Editor (7/25/2010)


    I think doobya's approach can improve performance if you do app-side checking, since it's more scalable. The DB is a single, shared resource, and anytime you can move cycles off that machine to the application server(s), it makes more sense. If you are doing this in stored procs, I don't see any performance benefit, and I might argue that you ought to be validating/submitting the info needed from the app anyway.

    However, this only works if you have extremely tight control over the application interfaces, AND users cannot use something like Excel/Access to hit the database. If they can, you have to keep RI in the db.

    I'd also say that if your application evolves to use something like Excel, it is going to be a pain to retrofit RI back in, though it shouldn't matter. If it does, then you didn't have all the checks you thought you did.

    I would take this a step further. If the only way a user can access the database is through the app - then it may be fine.

    What if there happens to be a data integrity question? The DBA will need to research the issue, and often that is done through SSMS. When that investigation happens in SSMS and there is no RI, the DBA is inevitably going to take more time to ensure his findings are correct.

    I think there is a need to have RI in the database. If the devs want to use RI in the app, implement it in the form of business rules etc. That is fine.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
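
    Declarative RI of the kind argued for here is one statement per relationship, and it also documents the schema for anyone investigating in SSMS. A sketch with invented names:

    ```sql
    -- Once this constraint exists, no app, ad-hoc query or SSMS session
    -- can create an orphaned Payment row, so a DBA researching a data
    -- question can trust that the relationship holds.
    ALTER TABLE dbo.Payment
    ADD CONSTRAINT FK_Payment_UserAccount
        FOREIGN KEY (AccountID) REFERENCES dbo.UserAccount (AccountID);
    ```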

  • giving users SELECT access opens a big can of worms

    related to read locks interfering with normal OLTP operations

    I either give them access via stored procedure (that releases locks asap by using a temp table)

    or give them access to hot standby and they accept 5 minutes latency

    So far I have found SQL replication way too fragile for production use - maybe that is just me?

    it also causes big headaches with 24/7 operations as you can't make online schema changes

    I read somewhere that indexed views somehow solve the adhoc query locking problem

    but I don't believe that to be true - those indexes still need to be locked in the face of reads and writes - just like anything else

    Overall I agree that more constraints are better - but I am a software engineer, not a robot clinging to a book of rules - so I like to consider the pros and cons of many different approaches
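
    The stored-procedure pattern mentioned above - copy out under a short transaction, then return rows from a temp table - might be sketched like this (object names are invented for illustration):

    ```sql
    CREATE PROCEDURE dbo.ReportOpenOrders
    AS
    SET NOCOUNT ON;

    -- Take a consistent read inside a short transaction...
    BEGIN TRANSACTION;

    SELECT OrderID, CustomerID, Total
    INTO #OpenOrders
    FROM dbo.Orders WITH (HOLDLOCK)
    WHERE Status = 'Open';

    -- ...and release the shared locks on dbo.Orders immediately,
    COMMIT TRANSACTION;

    -- so a slow client drains the result set from the temp table
    -- without blocking OLTP writers on the base table.
    SELECT OrderID, CustomerID, Total
    FROM #OpenOrders;
    ```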

    I fail to see how read locks to a temp table or to a user are substantially different. Perhaps you can list the differences? There are huge sites, places like MySpace and the NASDAQ, that run OLTP operations and they handle queries and selects fine. The SQL CAT team, AFAIK, has never given guidance that this is an issue, nor that RI has an issue, and they listed their largest customer as one that does 40,000 inserts/sec.

    Replication has not been too fragile for me, and online schema changes can be made. Are you on v7/2000? There were substantial limitations in those versions.

    I think that you might have had valid reasons for doing things the way you have, but I might suggest that you look to increase your knowledge and learn how some of these things work. RI isn't an issue, nor are temp tables necessarily the cure for locking/blocking issues.

  • Steve Jones - Editor (7/25/2010)


    I fail to see how read locks to a temp table or to a user are substantially different

    [edit]

    ok I see where I am going wrong now - of course the output is buffered - the only time there is sync between query execution and the client is when the output buffer is full

    I will refresh my locking knowledge ... maybe I am going wrong somewhere

    what happened to dbcc traceon (1200,3604) ? doesn't seem to work in SQL 2005 ?

    [EDIT] dbcc traceon (1200,3604,-1) works ok

    but is this not correct:

    if you need to select a consistent set of records, and therefore need to use isolation > read committed, the locks will be synchronized with the sql client reading the data - if the app is processing the rows slowly, locks are held longer

    so select into temp and then select from temp after transaction has committed

    I don't think SQL Server buffers selected records by default?

    (select under read committed should be ok as only brief row locks are held)

    (also you have no way of stopping users from specifying "with (tablock)" or writing ridiculous queries)

    Steve Jones - Editor (7/25/2010)


    Replication has not been too fragile for me, and online schema changes can be made. Are you on v7/2000? There were substantial limitations in those versions

    Replication: I last used it in SQL 2000 and was not impressed - haven't looked at it since

  • It is not always a good idea to count on the application in order to ensure data integrity.

    In my opinion it is very important to enforce data integrity at both the DB and application levels, especially since most IT companies and development teams have both DB admins and software developers, and in most cases each of them has their own responsibility to ensure integrity.

    Another thing: many developers currently build applications on content management systems, and those development environments mostly rely on the rules and checks defined at the DB level for application integrity as well.

    I hope that these ideas might be helpful.

    Best,

    Feras Dib.

  • Buffering isn't something I've typically worried about. Are you selecting millions of records? If not, I wouldn't be concerned here. The default isolation works fine.

    Replication was a little shaky in SQL 2000, but it has vastly improved.
