Is Data Integrity really important or not?

  • Interesting thoughts in Chapter 6

    http://www.red-gate.com/specials/Ebooks/Defensive_Database_Programming.pdf

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • doobya (7/24/2010)


    because the proc had to select the user account to check the balance already, the foreign key constraint was unnecessary

    a single check constraint is quick, but that isn't a common scenario

    checking many check and foreign key constraints which will never return false is a waste of resources - if you are running servers at 80%, that extra load could push you over into scaling up - in that case you are looking at five-figure sums coming out of my pocket!

    set @var = 1
    if @var <> 1 raiserror('no point in this line existing however cheap it is', 16, 1)

    You must have better coders and more time than the devs I keep running into. I've always implemented constraints to keep me, my devs and data analysts from getting sloppy. That's always been the thought with front-end apps: one slip-up and poof - there goes data integrity, and often enough, with no way to get it back.

    And no - it's not really flexible (it's not supposed to be). You either require a relation to be present, or - you don't.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
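
    The unchecked pattern described above - where the procedure's own balance check is argued to make a FOREIGN KEY redundant - might look something like this sketch (table, column and procedure names are invented for illustration):

    ```sql
    -- Hypothetical schema: dbo.UserAccount (AccountID PK, Balance)
    -- and dbo.Payment (PaymentID, AccountID, Amount).
    CREATE PROCEDURE dbo.MakePayment
        @AccountID INT,
        @Amount MONEY
    AS
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    -- The proc must read the account row to check the balance anyway,
    -- which already proves the row exists.
    DECLARE @Balance MONEY;
    SELECT @Balance = Balance
    FROM dbo.UserAccount WITH (UPDLOCK)
    WHERE AccountID = @AccountID;

    IF @Balance IS NULL OR @Balance < @Amount
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('account missing or insufficient funds', 16, 1);
        RETURN;
    END;

    -- A FOREIGN KEY from Payment.AccountID to UserAccount would re-verify
    -- a fact this code has just established - that is the claimed waste.
    INSERT INTO dbo.Payment (AccountID, Amount)
    VALUES (@AccountID, @Amount);

    UPDATE dbo.UserAccount
    SET Balance = Balance - @Amount
    WHERE AccountID = @AccountID;

    COMMIT TRANSACTION;
    ```

    The rest of the thread argues the other side: a declared constraint also guards every code path that is not this one procedure.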

  • I've always implemented constraints to keep me, my devs and data analysts from getting sloppy

    yes - especially when you are evolving the application (and doubly so when making online updates)

    anyway servers are much cheaper now and I haven't used that unchecked approach in years

    i think there is a lot more to data integrity than db level stuff - sanity checking, fraud control, the kind of checks that look at the bigger picture

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    doobya (7/24/2010)

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    :w00t: interesting approach.

    Shall I understand you are proposing to do sloppy data modeling and sloppy physical implementation just because you assume - probably correctly - that developers are going to do a sloppy job at programming time?

    If this is the path of the industry in the years to come it is better for me to find a nice place to retire - the sooner the better :doze:

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • doobya (7/24/2010)


    I've always implemented constraints to keep me, my devs and data analysts from getting sloppy

    yes - especially when you are evolving the application (and doubly so when making online updates)

    anyway servers are much cheaper now and I haven't used that unchecked approach in years

    i think there is a lot more to data integrity than db level stuff - sanity checking, fraud control, the kind of checks that look at the bigger picture

    no point in implementing loads of checks if the system transfers £1,000,000 to complete strangers in 10 minutes because of a logic error

    I completely agree that there are more levels of data integrity than what you implement in the database layer, but the database is the only place where some types of validation can be enforced reliably. It's just too easy to bypass relational integrity in a front-end app, or mis-implement it. One WTF and you have orphans you can't even reconstitute.

    I for one am a big fan of multiple levels of checks, implemented in all layers.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
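
    A database-level backstop of the kind described here is cheap to declare. A minimal sketch, with an invented table and column name:

    ```sql
    -- Even if a front-end bug slips through, the database refuses the
    -- bad row, so one WTF cannot silently corrupt the data.
    ALTER TABLE dbo.UserAccount
    ADD CONSTRAINT CK_UserAccount_Balance CHECK (Balance >= 0);
    ```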

  • I had a look at that ebook on defensive programming and the first thing I read was shocking:

    ALTER PROCEDURE dbo.SetEmployeeManager
        @EmployeeID INT ,
        @ManagerID INT
    AS
    SET NOCOUNT ON ;

    UPDATE dbo.Employee
    SET ManagerID = @ManagerID
    WHERE EmployeeID = @EmployeeID ;

    followed by "As long as EmployeeID is the primary key on the dbo.Employee table, this procedure will work correctly."

    I would shoot myself if I wrote that!

    What if @EmployeeID does not exist? No error!

    if you expect a single row to be updated you have to check:

    if @@rowcount <> 1 raiserror('oh no', 16, 1)

    doesn't everybody do that (sort of thing) ?

    [edit] he was illustrating another point - not a good example - he has a section on @@rowcount later
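
    The check the post is asking for would look something like this - the procedure body is from the ebook, with the @@ROWCOUNT guard as the suggested addition:

    ```sql
    ALTER PROCEDURE dbo.SetEmployeeManager
        @EmployeeID INT,
        @ManagerID INT
    AS
    SET NOCOUNT ON;

    UPDATE dbo.Employee
    SET ManagerID = @ManagerID
    WHERE EmployeeID = @EmployeeID;

    -- @@ROWCOUNT must be read immediately after the UPDATE; almost any
    -- intervening statement resets it.
    IF @@ROWCOUNT <> 1
        RAISERROR('SetEmployeeManager: expected to update exactly one row', 16, 1);
    ```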

    I think doobya's approach can improve performance if you do app-side checking, since it's more scalable. The DB is a single, shared resource, and anytime you can move cycles off that machine to the application server(s), it makes more sense. If you are doing this in stored procs, I don't see any performance benefit, and I might argue that you ought to be validating/submitting the info needed from the app anyway.

    However, this only works if you have extremely tight control over the application interfaces, AND users cannot use something like Excel/Access to hit the database. If they can, you have to keep RI in the db.

    I'd also say that if your application evolves to use something like Excel, it is going to be a pain to retrofit RI back in, though it shouldn't matter. If it does, then you didn't have all the checks you thought you did.

  • Steve Jones - Editor (7/25/2010)


    I think doobya's approach can improve performance if you do app-side checking, since it's more scalable. The DB is a single, shared resource, and anytime you can move cycles off that machine to the application server(s), it makes more sense. If you are doing this in stored procs, I don't see any performance benefit, and I might argue that you ought to be validating/submitting the info needed from the app anyway.

    However, this only works if you have extremely tight control over the application interfaces, AND users cannot use something like Excel/Access to hit the database. If they can, you have to keep RI in the db.

    I'd also say that if your application evolves to use something like Excel, it is going to be a pain to retrofit RI back in, though it shouldn't matter. If it does, then you didn't have all the checks you thought you did.

    I would take this a step further. If the only way a user can access the database is through the app - then it may be fine.

    What if there happens to be a data integrity question? The DBA will need to research the issue, and often that is done through SSMS. When that investigation happens in SSMS and there is no RI, the DBA is inevitably going to take more time to ensure his findings are correct.

    I think there is a need to have RI in the database. If the devs want to use RI in the app, implement it in the form of business rules etc. That is fine.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
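
    Declarative RI of the kind argued for here is one statement per relationship, and it also documents the schema for anyone investigating in SSMS. A sketch with invented names:

    ```sql
    -- Once this constraint exists, no app, ad-hoc query or SSMS session
    -- can create an orphaned Payment row, so a DBA researching a data
    -- question can trust that the relationship holds.
    ALTER TABLE dbo.Payment
    ADD CONSTRAINT FK_Payment_UserAccount
        FOREIGN KEY (AccountID) REFERENCES dbo.UserAccount (AccountID);
    ```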

  • giving users SELECT access opens a big can of worms

    related to read locks interfering with normal OLTP operations

    I either give them access via stored procedure (that releases locks asap by using a temp table)

    or give them access to hot standby and they accept 5 minutes latency

    So far I have found SQL replication way too fragile for production use - maybe that is just me?

    it also causes big headaches with 24/7 operations as you can't make online schema changes

    I read somewhere that indexed views somehow solve the adhoc query locking problem

    but I don't believe that to be true - those indexes still need to be locked in the face of reads and writes - just like anything else

    Overall I agree that more constraints are better - but I am a software engineer, not a robot clinging to a book of rules - so I like to consider the pros and cons of many different approaches
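
    The stored-procedure pattern mentioned above - copy out under a short transaction, then return rows from a temp table - might be sketched like this (object names are invented for illustration):

    ```sql
    CREATE PROCEDURE dbo.ReportOpenOrders
    AS
    SET NOCOUNT ON;

    -- Take a consistent read inside a short transaction...
    BEGIN TRANSACTION;

    SELECT OrderID, CustomerID, Total
    INTO #OpenOrders
    FROM dbo.Orders WITH (HOLDLOCK)
    WHERE Status = 'Open';

    -- ...and release the shared locks on dbo.Orders immediately,
    COMMIT TRANSACTION;

    -- so a slow client drains the result set from the temp table
    -- without blocking OLTP writers on the base table.
    SELECT OrderID, CustomerID, Total
    FROM #OpenOrders;
    ```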

    I fail to see how read locks to a temp table or to a user are substantially different. Perhaps you can list the differences? There are huge sites, places like MySpace and the NASDAQ, that run OLTP operations and they handle queries and selects fine. The SQL CAT team, AFAIK, has never given guidance that this is an issue, nor that RI has an issue, and they listed their largest customer as one that does 40,000 inserts/sec.

    Replication has not been too fragile for me, and online schema changes can be made. Are you on v7/2000? There were substantial limitations in those versions.

    I think that you might have had valid reasons for doing things the way you have, but I might suggest that you look to increase your knowledge and learn how some of these things work. RI isn't an issue, nor are temp tables necessarily the cure for locking/blocking issues.

  • Steve Jones - Editor (7/25/2010)


    I fail to see how read locks to a temp table or to a user are substantially different

    [edit]

    ok I see where I am going wrong now - of course the output is buffered - the only time there is sync between query execution and the client is when the output buffer is full

    I will refresh my locking knowledge ... maybe I am going wrong somewhere

    what happened to dbcc traceon (1200,3604) ? doesn't seem to work in SQL 2005 ?

    [EDIT] dbcc traceon (1200,3604,-1) works ok

    but is this not correct:

    if you need to select a consistent set of records, and therefore need to use isolation > read committed, the locks will be synchronized with the sql client reading the data - if the app is processing the rows slowly, locks are held longer

    so select into temp and then select from temp after transaction has committed

    I don't think SQL Server buffers selected records by default?

    (select under read committed should be ok as only brief row locks are held)

    (also you have no way of stopping users from specifying "with (tablock)" or writing ridiculous queries)

    Steve Jones - Editor (7/25/2010)


    Replication has not been too fragile for me, and online schema changes can be made. Are you on v7/2000? There were substantial limitations in those versions

    Replication: I last used it in SQL 2000 and was not impressed - haven't looked at it since

  • It is not always a good idea to count on the application in order to ensure data integrity.

    In my opinion it is very important to enforce data integrity at both the DB and application levels, especially since most IT companies and development teams have both DB admins and software developers, and in most cases each of them has their own responsibility to ensure integrity.

    Another thing: many developers currently build applications on content management systems, and those development environments mostly rely on the rules and checks defined at the DB level for application integrity as well.

    I hope that these ideas might be helpful.

    Best,

    Feras Dib.

  • Buffering isn't something I've typically worried about. Are you selecting millions of records? If not, I wouldn't be concerned here. The default isolation works fine.

    Replication was a little shaky in SQL 2000, but it has vastly improved.
