QA for Data

  • Comments posted to this topic are about the item QA for Data

  • We have stringent rules at work for the promotion of all code. It doesn't change at all just because there's a "data emergency" of the nature you speak except for the urgency. No one is authorized to make changes to data in production without it running the gauntlet. Not even me and I'm the DBA for the company.

    I will also state that we're well practiced at such things and, for real emergencies, the problem can be flown through all the required steps in very short order when a "drop everything and fix it now" emergency is authorized. There's no excuse for cowboys even in the face of an "emergency urgency". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We found the best way to do this is to :

    1) Have a change request logged which includes the issue, the script to correct and a backup of the data. Unless this is a known fix a 'show of hands' is required from at least two teams to agree on the change.

    2) I have a separate database on my production servers which I use to record changes which are made. I made a simple proc which I put at the top of any updates/inserts/deletes which records the change in this database. This allows for change requests to be created after the change is complete if required (after an emergency change)

    Regarding the update/insert/delete I always request that the statement is done with a select to check the data, the same select to backup the data into a table (even if it is just one record), the insert/update/delete, then the first select again to see the data change. All of this is wrapped in a rolled back transaction with the exec of the proc which records the change. This way we can clearly see what the record was - and what it will be and, when we are happy, it creates a record of the change and a backup of the existing data.

    Seems long winded but after writing the select and getting it correct, you just need to copy and paste this with a couple of adjustments for the backup and the update/insert/delete.

    *oh, and always remembering to set the update date and update user id (if possible) to make any queries regarding the amendment easier to track back in future*

  • It Depends...

    Most companies I have worked with allow "trusted" people to change it on an adhoc basis. Even when the "trusted" people have a dangerously simplistic knowledge of sql! The it turns into a finger pointing exercise and people running for cover.

    Proper ITIL does require an audit trial, which is a relief.

  • Doing data reconciliation in a Data Warehouse means we have to do fixes in short order on short notice and get it done correctly.

    Simply because it means that someone's pay is affected. And that be better right.

    Rights to update / insert / delete are restricted on a table by table basis and agreed beforehand (change request implementation).

    So our approach is:

    - we know the dates we most likely need to amend data so raise a change request to get approved ahead of time

    - in case it is an emergency raise a change request to go along with the data change

    - all data changes are logged in a separate audit log table with before state (after state is then in the operational environment and can be compared easily)

    Overall a little overhead with full audit log to roll back if necessary.

    Yes, mistakes happen (oops, why are there 100000 rows affected???) and are easily recovered. Flexibility and delivery are affected positively, company procedures regarding change requests are followed. IT has less work to do running just some scripts. Took about two years to arrange with everyone and now that it is working everyone agrees it is a good process.

    Now this works because:

    - the Data Warehouse is a collection of different data sources

    - our team are the data custodians of the Data Warehouse (better informed than IT)

    - our team work closely with all relevant data entry parties

    - our team has more knowledge about data entry procedures than most data entry parties

    - we are customer service driven

    - data must be right before reporting

    Do we change operational transactional data? No. We request it, make suggestions where necessary and guide IT. Operational databases are inaccessible for change to us. We just know the data better than IT. It's just one of those quirks when the Data Warehouse team is in Finance and not in IT.

  • That's a very unusual article I had to check the date wasn't April 1st

    If you do this on my team: don't let me catch you 😀

  • We created a small app that I run all my sql through that updates production. I no longer have database permission to run sql (that inserts/updates/deletes) in production. I can't even execute stored procedures. Oh I can run selects and even create # or @ or even ## tables, but no new tables. Everything run successfully through this app will enter the sql into a table in production so there is a record of everything done to production data. We also have to have a JIRA ticket to assign to the sql run. These entries are then randomly audited every month, the sql against the ticket, to make sure what was done in sql to production is reflected on the ticket. It's a long ways from the old days when I could just fix the problem on the spot, all because there are bad people out there that do bad things.

  • In our case:

    1. We create help desk item and place request - few seconds

    2. Prepare queriy(es) to fix issues.

    3. Save the records ASIS (based on criteria to fix) in separate database separate table with the same structure but name as table name + helpdesk ticket.

    4. Run query to fix data and then save query in help desk ticket.

    This way we always have original data before corection and query how data was fixed.

    Unfortunately, in our company is always cowboy's methods. When client is screaming it is almost always the case. I am trying to say that we have to be careful but we have many people with rights directly access production server data on the back end. And I can't fix it. At least after some fight I was able to restrict dbo rights and leave only read-write-exec stored procs. 🙁

  • In our case:

    1. We create help desk item and place request - few seconds

    2. Prepare queriy(es) to fix issues.

    3. Save the records ASIS (based on criteria to fix) in separate database separate table with the same structure but name as table name + helpdesk ticket.

    4. Run query to fix data and then save query in help desk ticket.

    This way we always have original data before corection and query how data was fixed.

    Unfortunately, in our company is always cowboy's methods. When client is screaming it is almost always the case. I am trying to say that we have to be careful but we have many people with rights directly access production server data on the back end. And I can't fix it. At least after some fight I was able to restrict dbo rights and leave only read-write-exec stored procs. 🙁

  • Where I work, a data analyst interfaces with the end user to determine what (if anything) needs to be updated, inserted, or deleted. If they decide to proceed forward with a data fix, then they'll open a ticket and describe what needs to be done. A developer codes the script, and attaches it to the ticket. Finally, a production support DBA deploys the script, but only after executive management has signed off on it.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • In some of these cases we obviously don't have time to go thru a proper development/test/deploy cycle. When quick data changes are needed, and that same change/fix is needed on an ongoing basis within a stored proc; we've been known to code the proc in production just to get it running. Then copy it BACK TO development :w00t: Not a best practice for sure; and we rarely do it. And of course we cover ourselves with all the code and data backups with the ability to revert quickly.

    Ken

  • But the important thing to understand is that any scenario where data in production needs to be "fixed", the root cause is a defect or missing functionality in the application, database programming, or ETL process. It shouldn't be situation normal or occasional for DML scripts to be executed in production.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I am a one stop shop so I have no one to check up on me. Almost always I create a backup of the table, do a select statement to verify potential results, update based on the select statement then double check and finally save the script with a little documentation. I've made 2 mistakes in the years doing it and always appreciated having the backup table!!! I keep the backup table for FAR too long though. I only delete them when they are a year old.

  • John Hanrahan (4/16/2015)


    I am a one stop shop so I have no one to check up on me. Almost always I create a backup of the table, do a select statement to verify potential results, update based on the select statement then double check and finally save the script with a little documentation. I've made 2 mistakes in the years doing it and always appreciated having the backup table!!! I keep the backup table for FAR too long though. I only delete them when they are a year old.

    Rather than making a copy of the entire table, you can leverage the OUTPUT clause of the INSERT/UPDATE/DELETE statements to copy only those rows that were affected.

    For example, if this is the basic update statement:

    update MyTable

    set p = p + 1

    where d >= '2015/05/01';

    Then the following will perform the same update as above, but affected rows are insert into the MyTableBackup table.

    update MyTable

    set p = p + 1

    output deleted.* into MyTableBackup

    where d >= '2015/05/01';

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Jeff Moden (4/15/2015)


    We have stringent rules at work for the promotion of all code. It doesn't change at all just because there's a "data emergency" of the nature you speak except for the urgency. No one is authorized to make changes to data in production without it running the gauntlet. Not even me and I'm the DBA for the company.

    I will also state that we're well practiced at such things and, for real emergencies, the problem can be flown through all the required steps in very short order when a "drop everything and fix it now" emergency is authorized. There's no excuse for cowboys even in the face of an "emergency urgency". 😉

    That makes total sense to me. Perhaps the last sentence should have been:

    There's no excuse for cowboys especially in the face of an "emergency urgency". 😉

    Maybe.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply