October 30, 2019 at 4:05 pm
Developers could certainly have read access to production data. If a production change needs to be made to data for whatever reason, it should be done by an ops DBA, who would document the request, the change approval, the testing of the change script, the outcome of the testing, the user approval of the test results, the production execution, and the outcome of the production execution. This is the *most basic* form of segregation of duties, even if your "ops DBA" is just a technician who isn't a developer. If your company is publicly held or owned by a publicly held corporation in the US, this isn't even an option; it's a requirement.
If your developers have write access to your production databases, what you really have is a development database, and you're operating far outside best practices.
October 30, 2019 at 4:42 pm
We maintain "TableName_deleted" tables for our data tables, each of which is the same structure as the data table it reflects (except for no key fields), plus one field for the user and another for the time stamp -- these reflect who and when. Triggers are set for deletes and updates so that every change or delete results in the original record(s) being backed up to the "TableName_deleted" tables along with precisely when using GetDate() and by who using sUser_SName(). Thus we can see every single change/delete to a record, when it was made and by who, and go back to any version of a record we need to.
This works great for us, but our database is not huge such as discussed in this article, and we don't have hundreds of millions of changes, so maybe this isn't a great method for everyone.
I'm pretty new -- maybe temporal tables is the better way to do this same thing, but I haven't had time to learn yet just what those are.
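In case it helps anyone picture it, here's a minimal sketch of that kind of setup, assuming a made-up dbo.Customer table (all names here are invented for illustration, not the actual schema):

CREATE TABLE dbo.Customer
(
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,
    CustomerName varchar(100)  NOT NULL,
    CreditLimit  decimal(18,2) NOT NULL
);

-- Mirror table: same data columns (no key constraint), plus who/when
CREATE TABLE dbo.Customer_deleted
(
    CustomerID   int           NOT NULL,  -- copied, but deliberately not unique
    CustomerName varchar(100)  NOT NULL,
    CreditLimit  decimal(18,2) NOT NULL,
    ChangedBy    sysname       NOT NULL,
    ChangedAt    datetime      NOT NULL
);
GO

-- One trigger covers updates and deletes; the "deleted" pseudo-table
-- holds the pre-change image of every affected row.
CREATE TRIGGER dbo.trg_Customer_Audit
ON dbo.Customer
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Customer_deleted
            (CustomerID, CustomerName, CreditLimit, ChangedBy, ChangedAt)
    SELECT   d.CustomerID, d.CustomerName, d.CreditLimit, SUSER_SNAME(), GETDATE()
    FROM     deleted AS d;
END;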
October 30, 2019 at 4:54 pm
We maintain "TableName_deleted" tables for our data tables, each of which is the same structure as the data table it reflects (except for no key fields), plus one field for the user and another for the time stamp -- these reflect who and when. Triggers are set for deletes and updates so that every change or delete results in the original record(s) being backed up to the "TableName_deleted" tables along with precisely when using GetDate() and by who using sUser_SName(). Thus we can see every single change/delete to a record, when it was made and by who, and go back to any version of a record we need to.
This works great for us, but our database is not huge such as discussed in this article, and we don't have hundreds of millions of changes, so maybe this isn't a great method for everyone.
I'm pretty new -- maybe temporal tables is the better way to do this same thing, but I haven't had time to learn yet just what those are.
I attempted something like that about 10 years ago. I created an audit table something like this:
TableName varchar(255)
FieldName varchar(255)
oldvalue varchar(max)
newvalue varchar(max)
datetimemodified datetime
Using the PIVOT function I could get a version of any table at any given point -- I don't have the code for it any more.
It worked nicely on data sets that were not written to very often, but these days I wouldn't dream of doing it.
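Purely as an illustration (this isn't the original code; the audit table name, record, and column names are invented, and a record identifier column, here RecordID, is assumed), a point-in-time rebuild from that kind of audit table might look roughly like this:

DECLARE @AsOf datetime = '2019-10-01';

WITH LatestPerField AS
(
    SELECT  FieldName,
            CAST(newvalue AS varchar(255)) AS newvalue,   -- cast keeps the PIVOT simple
            ROW_NUMBER() OVER (PARTITION BY FieldName
                               ORDER BY datetimemodified DESC) AS rn
    FROM    dbo.FieldAudit                 -- the audit table described above
    WHERE   TableName = 'Customer'
      AND   RecordID  = 42
      AND   datetimemodified <= @AsOf
)
SELECT  p.CustomerName, p.CreditLimit
FROM    (SELECT FieldName, newvalue FROM LatestPerField WHERE rn = 1) AS src
PIVOT   (MAX(newvalue) FOR FieldName IN (CustomerName, CreditLimit)) AS p;
-- Fields that never changed before @AsOf come back NULL -- one limitation of this sketch.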
MVDBA
October 30, 2019 at 5:07 pm
Table triggers will kill your performance and are generally frowned upon. If your database is small and your users are few it won't matter, but this isn't a scalable solution.
I'm curious why you're not storing the primary key. How would you rejoin rows to related records in other tables with foreign key data that would also be restored? The primary key is vitally important.
October 30, 2019 at 5:10 pm
Not sure if that was a reply to me -- but I just forgot to put the PK in the post (I had to type it on the fly before heading home).
MVDBA
October 30, 2019 at 7:00 pm
That's interesting, @mvdba. Being a developer, I keep myself restricted from firefighting in the production environment. It is a nightmare for me to update values in production directly without using transaction statements...
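For what it's worth, a minimal sketch of that habit (the table and filter here are hypothetical): run the change inside an explicit transaction, check the row count, and only then commit by hand.

BEGIN TRAN;

UPDATE dbo.account
SET    active_yn = 0
WHERE  account_code = '4c';

SELECT @@ROWCOUNT AS rows_updated;   -- sanity-check the damage first

-- COMMIT;    -- run manually once the count looks right
-- ROLLBACK;  -- or back the change out if it doesn't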
Aveek has been working as an Analytics Engineer for the past couple of years. His main areas of interest are SQL Server, SSIS, and SSAS. In his leisure time, he likes to travel and is also an amateur photographer.
https://twitter.com/avikoleum
https://www.linkedin.com/in/aveekd22/
October 30, 2019 at 7:02 pm
Oops, I guess I wasn't clear so I'll try again.
The field(s) for the key is/are included in the _deleted tables, but the _deleted table does not use that as its key. For example, if [Table1].[RecordID] is the key, then [Table1_deleted].[RecordID] does indeed exist. But [Table1_deleted].[RecordID] is not designated as a unique key, because the same value needs to exist in multiple rows.
October 30, 2019 at 8:06 pm
Oh, that makes much more sense! I implemented a similar strategy once when there was a complaint that an unknown person was deleting and modifying rows from outside the ERP software. They were found. 😉
October 30, 2019 at 10:19 pm
thisisfutile wrote: One takeaway I had from your post is that you use tables in another DB and reference them in the main DB via synonym. We have a web server that references our software DB in this manner and I always wondered if speed would be impacted by such a model (I've never tested it mainly because I'm not the developer) but it seems to be performing well all of these years. After reading your post I find myself thinking, "It can't be too bad if Jeff Moden uses it!" Any caveats you can think of for this model? Keep in mind, we have a 5GB database and a very low amount of web traffic (we're a small member-based company, not a Fortune 500).
We use synonyms a lot between databases so we can keep the general functionality separate and give us lines of demarcation for all sorts of things including the isolation of large tables and similar things. One caveat that we've run into in the past is similar to what we run into with views. We don't generally use "Hungarian Notation" to identify different objects and so it sometimes takes people a second or two to figure out if something is a table, view, function, or synonym. But it takes virtually no time to do such a thing.
Another more important caveat is that you can't use DRI (FKs) between tables that live in different databases. This hasn't been a problem for me because I've only split off the largest of tables, and all of those are some form of audit, history, or transaction table that has no FKs. If I did have such a need, I'd simply make a copy of the usually very small PK table. If you updated one of the copies, you'd have to update them all. Extended properties on the tables are a good way to remind folks, and we do use them. If that doesn't work for you, you could always make a "do nothing" function that has schema_binding on the table, with a note in the flower box to remind people.
An example of why we don't use any form of "Hungarian Notation" is that today's table could become tomorrow's view or synonym and vice versa. If you think not using "Hungarian Notation" is confusing, just wait until you run into a view with a "tbl" prefix or a synonym with a "tbl" prefix.
There is another suggestion that I have that has also helped us a lot. If you move a table to another database, always leave an empty copy of the table in the original database (of course, it will be differently named than the original because the synonym will use the original name to keep from having to change code). In a pinch, that allows the database to be copied without having to copy the database with the large table in it. This works particularly well for audit tables, which usually turn out to be the largest tables in most databases and are usually not essential if you need to "get back in business" after a massive crash or when you need to make a copy of the database for Development or Test purposes. All you need to do is rebuild the synonym(s) to point at the local table(s).
Also, I've not done specific performance testing for the differences between hitting a local table in the same database or a table in a different database on the same instance, but I've never found a difference during performance tests of related stored procedures affected by such a change.
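To make that concrete, here's a bare-bones sketch of the pattern (database, schema, and table names are made up): the code always references the synonym, and the synonym decides whether that resolves to the remote table or to the empty local copy.

-- Normal state: the big audit table lives in a separate database
CREATE SYNONYM dbo.AuditLog FOR BigTablesDB.dbo.AuditLog;

-- An empty, differently-named copy of the same structure stays in the
-- main database so the main database can be restored or copied on its own.
-- CREATE TABLE dbo.AuditLog_Local (...same columns...);

-- In a pinch (or for a Dev/Test copy), repoint the synonym at the local copy:
DROP SYNONYM dbo.AuditLog;
CREATE SYNONYM dbo.AuditLog FOR dbo.AuditLog_Local;

-- Application code never changes either way:
SELECT TOP (10) * FROM dbo.AuditLog;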
Fantastic! Thank you for the feedback. I never considered this: "If you think not using "Hungarian Notation" is confusing, just wait until you run into a view with a "tbl" prefix or a synonym with a "tbl" prefix." I know my brain: when I see a little "v", I immediately assume it's a view, and if it were actually a synonym, I might spend way too many hours not realizing that's where it was.
Thanks a ton for sharing. I think 25% of all that I know I've gleaned from your posts. It's much appreciated!
October 31, 2019 at 1:19 am
I once told a friend, I aim to please... I sometimes miss but I'm always aiming. I'm humbled by your comment. Thank you for the very kind feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 31, 2019 at 2:31 am
I've started using MERGE for updating records because UPDATE just freaks me out. It takes more thinking up front, but it gets faster with practice, and it's good to get familiar with MERGE.
Something like the example below. I can highlight the inner SELECT to verify the number of records. Maybe wrap it in a BEGIN TRAN ... COMMIT as a belt-and-braces approach. Our tables all have triggers for insert, update, and delete to copy rows to corresponding history tables, but the performance impact on large databases might be unpalatable (our tables are less than a million rows).
MERGE account TARGET
USING (SELECT id,
              COUNT(*) OVER() AS item_count,
              0 AS active_yn,
              account_code
       FROM   account
       WHERE  account_code = '4c') SOURCE (id, item_count, active_yn, account_code)
ON  TARGET.id = SOURCE.id
AND TARGET.account_code = SOURCE.account_code
WHEN MATCHED
    THEN UPDATE SET TARGET.active_yn = SOURCE.active_yn;
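One optional addition (not in the post above): MERGE also accepts an OUTPUT clause, so wrapped in the BEGIN TRAN mentioned earlier you can see exactly which rows changed before committing. A sketch against the same hypothetical account table:

BEGIN TRAN;

MERGE account AS TARGET
USING (SELECT id, 0 AS active_yn, account_code
       FROM   account
       WHERE  account_code = '4c') AS SOURCE (id, active_yn, account_code)
   ON TARGET.id = SOURCE.id
  AND TARGET.account_code = SOURCE.account_code
WHEN MATCHED THEN
    UPDATE SET TARGET.active_yn = SOURCE.active_yn
OUTPUT $action AS merge_action,
       deleted.active_yn  AS old_value,
       inserted.active_yn AS new_value;

-- COMMIT;   -- once the OUTPUT looks right
-- ROLLBACK; -- otherwise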
November 3, 2019 at 12:24 pm
I think ROLLBACK/COMMIT is a good standard approach, especially if you have triggers or lots of related tables. Peer review prior to committing the changes helps too, as do CTEs if the delete is particularly complex, or temporary tables (e.g. SELECT * INTO) to break the deletions down where possible; a sketch follows below.
For our back office suite of software, which incorporates accounting ledgers, we also have a selection of standard stored procedures for regularly requested data removal, e.g. where customers have made errors in processing and are not especially keen on using, say, opposing postings to correct them. We make a charge for these amendments to discourage the requests and to ensure we get the correct details in writing, but we maintain the stored procedures, which are often used in the software for standard data movement/amendment, to standardise our usage and minimise wrong deletions.
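A rough illustration of the temporary-table approach mentioned above (all object names invented): stage the candidate rows first, review the count, then delete by joining back to the staging table inside a transaction.

-- Stage the rows we intend to delete
SELECT  i.invoice_id
INTO    #to_delete
FROM    dbo.invoice AS i
WHERE   i.customer_id = 12345
  AND   i.status = 'ERROR';

-- Review before touching anything
SELECT COUNT(*) AS rows_staged FROM #to_delete;

BEGIN TRAN;

DELETE  i
FROM    dbo.invoice AS i
JOIN    #to_delete AS d
  ON    d.invoice_id = i.invoice_id;

SELECT @@ROWCOUNT AS rows_deleted;   -- should match rows_staged

-- COMMIT;   -- if the counts agree
-- ROLLBACK; -- if anything looks off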
November 3, 2019 at 3:01 pm
Table triggers will kill your performance and are generally frowned upon. If your database is small and your users are few it won't matter, but this isn't a scalable solution.
I've not had such a problem with table triggers, even with column-level audit triggers. We did at first, because someone prior to the current team tried to make the code portable between tables. It was taking 4 minutes to update just 4 columns in a 10,000-row update on a 137-column table (yeah... that's a legacy problem, as well) because of the audit trigger. But it wasn't our code.
After I fixed it, it was within milliseconds of being as fast as the naked insert.
I agree that triggers are generally frowned upon but that most likely got its start because of things like the legacy triggers I talked about above. Triggers aren't the problem... it's the way some (most?) people write them that's the problem... especially for things like column level auditing. They also do "worst practice" things like auditing inserts, saving both the inserted and deleted table entries for audited updates, and other really bad things that they just don't know about.
And, yes... absolutely... I'll also agree that, like anything else, triggers are not a panacea. They have a place (other than the trash bin 😉 ).
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2019 at 3:37 pm
The best thing we ever invested in is our DR test server -- regular automated restore tests, also re-usable as a data playpen. I'm hoping to get our restore tests to an interval of 3 days (so that our playpen is a little more usable).
Totally agreed here. We actually have 2 of these. One is smaller and does nightly restores of two critical databases from the backups that were just taken. It's a good place to do DBCC CHECKDB on super critical stuff. The largest of those databases isn't huge compared to what some people have, but it does weigh in at almost 1TB.
We also have a "DevReview" system. Because I'm the only DBA, it's most "MY" system. The CPU and Memory isn't as beefy as prod but I have slightly more disk space on it than prod. We can restore the entire prod box using a "Nimble" snapshot in less than 5 minutes. We deploy our new code to it before deployments to production so we can do a RedGate compare with production to check for regressive code. We use it to test massive data changes or changes to massive imports both for accuracy and performance. And, in the area of being "MY" server, I use it develop POP code (Proof of Principle Code) that would bring the Dev box to its knees and also because the Dev box doesn't have nearly the data on it that prod does.
It also makes a great place to do DBCC CHECKDB. It's got close to 5TB of space and it's understood that nothing permanent should live there except a little code to help with the "Nimble" flash restores.
We also go one step further. I work really closely with the folks on the infrastructure team. We were practicing "DevOps" for about 2 decades (the director of that group and I worked together at a previous company, as well) before anyone ever thought of the buzzword. We know what we can do when we get together to solve a problem, and we know what each of us can do separately.
With that, not only do we have the 2 DR/Test servers and databases, the infrastructure team also gave me a 1TB scratch drive on Prod and a 700GB scratch drive each on Dev and Staging. They're all reserved for DBA stuff. I made it clear that I'd never store any permanent data or programmability on those drives in return. I use them for things like large index rebuilds (to keep the MDF files from blowing up when rebuilding a nearly 200GB clustered index, for example) and other DBA things. If you don't have one of these DBA-only scratch drives, at least on your prod box, you need to ask for one with the same conditions... exclusive use and never save anything permanent on it. I really lucked out... the Infrastructure Team knows what I have done in the past when I had the room, and so they doubled down on the prod box by putting the DBA-only drive on SSDs like the rest of prod.
It also makes a great place to store seriously temporary (keep for less than 24 hours, normally) copies of tables when doing modifications and updates. Of course, you should also continue to follow proper backup protocols even when this is available.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 3, 2019 at 3:39 pm
p.s. Also, remember that the answer is always "No" unless you ask. If you don't have an "IN" with the infrastructure team or the people holding the purse, also remember that you might only get to ask once... so do your homework, write it down as a formal request, have people review it for concept, etc, and then submit it.
One great selling point is that of "over-provisioning SSDs" and "over-provisioning freespace" to handle sudden success and the immediate growth of data that goes with it. On the Dev and Staging boxes, it can be justified not only as having a temporary space to do important things without having to stand up another box, but in a horrible pinch, it could be used as a "Get back in business" server if your on-premise dedicated prod box ever failed and something went wrong with your DR plans (which will happen to you someday because people don't practice DR well enough or often enough).
--Jeff Moden
Change is inevitable... Change for the better is not.