SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Worst Practice - Not Having An Archive Plan For Each Table

By Andy Warren,

Every time you build a table you need to decide right then if and when data should be removed from it, either to be archived to offline storage or to just be deleted. How many of your applications have one or more tables that are just for logging - maybe something a process records in case something breaks, or an auditing table to track changes? Or tables that keep a real history of something done, but that history just isn't worth much a year or two later?

I've seen this happen several times over the past year, but one incident lately really brought it in to focus. I was doing some maintenance on a very old process app that emailed various groups any time certain changes were made in our project management system. This is really a workflow notification process. Each time it runs it would calculate which changes had occurred, email those changes, log them to a table, and update the table so the notification didn't get repeated. An example might be that a project went from in progress to final - to indicate to accounting that they need to close out the invoicing process. Even if they don't get the email, how bad is it? The data still has the correct status and it's reasonably easy to see what should be where.

So what good is the log table? If the process breaks in mid process (ah, the reason for the above mentioned maintenance), you can just restart it since notifications that were completed won't get processed again. No reason to look at the log table. Possibly if you made a change to the logic and wanted to validate the output you could use it for a few days to double check? Beyond that, would I ever care what email got sent to who for something relatively trivial like this?

Remember, I'm not saying that all log tables are junk. They definitely are not! But in this case it was really a debugging tool (also known as CYA). So how much history is worth keeping to handle things like 'hey, I didn't get notified on that last week'?

Thirty days? A year? Anyone want to guess how much history we have? It starts in 1996!

It's horrible, truly horrible. We're reindexing, backing up, storing data that has NO value to us. We're spending server IO and cpu time inserting the records. Yes, we do that stuff as part of our daily maintenance but it adds up. How many more of these do I have?

It's not just log tables either. Do you really need those contacts you added from a seminar back in 1998 that never panned out? Or the list of all changes made to the orders table from 1997, years after the books have been reconciled and closed?

If you decide when you build it, it's easy. Six years later? I have to make sure no one has decided to report on the data, use it to trigger/augment some other process, etc. I can't just truncate the table!

I'm still working out the best way to handle these issues, so far I've been implementing one 'CleanUp' job per database that has multiple steps containing things like 'delete from history where dateadded < getdate()-7'. Not everything is that simple of course, sometimes you have to handle getting it on tape or CD first. Another issue is that you don't always know what makes sense for archiving when you build it. My recommendation is to either review all your tables once a year or set a reminder (in a SQL job!) to notify the DBA in a year that an archive plan needs to be implemented. An alternative technique is to manage the task from within the various applications. Sometimes that seems right, sometimes not.

So what about it? Got an archive plan for your tables? Have a story worse than mine? Let's talk about it! Post your comments in the attached discussion forum and we'll see what happens!

Total article views: 9170 | Views in the last 30 days: 5
Related Articles

An Automated Process to Archive Big Tables

Many databases have large tables with hundreds of millions of rows. However, many of these tables ar...


Email Changes

We've changed the way we send email for the newsetters. Read about our new address and whitelisting....


Archiving Data

Archive Data From Production DB into Archive DB


data archiving

how to do data archiving


Data Archiving

Data Archiving