Worst Practice - Not Having An Archive Plan For Each Table

,

Every time you build a table you need to decide right then if and when data

should be removed from it, either to be archived to offline storage or to just

be deleted. How many of your applications have one or more tables that are just

for logging - maybe something a process records in case something breaks, or an

auditing table to track changes? Or tables that keep a real history of something

done, but that history just isn't worth much a year or two later?

I've seen this happen several times over the past year, but one incident

lately really brought it in to focus. I was doing some maintenance on a very old

process app that emailed various groups any time certain changes were made in

our project management system. This is really a workflow notification process.

Each time it runs it would calculate which changes had occurred, email those

changes, log them to a table, and update the table so the notification

didn't get repeated. An example might be that a project went from in progress to

final - to indicate to accounting that they need to close out the invoicing

process. Even if they don't get the email, how bad is it? The data still has the

correct status and it's reasonably easy to see what should be where.

So what good is the log table? If the process breaks in mid process (ah, the

reason for the above mentioned maintenance), you can just restart it since

notifications that were completed won't get processed again. No reason to look

at the log table. Possibly if you made a change to the logic and wanted to

validate the output you could use it for a few days to double check? Beyond

that, would I ever care what email got sent to who for something relatively

trivial like this?

Remember, I'm not saying that all log tables are junk. They definitely are

not! But in this case it was really a debugging tool (also known as CYA). So how

much history is worth keeping to handle things like 'hey, I didn't get notified

on that last week'?

Thirty days? A year? Anyone want to guess how much history we have? It starts

in 1996!

It's horrible, truly horrible. We're reindexing, backing up, storing data

that has NO value to us. We're spending server IO and cpu time inserting the

records. Yes, we do that stuff as part of our daily maintenance but it adds up.

How many more of these do I have?

It's not just log tables either. Do you really need those contacts you added

from a seminar back in 1998 that never panned out? Or the list of all changes

made to the orders table from 1997, years after the books have been reconciled

and closed?

If you decide when you build it, it's easy. Six years later? I have to make

sure no one has decided to report on the data, use it to trigger/augment some

other process, etc. I can't just truncate the table!

I'm still working out the best way to handle these issues, so far I've been

implementing one 'CleanUp' job per database that has multiple steps containing

things like 'delete from history where dateadded < getdate()-7'. Not

everything is that simple of course, sometimes you have to handle getting it on

tape or CD first. Another issue is that you don't always know what makes sense

for archiving when you build it. My recommendation is to either review all your

tables once a year or set a reminder (in a SQL job!) to notify the DBA in a year

that an archive plan needs to be implemented. An alternative technique is to

manage the task from within the various applications. Sometimes that seems

right, sometimes not.

So what about it? Got an archive plan for your tables? Have a story worse

than mine? Let's talk about it! Post your comments in the attached discussion

forum and we'll see what happens!

Rate

5 (1)

Share

Share

Rate

5 (1)