Worst Practice - Not Having An Archive Plan For Each Table

  • Glad to see someone brought this up; I must say this is a very important topic, Andy.

    After working at a financial institution that had money to blow on DASD, I came to a company with limited space and no archiving implemented or designed, so a solution had to be developed. I came up with three approaches which I think are good enough solutions.

    In this scenario, the company would like to keep data going as far back as they can, with only about three months of live/production data. So...


    First, you could make a copy of the live database, name it arch_db, and write to both places, the live db and arch_db. Then just do a monthly cull of the live db, no worries.

    Second, your developers may not necessarily want to change their code, and some don't even use stored procedures, using embedded SQL in their code instead. So, if you have no choice, you can use triggers to keep tabs on the changes, deletes and inserts, and apply these to the archive db nightly, every 5 minutes, or as needed.

    Third, if you're only in the design stage and you know that the data is going to grow exponentially, you could break the tables up by month; it's then easier to move old tables into an archive db, and searching in the current db is as fast as you have tuned your server to be.
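    The second, trigger-based approach can be sketched roughly as below. This is only an illustration using SQLite from Python (a stand-in for a real T-SQL trigger, which would read the `deleted` pseudo-table on SQL Server); the `orders`/`arch_orders` tables and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (id INTEGER PRIMARY KEY, placed TEXT, amount REAL);
CREATE TABLE arch_orders (id INTEGER, placed TEXT, amount REAL);

-- Capture rows as they are culled from the live table.
CREATE TRIGGER orders_archive
BEFORE DELETE ON orders
BEGIN
    INSERT INTO arch_orders VALUES (OLD.id, OLD.placed, OLD.amount);
END;
""")

conn.execute("INSERT INTO orders VALUES (1, '2002-01-05', 9.99)")
conn.execute("INSERT INTO orders VALUES (2, '2002-06-01', 5.00)")

# The monthly cull: anything older than the retention window is deleted
# from the live table, and the trigger quietly copies it to the archive.
conn.execute("DELETE FROM orders WHERE placed < '2002-03-01'")

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])       # live rows left
print(conn.execute("SELECT COUNT(*) FROM arch_orders").fetchone()[0])  # archived rows
```

    The nice part is that the application code never changes: the cull job only has to issue deletes, and the trigger does the archiving.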

    If you have any questions please post; I'd be glad to help and share the code for my archive procedure.




  • Great Andy,

    There is an issue here of "who owns the data".

    Your particular example is something that can be decided by the IT department as it is their own internal logging process.

    The problem comes when ownership is unclear and no one is prepared to say "Yeah, it's old, we don't need it", even though no one has used it, knows anyone who has used it, or has even heard of anyone using it.

    I think any project that involves storing data needs a particular piece in the functional requirement specification that deals with this issue before the project is built.

    In the case of contacts databases, over in the UK we have this thing called the Data Protection Act. One of the things it states is that data must not be stored for an excessive amount of time. Failing to have a decent archiving procedure is actually a breach of the law.

    I worked for an organisation that was scared to archive data because we never knew when we would be asked to plot trends in customer behaviour over 'n' years, 'n' being the number of years of data you finally decided to retain in your system, plus one.

    The way that I got around the problem was to have a set of archive tables with no indices that simply contained data over 12 months old.

    The users knew that if they needed to search this old data they had to specifically query the old data and that it would be slow to retrieve. Surprise surprise, they rarely bothered.
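    A minimal sketch of that layout, using SQLite from Python purely for illustration (the table and column names are invented): the live table keeps its index, the archive table deliberately has none, and a periodic job moves anything over the cutoff across.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, last_touched TEXT, name TEXT);
CREATE INDEX ix_contacts_touched ON contacts(last_touched);
-- Archive table: same shape, deliberately no indices.
CREATE TABLE arch_contacts (id INTEGER, last_touched TEXT, name TEXT);
""")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, '2001-01-01', 'old'), (2, '2002-10-01', 'recent')])

# The periodic job: copy anything over 12 months old, then delete it.
cutoff = '2001-11-01'
conn.execute("INSERT INTO arch_contacts "
             "SELECT * FROM contacts WHERE last_touched < ?", (cutoff,))
conn.execute("DELETE FROM contacts WHERE last_touched < ?", (cutoff,))

print([r[2] for r in conn.execute("SELECT * FROM contacts")])       # live data
print([r[2] for r in conn.execute("SELECT * FROM arch_contacts")])  # old data
```

    Users querying `arch_contacts` pay the full-scan cost, which is exactly the deterrent described above.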

    I tended to archive the archive tables into MS Access, burn this onto a CD and duplicate it. As MS Access could store up to 2GB and the archive per month was under 650MB, this was feasible. Storing it in Access on a CD-ROM meant that a user who specifically needed the archive could receive a copy and query it at their leisure.

    It also made retrieval of data back into SQL Server easy.

  • As a developer I have got to commend a DBA for bringing this up. I very rarely deal with DBAs and am mainly responsible for my own databases.

    In my experience most developers develop a product without thought for Entity Life History, which is a very simple check on whether data should be created and whether it is later destroyed (archived).

    This is fine at the development stage and will normally stand up to expected-usage testing, but the knock-on effects of creating data that

    1. is never used, so there was never a good reason to create it except that it existed in the old system, and

    2. is never archived (destroyed), so it slows down querying for the more recent, useful data

    are larger than most developers give credit for.

    Mr Peter Livesey


  • Great article Andy.

    I fully agree. With more and more archive tables, the resource usage on the server does increase and, who knows, if you have a fair number it could even impact overall performance... although you'd need to be doing a lot of archive processing.

    We do archiving, but all we really do is copy the day's orders, and all the other associated information that we download from JDE, into history tables within SQL Server. This has no impact on performance, as we do it around 10pm, before we do backups, so we're OK. We only have two tables which track changes/updates throughout the day; as the changes/updates are small, we have no issues with this.

    And the other issue you raised is how long is the data valid? Yes, the cost of disks is next to nothing, but why use up resources when you don't have to? Personally, I'd keep as much space free as possible.

    Fortunately, the management here agreed that we only need four months history within SQL Server. With that, we run a job on the first of each month to delete all history over 4 months. On first run, this removed almost 2 million rows!!
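    A first-of-the-month purge job like the one described boils down to a single dated delete. Here is a minimal sketch using SQLite from Python (the real thing would be a scheduled T-SQL job; the `order_history` table and `loaded` column are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_history (id INTEGER PRIMARY KEY, loaded TEXT)")
conn.executemany("INSERT INTO order_history VALUES (?, ?)",
                 [(1, '2002-05-01'), (2, '2002-09-15'), (3, '2002-10-20')])

def purge_history(conn, cutoff):
    """Delete everything older than the agreed retention cutoff."""
    cur = conn.execute("DELETE FROM order_history WHERE loaded < ?", (cutoff,))
    return cur.rowcount  # rows removed; worth logging in a real job

# Keep roughly four months: purge anything loaded before the cutoff.
removed = purge_history(conn, '2002-07-01')
print(removed)
```

    Logging the row count each run is cheap insurance: a sudden spike (like that first-run 2 million) is worth knowing about.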

    Clive Strong


  • One of my clients had a customer complaints system that was killed every Monday morning because (badly written) management reports were being run against the database server whilst it was being used to log customer calls from 90 service desk people. Used to grind to a halt and drop connections regularly. This was a third party app, so changing the (dynamic...) SQL or archiving a chunk of the data was a non-starter and would have invalidated the support contract.

    Various solutions had been suggested by the vendor: putting a copy of the database onto a separate server purely for reporting; upgrading the network; upgrading the client PCs (an odd suggestion, as it was evident from the start that it was a server performance problem, even though they had visited the site on a number of occasions to diagnose the problem); and upgrading the server to quad 2.4GHz Pentium Xeons (yep, we were considering going into weather forecasting as a side line...).

    Once the previous DBA left, we got into analysing the problem properly, and it transpired that virtually all of the queries were triggering table scans on big tables containing 18 months of archive data which couldn't be moved. So we set about creating some decent indexes and giving the supplier a roasting over the lack of them in the original product.

    Guess what? Reports ran in seconds, not hours(!), users didn't get booted off anymore, and customers didn't have to wait 5 minutes for their details to appear on the operators' screens. I guess the lesson of this is that if archiving isn't possible, then look at performance tuning next, before anything else. Costing it out, the time spent tuning worked out 20 times less expensive than upgrading the server - a very sensitive point to management in these times of cost cutting. So if you can't archive, tune. And if you can archive, look at tuning anyway 🙂
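    The scan-to-seek effect in that story is easy to demonstrate on any DBMS. A tiny illustration using SQLite from Python (the `calls` table and index name are invented): the same query's access path, before and after an index, checked with EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls "
             "(id INTEGER PRIMARY KEY, customer_id INTEGER, logged TEXT)")

def plan(conn, sql):
    # The last column of EXPLAIN QUERY PLAN describes the access path.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT * FROM calls WHERE customer_id = 42"

before = plan(conn, query)   # no index yet: a full scan of calls
conn.execute("CREATE INDEX ix_calls_customer ON calls(customer_id)")
after = plan(conn, query)    # now an index search

print(before)
print(after)
```

    On SQL Server you would look at the same before/after with the showplan output rather than EXPLAIN QUERY PLAN, but the principle is identical.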

    Jon Reade



  • ..and of course simply bunging in another processor adds to your licensing costs, never mind the cost of RAM, disk and tapes.

  • Hi

    It's far from worst practice, but it is something to consider. Archiving off data and making it subsequently accessible for historical reporting (typically old user queries and auditing) is difficult to plan and manage. I have worked on a lot of databases in my time as a DBA, and I have to say, very few analysts really take the time out to address this issue. I must say also that with a modern DBMS, large amounts of data can be searched quickly and effectively, server performance (CPU, IO, etc.) is excellent, and with the advent of grid computing and highly distributed systems with super-quick peripherals, archiving in MANY cases tends to be a non-issue and not worth the added costs in terms of application complexity.

    Saying all of that, auditing tables and logs (where the friggen developers log everything you can possibly think of in their apps) are really tough and are the bane of a DBA's life, especially in terms of the number of indexes (yes, querying auditing tables is just as complex as querying the live ones, resulting in more indexing and speed issues).


    Chris K

    Chris Kempster
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • quote:

    ..and of course simply bunging in another processor adds to your licensing costs, never mind the cost of RAM, disk and tapes.

    Yes, that was pointed out to them, as it was per-processor licensing they were using. Something the vendor omitted to mention in their 'recommendations'.

    Jon Reade


  • This topic is one that I've had in the back of my mind for months now. We're currently mid-stream in a total rewrite of a major application that involves transferring data from a Sybase SQL Anywhere DB to MS SQL Server 2000 (that is an issue in itself, but not one I'm too worried about right now).

    What has been a concern for me is the need to trim the size of our result sets when desktop applications are querying the server for records. The current design, as well as business process and legal issues, has meant that we must keep all historical data accessible. When someone goes to run our app, they're most likely only concerned with the last week's worth of data. But when someone goes to run reports, they're going to want the last three months of data, and that could be in the hundreds of thousands of rows, all the way up to current.

    I guess my biggest obstacle is what to do with foreign keys. There is always a possibility that a user will need to update an old record, and I don't want to lose data integrity. So maybe we have been doing worst practices, but I don't know a solution. Does anybody else?

  • Archiving is a very important issue that needs to be discussed and sorted out with the customer before any kind of project implementation. We have come across several instances where customers do not want to physically delete data from the DB, but just want to flag the record as 'deleted' even though it becomes redundant. Over a period of time we end up with so much junk that query performance takes a beating. In a few cases we have discussed this with the client and arrived at periodic intervals at which we can move or delete such data to other file formats. We have scheduled backend jobs that remove the redundant stuff and move it to separate tables, where the DB admin can have a look and decide whether the data can be deleted.

    This issue is of crucial importance, since in the long run your DB could hold junk that hampers querying.
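    The flag-then-sweep pattern described above can be sketched as follows, using SQLite from Python for illustration (the `accounts`/`accounts_removed` tables, the `is_deleted` flag, and the job name are all assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts         (id INTEGER PRIMARY KEY, name TEXT,
                               is_deleted INTEGER DEFAULT 0);
CREATE TABLE accounts_removed (id INTEGER, name TEXT);
""")
conn.executemany("INSERT INTO accounts (id, name, is_deleted) VALUES (?, ?, ?)",
                 [(1, 'live', 0), (2, 'junk', 1)])

def sweep_deleted(conn):
    """Scheduled backend job: move soft-deleted rows into a holding
    table for the DBA to review, so the live table stays lean."""
    conn.execute("INSERT INTO accounts_removed "
                 "SELECT id, name FROM accounts WHERE is_deleted = 1")
    conn.execute("DELETE FROM accounts WHERE is_deleted = 1")

sweep_deleted(conn)
print(conn.execute("SELECT name FROM accounts").fetchall())
print(conn.execute("SELECT name FROM accounts_removed").fetchall())
```

    The application keeps its "never really delete" behaviour, while queries against the live table no longer wade through flagged rows.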



  • One approach I've seen is that if a user needs to edit an archived record it gets moved back with the "live" data so that all constraints and app logic work correctly. Not bad really.
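    A sketch of that move-back idea, with SQLite from Python standing in for the real DBMS (the `orders`/`arch_orders` schema and the `unarchive` helper are invented for the example): pull the record into the live table first, then edit it there, where all constraints and app logic apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE arch_orders (id INTEGER, status TEXT);
INSERT INTO arch_orders VALUES (17, 'closed');
""")

def unarchive(conn, order_id):
    """Move one record from the archive back into the live table."""
    row = conn.execute("SELECT id, status FROM arch_orders WHERE id = ?",
                       (order_id,)).fetchone()
    if row is None:
        return False
    conn.execute("INSERT INTO orders VALUES (?, ?)", row)
    conn.execute("DELETE FROM arch_orders WHERE id = ?", (order_id,))
    return True

# Un-archive, then edit in the live table as normal.
unarchive(conn, 17)
conn.execute("UPDATE orders SET status = 'reopened' WHERE id = 17")
print(conn.execute("SELECT status FROM orders WHERE id = 17").fetchone()[0])
```

    In a real system the two statements inside `unarchive` would go in one transaction so a record can never exist in both places.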


