Blog Post

Table Backups and the TableArchive Database


A discussion of how a separate database is used for storing table backups.

In a recent article, one of the things I mentioned was saving table backups before important and/or risky processes. For example, I generally do this before truncating the next year school assignment and school waitlist tables and reloading the new assignments/waitlists. Sometimes these backups are removed soon after, but some backups are kept on-hand for a few months or a year in case there’s ever a question down the line about what the input data was when we ran the process.

As I discussed, the table backups generally are given a name with the format: [TableName]_yyyymmdd_[ReasonToExist]. But where do I put this type of table? The answer is a database specifically created for such backups. At Boston Public Schools, this is the TableArchive database.

When I first started at Boston Public Schools, it was common practice to put such backups directly onto the main production database. This might be a reasonable practice if the number of tables in the database is under 40-50. When you get to hundreds of tables, having the backups on the main production database is simply messy. Because of this, I created TableArchive.

One problem with having backup tables on the main production database is it causes bloat – we have copies of production data on a number of servers, backups of the production databases going back several days, etc. Including these never-changing tables in these backups simply causes backup/restore processes to take longer and chews up valuable server disk space where one backup of the TableArchive database is likely sufficient.

There are other issues as well storing these tables on the main production database. It makes finding the tables with real data more difficult as you need to wade through a number of mixed in backup tables. It makes cleanup of the backups more difficult – there are more tables to look through when you want to clear out the old, unnecessary backups. Also, it is simply a better database server organization separating these tables into their own database.

We recently did an analysis of our main production database – more than 60% of the 500+ tables were either table backups [some from many years ago] or were obsolete. We’re in the process of removing tables or placing them into TableArchive as appropriate. If we can shave several gigabytes off the backup size and significantly reduce the number of tables we have, the cleanup effort will be a success.

One consideration I originally had was to set this database up on a separate server, not on the main production server. At the time, we did not use linked servers at BPS, so I decided to place the TableArchive database on the main production server. The rationale was the backup tables should be query-able – we might want to write queries to make comparisons between the backup and the actual production version of the table. Our newly hired DBA has linked some of the servers together, and therefore there’s a possibility we might move the TableArchive database off the main production server entirely. At this point doing so is low priority, but I’ll post an update if this occurs.

So how do others handle table backups? Do others do something similar to what we’re doing at BPS? Do others rely solely on entire database backups? Are the table snapshots directly on the production server? Maybe some development groups never need to make backups of individual tables? Please leave a comment and let me know.