Data Warehouse Backup Strategy

  • We’re migrating from SQL Server 2000 to 2008 and I’m trying to come up with a backup strategy for our data warehouse on the new servers. I’m particularly focused on the one DB that is used most frequently. I want to minimize both the size of the backup file(s) and the impact the backup has on the servers. Here is the background info:

    - We’re a relatively small shop and so have limited resources (time). No dedicated full-time DBA.

    - We currently load 100+ flat files each night into 100+ SQL tables. We’re talking about 60-75 GB of data for these files/tables. We don’t have access to just the changed records from the source systems, so the tables are dropped and re-created during each nightly load.

    - We also have numerous tables that persist – typically these are tables where the data files from the source system are so large they can’t possibly be dropped/created each night. Instead, we extract a subset from the source system and insert/update the persistent tables. These tables account for about 150 GB.

    - The remaining tables in the DB take up another 100 GB. It’s a mixed bag here in terms of availability requirements – some are important (in the event of failure, restore would need to happen within 24 hours), while others could wait a week or more.

    - In the current SQL 2000 environment, using a simple backup model, the backup is about 350GB uncompressed. This goes nightly to a SAN, and from there to tape offsite.

    For a multitude of reasons, we’re not performing an all-at-once migration, so we’ll have one SQL2K and two SQL2008 servers running simultaneously. If we took the same overall approach as we did in the past, that would amount to about 1.25 TB of backups…not desirable! Especially since much of the data is dropped/recreated.

    Aside from utilizing compression software (considering Redgate SQL Backup Pro), are there other ways to reduce backup overhead (processing resources and storage resources)? I thought perhaps we might do file backups with the persistent tables on the primary filegroup and the drop/load tables on secondary file(s), but it doesn’t seem you can (safely/reliably) restore only some of the filegroups. I’ve seen ‘undocumented’ ways of doing that, but I like to sleep at night…

    I thought about truncating the tables each night before the backup, but that would require significant redesign of existing processes in terms of timing etc., and resources are tight. I could do this with some of the largest tables that we drop/reload, but we also have some large tables that we don’t drop/recreate each night, so this approach would only help a bit.

    Any ideas that don’t involve a complete redesign of the DB? If I had unlimited time and energy, I can see where separate DBs would be best (and not just for the backup strategy), but that’s not practical for us.

    Also, this is my first post...so suggestions on etiquette/process are greatly appreciated.

  • Using compression or software like RedGate will certainly help.
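
    For what it's worth, SQL Server 2008 also has native backup compression (Enterprise Edition in 2008; Standard picked it up in 2008 R2), so it may be worth benchmarking against the third-party tools. A minimal sketch, with a placeholder database name and path:

        BACKUP DATABASE DataWarehouse
        TO DISK = N'E:\Backups\DataWarehouse.bak'
        WITH COMPRESSION, INIT, STATS = 10;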

    I would suggest using multiple backup files on large databases. That will reduce the IO stress and will help when exporting out of the SAN. Test it, but in my experience it is usually faster than using one single file.
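
    A minimal sketch of striping a backup across several files (database name and paths are placeholders); SQL Server writes to all the stripes in parallel, which is where the IO gain comes from. Just remember that every stripe is needed at restore time:

        BACKUP DATABASE DataWarehouse
        TO  DISK = N'E:\Backups\DataWarehouse_1.bak',
            DISK = N'E:\Backups\DataWarehouse_2.bak',
            DISK = N'F:\Backups\DataWarehouse_3.bak',
            DISK = N'F:\Backups\DataWarehouse_4.bak'
        WITH INIT, STATS = 10;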

    You do not mention whether you have several LUNs, but using a dedicated LUN for your backup drive will also help, as you will divide the work (reads and writes) while you are backing up your databases.

    Another technique is to put those big tables on their own filegroups. Then you can run filegroup backups, allowing you to run backups at different times: one for the very large tables and another for the small or system tables, which reside on different filegroups; you still need a full backup anyway, I believe. Here is a nice link about that topic: How can SQL Server 2005 OnLine Piecemeal Restore improve availability?
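
    A rough sketch of what that could look like, assuming the big drop/reload tables were moved to a new filegroup (all names and paths here are made up, and the nightly load scripts would have to recreate those tables on it):

        -- One-time setup: a separate filegroup and file for the drop/reload tables
        ALTER DATABASE DataWarehouse ADD FILEGROUP Staging;
        ALTER DATABASE DataWarehouse
            ADD FILE (NAME = N'DW_Staging1', FILENAME = N'E:\Data\DW_Staging1.ndf')
            TO FILEGROUP Staging;

        -- Back up the filegroups on separate schedules
        BACKUP DATABASE DataWarehouse FILEGROUP = N'PRIMARY'
            TO DISK = N'E:\Backups\DW_Primary.bak' WITH INIT, STATS = 10;

        BACKUP DATABASE DataWarehouse FILEGROUP = N'Staging'
            TO DISK = N'E:\Backups\DW_Staging.bak' WITH INIT, STATS = 10;

    One caveat worth reading up on before committing to this: under the simple recovery model, filegroups left out of the initial partial restore generally have to be read-only to be restorable later, which is probably the catch behind the "safely/reliably" concern above.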

    The best approach, though, is to reduce the data size. If you can archive old data or remove unneeded data, backups will run faster.
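
    If archiving is an option, it does not have to be fancy; a rough sketch with made-up database, table, and column names, moving rows older than a cutoff into an archive database before deleting them from the live one:

        -- Copy old rows to an archive database, then remove them from the warehouse
        -- (for very large tables this would normally be done in batches)
        INSERT INTO ArchiveDW.dbo.SalesHistory
        SELECT *
        FROM   DataWarehouse.dbo.SalesHistory
        WHERE  SaleDate < DATEADD(YEAR, -2, GETDATE());

        DELETE FROM DataWarehouse.dbo.SalesHistory
        WHERE  SaleDate < DATEADD(YEAR, -2, GETDATE());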

  • I'll have to check on the LUN setup with another person. I was so focused on filegroups as a way to avoid backing up all the data that I overlooked them as a way to temper/throttle the impact of a single large backup. This approach will also address some of the timing/process issues, since I can back up at the appropriate time. Thanks very much!

    I'll have a trial of Redgate Backup Pro on the server in the next couple of days, and hope that will also make managing multiple filegroup backups easier.

  • jamessfrench-839278 (4/18/2013)


    I'll have to check on the LUN setup with another person. I was so focused on filegroups as a way to avoid backing up all the data that I overlooked them as a way to temper/throttle the impact of a single large backup. This approach will also address some of the timing/process issues, since I can back up at the appropriate time. Thanks very much!

    I'll have a trial of Redgate Backup Pro on the server in the next couple of days, and hope that will also make managing multiple filegroup backups easier.

    I changed jobs almost a year ago. I used to work for a big, big, big company 😉 ... they had their own backup solution, so I never had to worry about that aspect of the SQL design.

    Now, in my new role, I am also in charge of backups, which I like. This is when I discovered RedGate SQL Backup, and let me tell you, it is a good piece of software. The compression algorithm is really good; it saves you space and time.

    Ironically, I am also having backup issues, but I have around 1 TB of data now and will have 8 TB once I finish another project. Even though my SQL design is really good, the SAN does not perform well; it is too slow for writes, so I will have to back up directly over the network. But if your SAN has good IO numbers, you may ask your IT guys to create separate LUNs: one for data, one for transaction logs, one for tempdb, and one for backups. Separate LUNs segregate the IO workload, so your backup jobs won't be competing for disk resources in the middle of the night.
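
    For what it's worth, backing up straight to a network share is just a different DISK target; a minimal sketch with placeholder names (worth pairing with a verify step, since a network backup is only as good as the link):

        BACKUP DATABASE DataWarehouse
        TO DISK = N'\\BackupServer\SQLBackups\DataWarehouse.bak'
        WITH INIT, CHECKSUM, STATS = 10;

        -- Optional sanity check once the backup completes
        RESTORE VERIFYONLY
        FROM DISK = N'\\BackupServer\SQLBackups\DataWarehouse.bak'
        WITH CHECKSUM;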

    I also suggest creating several backup files for big databases, say anything above 100 GB. The backup will run faster, and it will transfer faster over the network too.

    Good luck!

  • I already did some testing of multiple backup files vs. a single large file and did see a reduction in time, though I'll continue monitoring and take a closer look in the next week.

    Our in-house guys work with a consulting firm on most of the SAN setup and configuration. I believe, but don't know for sure, that when they performed the most recent updates to the SAN, they had SQL in mind...I'll be asking some more detailed questions now for sure!
