Disk-based Backup in a Small-Business Environment
This article is a follow-up to a recent editorial by Steve Jones on this topic. There have been numerous trade journal articles over the past two-plus years about the efficacy of using inexpensive IDE or SATA disks for backup vs tape. The purpose of this article is not to rehash those well-documented pros and cons, only to describe our transition to a home-grown disk-based solution.
Our company is a small financial services firm with approximately 50 employees who use a combination of thin clients and traditional PCs, all running MS Windows. The servers are a combination of rack-mounted Dell PowerEdge servers, which are being replaced with SuperMicro 1U servers as they reach their EOL, all of them running W2k or W2k3. While it is beyond the scope of this article to discuss why we have migrated from Dell to a white-box solution, it has been my experience over many years that, if you have the technical expertise in-house, an SMB is much better served by this type of solution than by the major vendors (Dell, HP or IBM). Even with the highest level of warranty/maintenance offered, SMBs simply do not have the buying power to get the attention of those firms. Additionally, the price differential allows you to provide a more robust solution for the average SMB back-office environment.
From late 2001 until late 2005, we had been using a traditional backup methodology with two LTO (rev 1) tape devices (DAT prior to that). Each device was capable of 120GB (compressed) and data was backed up using the latest version of Backup Exec with appropriate agents for SQL, Exchange and open files. We used a traditional tape rotation sequence and stored a full set of backup tapes at an offsite facility once per week.
As the business grew, and with the advent of SOX (Sarbanes-Oxley) as well as our own internal disaster recovery requirements, the tape devices were reaching their limits, both in capacity and in the length of the backup window.
Our goal was to create a backup solution that would exceed the governmental regulations as well as our internal requirements, and that would be flexible enough to grow with us and improve our disaster recovery and restoration capabilities. With tape, our exposure for data loss in the event of total system failure or a disaster (assuming the most recent tape was 100% accessible) was anywhere from 1 day to as much as 1 week. This window did not apply to the actual financial data, which is maintained separately by the custodians, only to the normal day-to-day, routine business data. Regardless, we wanted to reduce the exposure to a fraction of that, ideally to 1 hour or less (depending on the type of data).
We had been using software imaging for several years, originally in the form of Ghost to push desktop images to clients and so decrease downtime after a system failure, and later in the form of the original PowerQuest (PQ) V2i solution for servers. (PQ was later acquired by Symantec, as was the company that made Ghost, and this solution was recently re-branded as part of the Backup Exec family.) I had tested the PQ product in the server environment and it worked well, but it was limited to restoring to like hardware. While this was acceptable for the majority of system failures we were likely to experience, it would not meet our goals for disaster recovery. The thought of having to reinstall and reconfigure 6 servers with all of the associated patches was not something I wanted to face. Simply keeping the documentation current for that possibility was extremely time-consuming and tedious.
This has been an evolving solution for the past 24+ months, and it will continue to evolve and change as hardware and software solutions improve. The most notable improvement to backup systems that I have seen is Continuous Data Protection (CDP). At the time I was researching this almost 2 years ago, there were very few options in the CDP field and all were outside our budget range. Since that time, several vendors have entered this space (including an MS solution) and the price point is becoming feasible for an SMB. As such, we are looking at including CDP to complement our current methodology.
Since we have the requisite IT experience in-house and the market was evolving rapidly due to both technology and acquisitions (Symantec buying Ghost, PQ, Veritas, etc.), I chose to start with a homegrown solution to validate the plan and then add to and improve it over time.
STEP 1: We purchased an inexpensive 4-disk IDE JBOD enclosure from Promise Technologies and connected it to our existing backup device (a 1U Dell NAS) via SCSI.
We installed four 400GB IDE drives (the largest available at the time) and configured the enclosure as JBOD, giving us a 1.6TB device (1.45TB usable). This was to be used as nothing more than incremental storage, so capacity was more important than the redundancy of RAID 5. Total cost (2 years ago): under $1500 (not including the NAS). Of course, this same solution would be much less expensive today and have a greater capacity.
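The gap between the 1.6TB raw figure and the usable figure can be illustrated with a little arithmetic. The sketch below assumes (this is not stated in the article) that most of the difference comes from drives being sold in decimal gigabytes while the operating system reports binary units; file system overhead would account for the rest. The function name is purely illustrative.

```python
def raw_capacity_tib(drive_count: int, drive_size_gb: float) -> float:
    """Convert marketed (decimal GB) capacity into binary terabytes (TiB).

    Assumption: drive vendors count 1 GB as 10**9 bytes, while the OS
    reports capacity in units of 2**40 bytes.
    """
    total_bytes = drive_count * drive_size_gb * 10**9
    return total_bytes / 2**40


# Four 400GB drives concatenated as JBOD (no capacity lost to parity).
capacity = raw_capacity_tib(4, 400)
print(f"{capacity:.3f} TiB before file system overhead")
```

Under that assumption the result comes out at roughly 1.455, which is in line with the usable figure quoted above.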
Using the imaging software (V2i), we created weekly base (full) images directly to the JBOD device. We then pulled the images to our LTO tape drives for offsite storage. Since our total compressed image size at that time was under 120GB, we could store over 2 months' worth of weekly images on the JBOD device and move an entire week's worth of images onto one LTO tape cartridge.
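The retention claim is easy to check. This is a minimal sketch that treats the usable space as 1450GB and each weekly set as the full 120GB (a worst case, since the article says the sets were under that size); the function name is hypothetical.

```python
def weekly_sets_that_fit(usable_gb: float, set_size_gb: float) -> int:
    """Return how many complete weekly image sets fit in the usable space."""
    if set_size_gb <= 0:
        raise ValueError("set_size_gb must be positive")
    return int(usable_gb // set_size_gb)


# Worst case: every weekly set is the full 120GB.
sets = weekly_sets_that_fit(1450, 120)
print(f"{sets} weekly sets fit on the JBOD device")
```

Twelve weekly sets is a little under three months of history, consistent with "over 2 months" above.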
Server images improved our restoration time for a normal failure. In a disaster where all hardware was lost, however, we were still likely to face a full OS installation followed by a data restore. That was an improvement over tape alone, but it failed to meet all of our goals.
STEP 2: Approximately one year later, Symantec added the ability to create incremental images. This would allow us to easily capture incremental changes on an hourly (or more frequent) basis and reduce our data-loss exposure from 1 day to 1 hour or less. However, we still faced the daunting task of OS reinstallation in the event of total system loss.
STEP 3: This year (2006), Symantec (and others) added the ability to restore an image to totally different hardware. In the event of a total system loss, we can bring up a system on almost any hardware that is capable of running the imaged software, rather than being limited to sourcing hardware that is technically similar to the system lost (or keeping spare parts for emergencies).
To test this, we purchased a SuperMicro 1U server with redundant power supplies and two SATA-II 250GB drives configured as RAID 1, with a third drive configured as a hot spare. We had a 2U Dell PowerEdge that was almost 5 years old and had reached EOL. This was a 5-drive SCSI system with the OS on two drives configured as RAID 1 and the data partition on three drives configured as RAID 5. The new system had twice the total disk space, offered much higher performance, and cost less than 50% of what a comparable system would cost from Dell or HP, if you could get a similar configuration from them, which you cannot. The Dell system hosted W2k Server configured as a Terminal Server.
We first migrated the OS and data partitions (the C and D logical devices) to the new hardware. After a phone call or two to Symantec (their documentation at the time had no details on how to do this), the system came up and ran flawlessly. We documented the process, wiped the drives clean and repeated the installation in less than 1 hour!
We then tested upgrading this system to W2k3 knowing that if we had any issues, we could take it back to its original state in less than 1 hour. The upgrade ran without problems and the server has been running without any problems or a single event log error for several months.
STEP 4: Get rid of tape! With the increasing size and speed of SATA devices and their decreasing prices, we purchased a small 4-drive hot-swappable external SATA-II device from Addonics, along with a PCI card and several removable drive bays that allowed us to connect it to a standard PC. Now, instead of pulling the images off to tape each week, we copy them across our LAN onto a single 250GB SATA-II drive, which is then removed and taken offsite. The SATA tower, card and removable drive bays were around $800, and the cost of the drives only gets cheaper. Granted, you can buy solutions from Dell, HP, Iomega and many others that do these same tasks, but you will pay appreciably more for the same functionality.
The entire process is automated using a batch file (some habits die hard) that first deletes the oldest backup set from the JBOD, then creates a folder named for the current week and copies the images and incrementals into it. The imaging software creates a new base (full) image of ALL drives on all of the servers, which currently totals about 175GB. Since this runs over the weekend across a gigabit switch, the total time to create all of the images is less than 8 hours. A copy of the latest images is then moved to the removable SATA drive, which is sent offsite first thing Monday morning. I also use some in-house developed software to copy the images to the SATA drive, check for errors, and send notifications via SMS if problems are detected. (All backup data is both password-protected and encrypted.)
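For readers who want to try a similar rotation, here is a minimal sketch of the delete-oldest, create-folder, copy sequence. It is written in Python rather than as a batch file, and it is not the script we actually run. The ISO-week folder naming, the keep_sets parameter, and the SHA-256 comparison after each copy are all assumptions made for illustration.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1MB chunks so large images are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def week_folder_name(day: date) -> str:
    """Name a backup set after its ISO year and week number."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def rotate_and_copy(image_dir: Path, jbod_root: Path, keep_sets: int, day: date) -> list:
    """Prune old sets, copy this week's files, return names that failed the check."""
    current = jbod_root / week_folder_name(day)
    # Zero-padded ISO names sort chronologically, so the oldest sets come first.
    old_sets = sorted(p for p in jbod_root.iterdir() if p.is_dir() and p != current)
    # Leave room for the current set: keep at most keep_sets - 1 older ones.
    excess = len(old_sets) - (keep_sets - 1)
    for stale in old_sets[:max(excess, 0)]:
        shutil.rmtree(stale)

    current.mkdir(exist_ok=True)
    failed = []
    for src in sorted(image_dir.iterdir()):
        if not src.is_file():
            continue
        dst = current / src.name
        shutil.copy2(src, dst)
        # Re-read both files and compare hashes to catch a bad copy.
        if sha256_of(src) != sha256_of(dst):
            failed.append(src.name)
    return failed
```

A scheduled job could call rotate_and_copy once per weekend and pass any returned file names to whatever notification mechanism is in place.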
Since the most likely catastrophic event we will experience is a fire and secondarily theft, we needed a way to move data offsite more frequently than once per week or we still faced the possibility of a one week loss. We accomplish this by transferring the incremental images offsite via secure FTP. Fortunately, the amount of data we have to push offsite hourly is relatively small and can be accomplished with our current bandwidth.
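As an illustration of the hourly offsite push, the sketch below picks out the incremental files written since the last successful upload and sends them over FTP with TLS. Several things here are assumptions: "secure FTP" is taken to mean FTPS (chosen because Python's standard library supports it; SFTP would need a different client), the ".iv2i" suffix is a placeholder for whatever extension your imaging software gives its incrementals, and the host and credentials are supplied by the caller.

```python
from ftplib import FTP_TLS
from pathlib import Path


def pending_incrementals(image_dir: Path, last_upload_time: float, suffix: str = ".iv2i") -> list:
    """Return incremental files modified after the last upload, oldest first.

    last_upload_time is a POSIX timestamp; suffix is an assumed extension.
    """
    candidates = [
        p for p in image_dir.iterdir()
        if p.is_file()
        and p.suffix.lower() == suffix
        and p.stat().st_mtime > last_upload_time
    ]
    return sorted(candidates, key=lambda p: p.stat().st_mtime)


def upload_over_ftps(host: str, user: str, password: str, files: list) -> None:
    """Send each file to the offsite server over an explicit-TLS FTP session."""
    with FTP_TLS(host) as ftps:
        ftps.login(user, password)
        ftps.prot_p()  # encrypt the data channel as well as the control channel
        for path in files:
            with path.open("rb") as fh:
                ftps.storbinary(f"STOR {path.name}", fh)
```

Keeping the file selection separate from the transfer makes it possible to test the selection logic without a server, and to swap in a different transport later.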
Going forward, as we look at integrating CDP into this solution, we will also explore moving the incremental CDP data offsite in real time. In theory, this would give us a solution that until recently would have been impossible for an SMB. In the event of a total loss, we can purchase off-the-shelf components from local vendors and have basic services restored within 24 hours (assuming hardware availability). I have also tested bringing up base images as virtual machines. VMware supports the V2i disk format, and I have successfully booted several of my images as virtual machines. This is a great timesaver for testing.
There are some limitations and concerns with imaging software, specifically as it pertains to SQL, Exchange and AD stores (transactions getting out of sync). These limitations are well documented by the imaging vendors, and we still back up this data separately each and every day using more traditional (agent-based) solutions. That data is also stored on disk and moved offsite with the images. I have restored all of these stores from full and incremental images without problems in both test and migration scenarios. However, I do understand the potential for failure, and this is yet another reason for my wanting to add CDP to complement this solution.
This is meant only to give you a brief overview of our current approach. If you have suggestions on how we can improve this (within the constraints of an SMB budget), I welcome any and all feedback.
There are several areas that I did not mention that are part of our disaster recovery plan and current backup methodology. Those were excluded primarily as they were not relevant to this context or due to security concerns. Additionally, I have no affiliation with any of the vendors mentioned. There are many that make these same devices and software. The ones mentioned are simply those I chose at the time and continue to use.
About the author: I have been involved with computers since the dark ages, originally learning to program in Fortran using lined paper and punch cards. I have been the partner/co-founder of 3 different software startups, each having varying degrees of success. I am currently in semi-retirement, serving as the ersatz CTO for The Henssler Financial Group in Atlanta, GA.