Automated Database Provisioning for Development and Testing

If your development team needs to work on anonymised copies of the current production database, and if changes are being delivered rapidly as well, that could mean a lot of time and routine DevOps work copying databases. SQL Clone was designed for tasks like this. Grant Fritchey investigates whether you save time, effort and scripting over the more traditional approach, and at what point it makes sense to use it.

It’s a snap to do the first release of any piece of code, because nothing stands in the way: no compatibility issues, no dependency problems. It’s different when you have existing code that you have to maintain while you add the new features. When you have to adjust the existing functionality safely, things can get tricky very fast. This is why it is so important to have a set of tests for your code. You need to be confident that everything about your existing functionality continues to work while you adjust the existing code.

Now, let’s toss in the complication of having databases to process. All that testing just became that much harder because you also need to deal with the fact that you have existing data. How will your code-changes affect that data? To ensure fast, safe deployments, you will need to have comprehensive tests for the database, as well as for your code.

This means that you will need both development and test databases, maybe several of them, with as much data as your production system, and as similar as possible in its distribution. You may even need to have copies of your production databases in your testing environments and development environments. If so, then you will probably now have to deal with the process of cleaning that data to remove or modify sensitive information, because of regulatory requirements or just plain common sense. You then have to have mechanisms in place to provision your development and test environments with all this clean data. That’s a lot of work, and it will take time. You also have to account for the fact that your databases might be large, which means your database provisioning will take more time and require more storage space.

Redgate SQL Clone can help with provisioning a large database in a timely manner without straining disk storage, whether the data is artificially generated or is the ‘clean’ anonymized version of the production data. This article is going to explain how you would automate these processes using native SQL Server tooling and then show how you can automate the same processes using SQL Clone.

The Setup

I’m going to use a copy of the StackOverflow database, which weighs in at around 118 GB, to represent our production database. I’m assuming that you already have a process in place for taking a nightly full backup, and restoring it to a secondary server for testing, so our automated provisioning process will start from the point where the latest production backup has already been restored to a secondary server.

From there, we’ll need to run through the following automation steps, first using native SQL Server tools and then using SQL Clone:

| Step | Automation steps using SQL Server tools | Automation steps using SQL Clone |
|------|-----------------------------------------|----------------------------------|
| 1 | Clean sensitive data out of the database prior to releasing to testing and development | Clean sensitive data out of the database prior to releasing to testing and development |
| 2 | Back up the cleaned-up database | Create an image of the cleaned-up database |
| 3 | Restore the clean database to a test environment | Create a clone in the test environment |
| 4 | Restore the clean database to a development environment | Create a clone in the development environment |
For this set of tests, I’m going to do everything on a single SQL Server instance, merely changing the database names to represent each environment. In reality, we’d be working with three different instances: one for clean-up, one for testing, and one for development. That wouldn’t change any of the automation steps, beyond requiring different connection strings for each instance. You will have to imagine the additional network burden.

Having established an automated process for producing a clean copy of the production database each night, steps 3 and 4 would ideally be automated and secured to the point where “self-service” becomes possible. In other words, testers could quickly spin up multiple database copies for performing parallel database test runs, with realistic data, and developers could quickly create multiple database copies to, for example, compare side by side the behavior of two possible code solutions to the same problem.

In each case, the automation step would be run locally, in the same way that I’m going to automate it here as part of the larger process; it only involves running the local restore.

Automation Using SQL Server Tools

Setting up the scheduling or triggering of the automation script is a simple part of this process, so I’ll leave that for you to figure out on your own. Instead, I’m going to focus on the PowerShell (yes, PowerShell) commands you need to use. If you’re not already using PowerShell to automate your system processes, the time to get started with it is long overdue.

Much of what we’re going to do uses T-SQL, but you need a way to control T-SQL across different environments, which in the absence of PowerShell would require setting up Linked Servers and other mechanisms that, frankly, most smart DBAs and database developers avoid where possible.

With our StackOverflow database already restored from backup, we’ll run a simple clean-up script, modifying all email addresses to use the standard, safe address of ‘noemail@example.com’. I had to bump up the default QueryTimeout because I’m modifying so much data on a machine that’s not that fast. This is something you’ll want to keep an eye on as you create your own data-cleansing scripts; more complex clean-up scripts will usually take longer to run.
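Here’s a minimal sketch of that clean-up step, using Invoke-Sqlcmd from the SqlServer module. The server name is illustrative, and the dbo.Users/Email column is an assumption; adjust the UPDATE to match your own schema and masking rules:

```powershell
# Requires the SqlServer module (Install-Module SqlServer).
Import-Module SqlServer

# Illustrative clean-up: overwrite every email address with a safe value.
# The table and column names are assumptions; use your real schema here.
$cleanupQuery = @"
UPDATE dbo.Users
SET    Email = 'noemail@example.com';
"@

# Raise QueryTimeout so the long-running update isn't cut off mid-stream.
Invoke-Sqlcmd -ServerInstance 'CleanupServer' `
              -Database 'StackOverflow' `
              -Query $cleanupQuery `
              -QueryTimeout 7200
```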

In a real cleansing operation, you’d be performing more complex cleansing procedures, and would also need several subsequent operations to ensure that no artefacts remain in the logs, data pages, buffers, or in the query store, that would allow a technical user to defeat your masking. However, such details are beyond the scope of this article and would be a distraction from its main theme.

With the data cleansing completed, I will back up the now-clean database:
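Something like the following, with BACKUP DATABASE run through Invoke-Sqlcmd; the server name and backup path are illustrative:

```powershell
# Back up the cleaned database. COMPRESSION keeps the file (and the later
# restores) smaller; INIT overwrites any previous clean backup.
$backupQuery = @"
BACKUP DATABASE StackOverflow
TO DISK = 'E:\Backups\StackOverflow_Clean.bak'
WITH INIT, COMPRESSION;
"@

Invoke-Sqlcmd -ServerInstance 'CleanupServer' `
              -Query $backupQuery `
              -QueryTimeout 7200
```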

From there, steps #3 and #4 are effectively the same: restore the clean database to the test and development servers:
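A sketch of those two restores follows. Because everything runs on a single instance in this test, the clean backup is restored under a different database name for each environment; the file paths and logical file names are illustrative:

```powershell
# Restore the clean backup once per environment. On separate test and
# development servers you would change -ServerInstance rather than the
# database name.
foreach ($targetDb in 'StackOverflow_Test', 'StackOverflow_Dev') {
    $restoreQuery = "RESTORE DATABASE [$targetDb] " +
                    "FROM DISK = 'E:\Backups\StackOverflow_Clean.bak' " +
                    "WITH MOVE 'StackOverflow' TO 'E:\Data\$targetDb.mdf', " +
                    "MOVE 'StackOverflow_log' TO 'E:\Logs\$targetDb.ldf', REPLACE;"

    Invoke-Sqlcmd -ServerInstance 'CleanupServer' `
                  -Query $restoreQuery `
                  -QueryTimeout 7200
}
```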

Of course, restoring multiple test and development databases to several different machines won’t usually run in a serial fashion like this. If it’s a self-service process, then developers and testers will restore copies on an ad hoc basis, and each restore will take about the same time as step #3.

If this were a fully automated end-to-end process, and step #3 takes X minutes, then restoring 8 database copies serially will take at least 8X. Depending on your setup, you may be able to reduce this somewhat by multi-threading the restores, as sketched below.
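As a rough sketch of that multi-threading, using PowerShell background jobs (the paths and names remain illustrative):

```powershell
# Kick off one restore per environment as a background job, then wait.
$jobs = foreach ($targetDb in 'StackOverflow_Test', 'StackOverflow_Dev') {
    Start-Job -ArgumentList $targetDb -ScriptBlock {
        param($db)
        $restoreQuery = "RESTORE DATABASE [$db] " +
                        "FROM DISK = 'E:\Backups\StackOverflow_Clean.bak' " +
                        "WITH MOVE 'StackOverflow' TO 'E:\Data\$db.mdf', " +
                        "MOVE 'StackOverflow_log' TO 'E:\Logs\$db.ldf', REPLACE;"
        Invoke-Sqlcmd -ServerInstance 'CleanupServer' `
                      -Query $restoreQuery `
                      -QueryTimeout 7200
    }
}
$jobs | Wait-Job | Receive-Job
```

Bear in mind that the jobs all read the same backup file and compete for the same disks, so the speed-up will be less than linear.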

SQL Server Tools: Results

The following times were recorded when running these scripts against a modest Azure Virtual Machine running the latest version of SQL Server 2016 and the latest StackOverflow database:

| Step | Time | Running Total |
|------|------|---------------|
| Clean Data | 2 minutes | 2 minutes |
| Create New Backup | 57 minutes | 59 minutes |
| Restore Test DB | 93 minutes | 152 minutes |
| Restore Dev DB | 97 minutes | 249 minutes |

The times to highlight are the 57 minutes to create the clean backup and the roughly 90 minutes that each additional database takes to restore from it. Those times are not going to change, and each of those 90-minute restores is necessary before a given environment is ready for use, so refreshing an environment more than once a day is unlikely to be practical for most work streams.

Automation Using SQL Clone

To repeat the setup, but using SQL Clone, the core process is effectively the same. I’m going to assume that we start at the same place, a restored, but dirty, database. We arrive at the following process:

  1. Clean sensitive data out of that database prior to releasing to testing and development.
  2. Create a new image of the cleaned data.
  3. Create a clone of the clean image in a test environment.
  4. Create a clone of the clean image in a development environment.

We’ll once again use PowerShell to run through this process. First, we connect to the clone server:
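A sketch of that connection, assuming Redgate’s SQL Clone PowerShell module; the server URL and the machine/instance names are illustrative:

```powershell
# Connect to the SQL Clone Server (14145 is its default port).
Connect-SqlClone -ServerUrl 'http://CloneServer:14145'

# Grab the SQL Server instance that will host the image source and clones.
$sqlServerInstance = Get-SqlCloneSqlServerInstance -MachineName 'CleanupServer' `
                                                   -InstanceName 'SQL2016'
```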

We’re ready to clean up the data:
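This is the same illustrative Invoke-Sqlcmd call as before; SQL Clone changes nothing about the cleansing itself:

```powershell
# Identical clean-up to Step #1 of the SQL Server tools process.
Invoke-Sqlcmd -ServerInstance 'CleanupServer' `
              -Database 'StackOverflow' `
              -Query "UPDATE dbo.Users SET Email = 'noemail@example.com';" `
              -QueryTimeout 7200
```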

You’ll note that this is identical to Step #1 of the process using SQL Server tools. The cleanup scripts we have will be the same whether we’re using SQL Clone or not.

Now we have to create an image of the clean database:
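A sketch of the image creation; the image name and the network share that stores it are illustrative:

```powershell
# Tell SQL Clone where to store the image, then build it from the clean
# database. Wait-SqlCloneOperation blocks until the image is complete.
$imageDestination = Get-SqlCloneImageLocation -Path '\\CloneServer\Images'

New-SqlCloneImage -Name 'StackOverflow_Clean' `
                  -SqlServerInstance $sqlServerInstance `
                  -DatabaseName 'StackOverflow' `
                  -Destination $imageDestination |
    Wait-SqlCloneOperation
```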

This is the one operation in the SQL Clone process that is costly in terms of time and resources. Creating an image is the tax we pay in order for the clone processing to be as fast as it is. With the image in place, we can create clones for Testing and Development:
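A sketch of steps #3 and #4, creating one clone per environment from the clean image; the clone names are illustrative:

```powershell
# Each clone is a thin, writable copy backed by the single image.
$image = Get-SqlCloneImage -Name 'StackOverflow_Clean'

foreach ($cloneName in 'StackOverflow_Test', 'StackOverflow_Dev') {
    $image | New-SqlClone -Name $cloneName -Location $sqlServerInstance |
        Wait-SqlCloneOperation
}
```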

As with the restore process, every time you recreate a clean image you’re just repeating the same steps. Unlike a restore, though, each of the clones you create across your various test and development instances will take only about 15-20 seconds. Speaking of which, let’s see the results of the SQL Clone test provisioning process.

SQL Clone: Results

| Step | Time | Running Total |
|------|------|---------------|
| Clean Data | 2:29 minutes | 2:29 minutes |
| Create Clean Image | 91 minutes | 93:29 minutes |
| Create Test Clone | 12 seconds | 93:41 minutes |
| Create Dev Clone | 6 seconds | 93:47 minutes |

As you can see, the cost, the tax if you will, for getting the incredibly fast access of a cloned database is the time it takes to create an Image. However, you only pay that tax once per refresh, just as you only created the clean backup once in the original approach. You then use that Image multiple times, saving about 90 minutes each time you create a Clone. Additionally, you’re only moving the data once, at the time you create the Image, so you’re also saving disk space in all the places where you create a Clone.

Conclusion

It’s pretty easy to see that it takes just about the same amount of work to set up automation using SQL Server tools to do a restore or to use SQL Clone. The amount of code involved is nearly identical. The differences come in two places. First, and most dramatic, is the amount of time saved. Going from 249 minutes to 94 minutes in total across this whole process is a significant time saving.

In addition, you have savings in terms of storage, because with SQL Clone you no longer need well over 100 GB of disk space in Development and QA for each restored database copy. Each clone will occupy only tens of MB of disk space.

As you increase the number of servers needing provisioning after the cleaning process, SQL Clone becomes more and more attractive.