Things to Consider
The marketing documents for database cloning tools make them sound as if they are the greatest inventions since the wheel. There are cloning tools such as Delphix Dynamic Data platform, and more recently, Redgate’s SQL Clone. Since SQL Clone is priced so that you don’t have to have the combined fortunes of Jeff Bezos and Bill Gates to acquire it, many SQL shops are giving these tools a serious look. That look is well deserved. If you plan carefully, the end result can be most helpful to both DBA and developer.
Let’s be clear up-front: SQL Clone is an outstanding tool. Managed properly, it can dramatically enhance the power and effectiveness of a development team. At the same time, it can preserve vast amounts of storage space, thus bringing your infrastructure manager from tears of sorrow to tears of joy as the growth forecasts for the non-production SAN drastically drop. Your biggest personal decision may come down to this: Tesla or Genesis? Or, if your boss is really happy, perhaps both.
However, with great power comes great responsibility. As I dodge shots of web from an upset Spider Man for stealing his slogan, I will attempt to take us through some considerations as more shops begin to adopt such a great tool.
Let’s Define Terms
In order for everyone to understand what is going on, we need to define terms. The SQL Clone world looks differently at things than a conventional restore-to-lower-environment would look at things.
The Image: An Image database is a master copy of the database. The clone tool uses this to create the small-and-fast clone databases.
The Clone: The Clone database is a virtual copy of the Image database. It contains a small amount of data that maps to the Image, and then stores differences between the Image and the Clone. A 4TB image could create a clone that is initially as small as 16MB (that is not a typo)!
Who Gets to Clone Databases?
Part of the power of tools, such as SQL Clone, comes from the notion that any number of developers could set up cloned databases on the fly. Why? If a developer has a bug to be fixed, and then the fix to be tested, he or she can take a snapshot of the master, set it up, and repeatedly break the database without affecting anyone anywhere else. The developer could even attach the clone to a developer-edition SQL Server on the developer’s own machine. Sounds wonderful, correct? It could be.
However, there are considerations:
Compliance: Does the Clone contain data that implicate HIPAA, SOX or PCI? If so, has the data been masked at the Image level, or do you need to mask the sensitive data in each clone? Can the testing be valid if the data are scrambled, masked or deleted?
Security: Does any data in the image require protection from the eyes of the developer? For example, data on executive compensation, trade secrets, etc.?
There are certainly other considerations as well. Redgate has heard us and is already working on automatic data masking and permissions management. For now, a smart DBA might use scripts to mask data in the image, and perhaps to drop unneeded sensitive tables from the clones. Why, for example, would someone testing a shipping application need the financial tables?
Therefore, it may be useful to have only DBA deliver properly-amended clones to developers upon request. This may be inconvenient, but it is still much faster than a full refresh from a backup.
When the Developers Want a Complete Refresh from Production
Here is the weakness in the image-to-clone technology. Let’s say that you have one image and ten clones. But the image database is months old, and now the developers need current data to debug a newly-arisen problem. A Clone won’t help. What to do now? The options are this:
- Create a refreshed database in a lower environment. Quick, dirty and saving no space at all.
- Create a new image to clone while retaining the old image. Allows a new point of reference from which to begin creating clones, but if the older clones are needed then the space savings are gone again.
- Replace the image with a new one. This retains all of the space savings but will necessarily blow away all of the existing clones. Careful planning is required to avoid adverse impacts
- Create a clone and use a tool such as Redgate's SQL Data Compare to make needed data changes to the clone from the refreshing database. This is a solution that makes one clone that will do the job, while preserving part of the space savings of cloning. It takes much longer to prepare, but may be an appropriate emergency solution.
This is where careful planning becomes quite critical. Some approaches that can help ameliorate the issue:
- Scheduled Image Refreshes: Whenever a major release comes up to production, it would be a good time to refresh the image, apply the necessary masking, and retire all clones from the previous image.
- Bite the Bullet and Have Two Images: If you have a rolling two-image solution, you can have one database at, for example, version 2.1 and another at 2.2. When 2.3 comes into system-test or production, and development has a branch working on that, the 2.1 image and clones go away. Developers then have the appropriate image from which to generate clones. You lose some of the space savings but you may have a more developer-friendly environment.
Cloning a Clone?
What happens when your development path hits a fork in the road? Developer A has clone B and developer C needs to make changes to do something else? And those changes have two conditions: (1) They cannot use a new clone because Developer A has done needed work; and (2) They cannot be done alongside developer A’s work on the existing clone?
Right now, the solution is a bit of work, because it is similar to the complete refresh scenario. You will need to clone again, let’s call it Clone D, and use SQL Compare and SQL Data Compare to synchronize Clone B to Clone D. It's extra work, but it still achieves space savings and it is far faster than a full backup/restore refresh. I’m sure that, as the tool matures, technology that will make this process easier will come along.
SQL Clone and other clone-and-snapshot tools are the next iteration of solid DevOps technologies for SQL Server. The DBA will find his or her job easier once familiar with these excellent tools. Go get a demo today and play. You will be glad that you did. With some careful planning, DBA will stop being a four letter word to your developers.
John F. Tamburo is the Chief Database Administrator for Landauer, Inc., the world's leading authority on radiation measurement, physics and education. John can be found at @SQLBlimp on Twitter. John also blogs at www.sqlblimp.com.