Restoring a production database to some testing, development, or other server is a process that many DBAs and developers are familiar with. We may have automated scripts that perform this on a regular basis, every day or week, or we may manually do the restores on an adhoc basis, but it's something that probably happens in most IT departments.
Actually for the sake of disaster recovery, this is something that every DBA should practice periodically.
However, just restoring production databases into a test environment can introduce issues. Exposing real production data for testing is something you should avoid, especially if you work with external clients that must test the application. I've had more than a few clients be a little concerned over their real data being on a "test" system.
When I was in college, there was a programming contest every year that we were encouraged to enter. It's grown, but it's still an interesting idea. It's the obfuscated C Contest, where you try to write a program with source code that is very difficult to read. People view the code and try to figure out what the program does. I've seen one line programs that were thousands of characters, code that was written as an ASCII picture, like the one above.
I have seen some products that aid in test data generation, something I think is very valuable, but I've not seen a product that will anonymize data? In the past I've written scripts that would run after a restore on a test system and replace names, phone numbers, and other identifying information with random data. In this way we'd likely prevent anyone that was using the test system from seeing data that they don't have access to in the production system.
This isn't always the best solution. If it's not automated, people don't always run the scripts. Also from one test run to the next, you don't necessarily have the "same" information to view. Also, if you have people testing the new system v the live one, they'll expect to get the same results, which might not be the case if you've obfuscated the data.
Personally I think you should automate the restore of a production system to test or development system and include scripts to change data, especially financial and privacy information. Then run some standard reports that allow people to see the raw data against which they can check the changes or reports that they are testing.
It's better than having "real" information exposed because of programming errors.
The Voice of the DBA
The podcast feeds are now available at sqlservercentral.podshow.com to get better bandwidth and maybe a little more exposure :). Comments are definitely appreciated and wanted. You can get feeds from there.
- Quicktime Podcast - MB
- Windows Media Podcast - MB WMV
- iPod Video Podcast - MB MP4
- MP3 Audio Podcast - MB
Today's podcast features music by the beautiful, jazzy, Robin Stine. Check her music out at www.robinstine.com.
I really appreciate and value feedback on the podcasts. Let us know what you like, don't like, or even send in ideas for the show. If you'd like to comment, post something here. The boss will be sure to read it.