Using a Staging Environment for Tracking Down Bugs

A discussion of why having a staging environment with data that mirrors production is vital to tracking down issues.

Being able to replicate production bugs is highly important. A key aspect of this is having data that is similar or [preferably] identical in a non-production environment where the issue can be tracked down. Maybe somehow a NULL value got into a database field. Or maybe there's an issue with the new business logic that just got rolled out. Whatever it is, if the issue can't be reproduced, fixing it is much more difficult.

At Boston Public Schools, we have a staging environment for tracking down issues. There’s both a staging web server with a clone of the production website as well as a staging database server with data that is refreshed on a nightly basis. In almost all cases this setup has proven sufficient for tracking down issues and replicating user-reported bugs. It helps us get the issue resolved without the risk of modifying real production data.

At this point the staging refresh process is fully automated. A SQL Server backup of production is run early in the overnight period, and later a SQL job copies the database backups to the staging database server and restores them there. Because the whole process takes over an hour, it runs nightly rather than more frequently.
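The nightly copy-and-restore step amounts to a backup on one server and a restore on the other. A minimal sketch of the two commands involved (database name and logical file names here are made up for illustration, not our actual scripts):

```sql
-- On production: nightly full backup, run from a SQL Agent job
BACKUP DATABASE StudentInfo
TO DISK = N'\\backupshare\StudentInfo_full.bak'
WITH INIT, CHECKSUM;

-- On staging: restore the copied backup, relocating files as needed
RESTORE DATABASE StudentInfo
FROM DISK = N'D:\Restores\StudentInfo_full.bak'
WITH REPLACE,
     MOVE N'StudentInfo_Data' TO N'D:\Data\StudentInfo.mdf',
     MOVE N'StudentInfo_Log'  TO N'E:\Log\StudentInfo.ldf',
     STATS = 10;
```

WITH REPLACE overwrites the existing staging copy, and STATS = 10 reports progress in 10% increments, which is handy for a restore that runs over an hour.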

Can the process be improved? Certainly. We recently hired a DBA, and he's introduced us to the benefits of using database replication. It's quite possible that the staging data refresh could be changed to use replication instead of database copies and restores. At first, we'd likely use snapshot replication and at some point transition to transactional replication [which would keep the data up to date and in sync with production, making it even easier to track down bugs]. We're not there yet, though.
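For reference, setting up snapshot replication is largely a matter of configuring a publication and a subscription. A rough sketch, assuming a Distributor is already configured (database, publication, table, and server names are all invented for the example):

```sql
-- On the publisher: enable the database for publishing
EXEC sp_replicationdboption
    @dbname = N'StudentInfo', @optname = N'publish', @value = N'true';

-- Create a snapshot publication and add a table (article) to it
EXEC sp_addpublication
    @publication = N'StudentInfo_Snapshot', @repl_freq = N'snapshot';
EXEC sp_addarticle
    @publication = N'StudentInfo_Snapshot',
    @article = N'Students', @source_object = N'Students';

-- Subscribe the staging server to the publication
EXEC sp_addsubscription
    @publication = N'StudentInfo_Snapshot',
    @subscriber = N'STAGINGSQL', @destination_db = N'StudentInfo';
```

A real setup would add one sp_addarticle call per table and schedule the Snapshot Agent; moving to transactional replication later mainly means changing @repl_freq and letting the Log Reader and Distribution Agents keep the subscriber current.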

Truth be told, the setup we have now is significantly better than what we had a few years ago – no nightly data refresh at all. In other words, to track down an issue, we had 3 options:

    1) We could log in to the production environment and change real data to replicate the issue.
    2) We could hope that the bug would show up on the development server [which would likely have out of date data].
    3) We could wait an hour or more to copy the latest production backup over and do a restore on the development environment.

None of these options is a good solution – we've advanced quite a bit from where we were before.

Comments

Posted by Anonymous on 8 June 2011

Pingback from  Dew Drop – June 8, 2011 | Alvin Ashcraft's Morning Dew

Posted by Chris Harshman on 8 June 2011

We do something similar: we keep a copy of the production database on a different server that the on-call developers use to track down issues, or to test reports against production data without using the resources of the production or QA databases. We stick to the backup/restore method, though. Replication may be tricky to deal with in this type of situation, since the developers would then have a moving target to work against.

Posted by David Voss on 13 June 2011

If the data in your staging environment is subject to change due to testing or troubleshooting within the staging environment, then you'll need to refresh from a backup of the production system to ensure it's back in sync with your production system.

An extreme example... if in your staging environment you delete a student record (and corresponding course enrollments) and then 30 minutes later in the production system there are grades entered for that student, what happens when that data is replicated and there's no student record anymore?

At the university where I currently work, we refresh our training environments every night. Our users know that they can go in there and make any changes – delete records, add records, etc. – and the next day a new copy will be available. They love it for troubleshooting (so do I!) and for testing out new processes.

A side benefit is that each day you're also validating the previous night's backup.
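A nightly restore is the strongest proof that a backup is usable, but SQL Server also offers a cheaper spot-check that reads the backup without restoring it (file path is illustrative):

```sql
-- Verify that the backup file is complete and readable without restoring it.
-- WITH CHECKSUM re-validates page checksums recorded during the backup.
RESTORE VERIFYONLY
FROM DISK = N'D:\Restores\StudentInfo_full.bak'
WITH CHECKSUM;
```

VERIFYONLY confirms the media, not the data inside it, so it complements rather than replaces a periodic full restore.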

For our reports and data warehouse project, we use replication.

Posted by bbaley on 14 June 2011

We have multiple environments, used for different purposes and treated differently for much the same reasons: DEV and TEST (your Staging).

TEST is generally loaded from backups taken at a specific point in time and is generally used for applying updates (staging) based on a "known good state" of production, so we do not automate this process.

DEV, only gets restored when requested.

The transactional replication is an interesting idea for support - as I get constantly asked to refresh particular databases based on support issues.

Posted by Rob Nickolaus on 14 June 2011

I like the idea of having a full test set of data to reproduce production issues. Also, depending on where this data gets copied to, you may be able to use the setup for performance testing with production-sized data.

A production database may hold sensitive information that needs to be sanitized or scrambled, and scripts can be set up to do that. In that case, replication wouldn't work very well, since it would constantly push the real values back in over your scrubbed data. Also, testing the database backup every day is a great plus here.
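A scrubbing script of that sort is usually a handful of UPDATE statements run right after the restore. A sketch with entirely hypothetical table and column names:

```sql
-- Replace identifying values with deterministic placeholders after the restore.
-- Table and column names (dbo.Students, SSN, Email) are hypothetical.
UPDATE dbo.Students
SET FirstName = N'Test',
    LastName  = N'Student' + CAST(StudentID AS nvarchar(10)),
    SSN       = NULL,
    Email     = N'student' + CAST(StudentID AS nvarchar(10)) + N'@example.com';
```

Deriving the placeholder from the key (StudentID here) keeps rows distinguishable for debugging while removing the real identities.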

Posted by Imran Ashraf-452633 on 14 June 2011

Not having test environments not only leads to audit failures but, more importantly, can lead to downtime due to insufficient testing. My current clients seemed surprised when I requested that a critical system should have some sort of non-production testing environment. Their view was that they didn't need one because the business didn't require it. Trying to explain that auditors would not be happy with new patches, upgrades, or development being done against a production system did not deter them. I managed to convince them when a security patch for the OS was applied and killed the server. The downtime got the business to complain loudly enough that the IT managers finally catered for a test environment. The lesson is that as DBAs we should advocate for the best possible environment that we are being asked to support. Let the business and managers know clearly the risks involved in not having a non-production environment to test on – after all, in this example it was not the DBA that killed the server!

Posted by Bill Beadenkopf on 14 June 2011

Great thoughts, everybody.  I love the concept of getting to prove your nightly backup.  That's something that often gets missed until it's too late!

Posted by DBA on 14 June 2011

Great thoughts, everyone. Restoring the nightly production backup to staging is certainly a great idea: it gives you an updated copy of the data in staging, and the backup set also gets verified. But the database will be offline (restoring state) and unqueryable for the duration, and where the database is huge, the restore can take a good amount of time. Some companies have developers working in different geographic locations who need the staging database up and available 24x7, so the downtime of this approach has to be considered. Any thoughts?
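One way to shrink that outage window (a sketch with made-up names, not a tested procedure): restore the fresh copy under a temporary database name while the current staging copy stays online, then swap names in a brief maintenance window.

```sql
-- 1) Restore the new copy side by side; current staging DB stays queryable
RESTORE DATABASE StudentInfo_New
FROM DISK = N'D:\Restores\StudentInfo_full.bak'
WITH MOVE N'StudentInfo_Data' TO N'D:\Data\StudentInfo_New.mdf',
     MOVE N'StudentInfo_Log'  TO N'E:\Log\StudentInfo_New.ldf',
     STATS = 10;

-- 2) Brief outage: disconnect users and swap the names
ALTER DATABASE StudentInfo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE StudentInfo MODIFY NAME = StudentInfo_Old;
ALTER DATABASE StudentInfo_New MODIFY NAME = StudentInfo;
DROP DATABASE StudentInfo_Old;
```

The restore still takes just as long, but users only lose access during the name swap at the end rather than for the whole restore.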

Posted by Bob Lee on 14 June 2011

Andrew,

Depending on what version of SQL Server you are running, it sounds like a database snapshot would suffice for your scenario – and it's almost instantaneous. A word of caution, though: once you are done with your debugging, you'll want to get rid of the snapshot.
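Creating and dropping such a snapshot is a two-statement affair; a minimal sketch, assuming the source database is named StudentInfo (names and paths are illustrative):

```sql
-- Create a point-in-time, read-only snapshot of the database.
-- NAME must match the logical data-file name of the source database.
CREATE DATABASE StudentInfo_Snap
ON ( NAME = StudentInfo_Data,
     FILENAME = N'D:\Snapshots\StudentInfo_Snap.ss' )
AS SNAPSHOT OF StudentInfo;

-- When debugging is finished, drop it so the sparse file stops growing
DROP DATABASE StudentInfo_Snap;
```

The snapshot file is sparse and grows as pages change in the source, which is exactly why a forgotten snapshot becomes a disk-space problem over time.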

Posted by AndrewZwicker on 14 June 2011

Thanks for the comments everyone!  We in fact have 3 non-production environments [in my view, the more the better, because they all have valid reasons to exist]:

1) Development - Updated on demand, not too frequently.

2) User Acceptance - Updated prior to rolling out significant changes for acceptance testing.

3) Staging - Updated nightly.

Each environment has a database server and a web server.

Another environment we sometimes seem to need [but that we've made do without] is a training environment. [We generally use either the testing or staging environment for training instead.]

All 3 environments will need to be refreshed in a couple weeks when we run our 'School Year Flip' process [likely on June 30th], when we archive current year student data and move future year student data into the current year tables.  I'll very likely have a post about this in the upcoming couple weeks.

Posted by tobyteel on 14 June 2011

My company is thinking of doing something similar (automatic pulling and restoring of backups from production with scrubbing).  I'm curious if you have purchased or found a tool for doing this, or if everyone is just writing their own batch of scripts to handle it.

Posted by Jon Russell on 15 June 2011

Yes, none of those options is good. Restoring a recent backup for research and development is good practice. As others have said, not only does this provide recent data for research, it also proves that your backups are good.

We use both backup/restore and Transactional Replication. We restore a backup every night for researching issues. We also have another read-only database that is kept up to date in near-real time with Transaction Replication. Even though this replicated database is primarily used by analysts for reporting, developers often use it if they need to see data that occurred after the backup.

A word of caution: Understand what you are restoring. Restoring or replicating a production database may not be acceptable in certain compliance and regulatory scenarios.  Even if you do not have specific regulations you must work with, it is still a good idea to scrub out sensitive data. Encryption is also an option. Steve Jones just posted a good article worth reading on security (www.sqlservercentral.com/.../74299).
