How Do You Decide to Rollback?

  • The big thing that has changed for me in the past 10 years is the size and nature of the deployments.  Project deployments were big things carried out at 2am with core staff on-site and available.  There weren't that many rollbacks, but we were no strangers to them either.

    These days changes are kept as small as possible and can only be merged into the release branch if all unit and integration tests pass.

    We might deploy an empty table in one release, a view on the table in another, and the process to populate the table in another.  If something fails it sticks out like a sore thumb, because the small deployment that uncovered it is easy to identify.  As part of our database deployments we rehearse both the deployment and the rollback, and because both are small, developing and testing the rollback is a minor overhead.  Rollbacks are extremely rare, and when they happen it tends to be because a major foundation-stone change was deployed, for example a critical cloud infrastructure change.
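
    As a minimal sketch of what one of those small, reversible steps can look like (the object names below are made up for illustration), the deploy and rollback scripts for the "empty table" release might be as simple as:

        -- Deploy: add the new, empty table (idempotent so the rehearsal can be rerun)
        IF OBJECT_ID(N'dbo.CustomerScore', N'U') IS NULL
        BEGIN
            CREATE TABLE dbo.CustomerScore
            (
                CustomerId int           NOT NULL PRIMARY KEY,
                Score      decimal(9, 2) NOT NULL
            );
        END;

        -- Rollback: remove only what this release added
        IF OBJECT_ID(N'dbo.CustomerScore', N'U') IS NOT NULL
            DROP TABLE dbo.CustomerScore;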

  • Working in government, a rollback plan is always required, in detail. Also, we don't ever make small deployments, unless the project itself is small. Deployments occur only after several months, and then it's one massive, big-bang deployment. Because of the size of deployments, it is necessary to have detailed rollback plans, because you can almost guarantee that something will go wrong. I wish we wouldn't do it this way, but asking us to not make gigantic changes once or twice a year is like asking people to stop breathing air.

    Kindest Regards, Rod. Connect with me on LinkedIn.

  • It's interesting that I have clients doing both of the things mentioned above. Some still do big rollbacks, usually with a plan, albeit not a well-tested one. Quite often I see rollbacks that don't succeed (is that a second deployment failure?), or scrambling to make ad hoc fixes.

    For the "DevOps" customers, they still tend to batch up changes, and often have pieces of work related to the db changes, and they'll "Big bang" a few small changes. So the new table, new proc, and view changes all go together, rather than separately.

    Very few people will actually do an ad hoc release to the db for a single object, the way I might change a typo or a font size on a web page.

  • For me, a rollback occurs when the release is having a high financial impact on the company.  All of the releases I do are for internal-only tools and systems and go through several stages of development and UAT prior to going live.

    95% of the time, when a release goes sideways and a rollback is requested, it is because the release introduced a bug.  When this happens, we investigate what caused the bug, evaluate the effort to correct the bug vs rolling back the release, and pick the appropriate action.  Most of the time, fixing the bug is the desired action, as rolling the code back means we would be rolling things back, fixing the bug, and then re-releasing the application anyway.  Plus, since the bug was not found with internal or external testing, it is likely not a "show-stopping" bug, but more of an annoyance that users can live with for a few days.

    The other 5% of the time, the release fails during the deployment itself, the transaction gets rolled back, and no harm is done.

    If the release involves data changes or table structure changes, the first step we ALWAYS take is to make a copy of the data that will be changed.  This gives us a snapshot of what the data looked like prior to our changes and allows us to revert the changes without doing a restore.  The downside is that it wastes some disk space, as we now have a table that is VERY likely never going to be used again.  On the plus side, all of these new tables have the current date in the table name, so at a future time we can drop all tables that are over a year old, as it is VERY unlikely we will be rolling those back.
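
    A minimal sketch of that kind of pre-change snapshot, assuming a hypothetical dbo.Orders table and a separate Backup schema to hold the copies:

        -- Snapshot the rows we are about to change into a date-stamped copy.
        -- Table and schema names here are made up for illustration.
        DECLARE @BackupTable sysname =
            N'Orders_' + CONVERT(char(8), GETDATE(), 112);          -- e.g. Orders_20240115

        DECLARE @sql nvarchar(max) =
            N'SELECT * INTO Backup.' + QUOTENAME(@BackupTable) +
            N' FROM dbo.Orders WHERE OrderStatus = N''Pending'';';  -- only the rows the release touches

        EXEC sys.sp_executesql @sql;

        -- Later housekeeping: list snapshot tables older than a year so they can be dropped
        SELECT name
        FROM sys.tables
        WHERE SCHEMA_NAME(schema_id) = N'Backup'
          AND create_date < DATEADD(year, -1, GETDATE());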

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Great tips, Brian. The type of thing I've done as well. Any data changes mean we make a copy and waste space for a few days until we are sure things are OK.

    Of course, this is why I don't like TB dbs. I can waste a lot of space there.

  • The most important step is to understand the point at which you can actually roll back.  That might be any time, for example if you've just changed some text or a graphic.  That might be a very specific point during a deployment where changes have become customer-facing, i.e. the new app has been released and is in use.  And increasingly, as we deal more with external apps and cloud systems, it might be never, because you are subject to someone else's rollout deadline.

    For deployments that have a hard point where rolling back stops being an option, it's important to have people proactively monitoring the deployment and communicating issues early.  And have the people who can make the decision to roll back involved during the process; don't just run an hours-long deployment and hope it works.  That of course also means your deployment process has to provide enough feedback to surface any potential issues.
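
    One way to get that kind of feedback from a long-running T-SQL deployment script is to emit progress messages that are flushed immediately instead of buffered; the step names below are only illustrative:

        -- RAISERROR ... WITH NOWAIT sends each message to the client right away,
        -- unlike PRINT, which may buffer output until the batch finishes.
        RAISERROR('Step 1 of 3: creating staging objects...', 0, 1) WITH NOWAIT;
        -- ... step 1 work ...

        RAISERROR('Step 2 of 3: migrating data...', 0, 1) WITH NOWAIT;
        -- ... step 2 work ...

        RAISERROR('Step 3 of 3: swapping in new objects (the point of no return)...', 0, 1) WITH NOWAIT;
        -- ... step 3 work ...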

  • ZZartin wrote:

    The most important step is to understand the point at which you can actually roll back. 

    I think this is a really important point.  That is a step on which we must fall down, or else o'erleap.

    I worked on a system that had a rollback mechanism to recover data for the previous day for a production table.  The problem was the system was a "tactical fix" intended for a single data source refreshing 500,000 records.  By the time it reached 40 data sources and 23 million records we found that its rollback mechanism could never be enacted within a realistic timeframe.

    The downside to small deploys is that people stop looking far enough ahead to avoid unnecessary work, and they often miss the bigger picture. This is a human problem.

    There are many plus sides, some of which are as follows:

    • Any peer review is small, simple to understand and quick to approve/reject
    • It's easy to spot sparsity or even gaps in testing
    • It's quick to test both the deploy and the rollback.
    • You are constantly delivering something. Never underestimate the benefit of a management pacifier.
    • Blocking issues become very clear and receive the attention they deserve.

  • That's interesting, what you said, Brian, about your company not typically rolling back a release even with a bug, unless the bug is really egregious. Because we release seldom, I've not seen too many rollbacks. It just takes a long time to see many rollbacks in action. But of the one or two I've witnessed, it's always been the case that if a bug is discovered, the automatic reaction is to initiate a rollback immediately. That, of course, changes the conditions under which the bug was discovered, which might be difficult to reproduce in a development or test environment. So it makes it even longer before the next big-bang release.

    Kindest Regards, Rod. Connect with me on LinkedIn.

  • Rod at work wrote:

    That's interesting, what you said, Brian, about your company not typically rolling back a release even with a bug, unless the bug is really egregious. Because we release seldom, I've not seen too many rollbacks. It just takes a long time to see many rollbacks in action. But of the one or two I've witnessed, it's always been the case that if a bug is discovered, the automatic reaction is to initiate a rollback immediately. That, of course, changes the conditions under which the bug was discovered, which might be difficult to reproduce in a development or test environment. So it makes it even longer before the next big-bang release.

    Hmm... at least where I work, the motivation to push through, with rollbacks only being a last resort, is that deployments typically introduce new features and bug fixes, some of which may be highly anticipated and part of something beyond the scope of just an internal IT change.  So it very much becomes a question of how bad any new bugs actually are.

  • Ours fall into ZZartin's bucket too - if the new bug is low impact, has a low probability of reproduction, or is low risk, we tend not to roll back.  If the bug is high impact, highly reproducible, or high risk, we evaluate whether to roll back or fix the new bug, with a strong preference for fixing the new bug.

    Rollback is more of a "last resort" solution for us. I honestly don't remember the last release we did that needed to be rolled back.  We test the rollback scripts out on live prior to releasing them, so we know the rollback should work.  Most, if not all, of our releases are pretty small, except when we are building brand new tools or adding major features to an existing tool.  Large changes like that, though, undergo more extensive testing, where we make sure that the launch scripts work and can be run repeatedly, and that the rollback scripts bring things back to a state that matches live.  SQL Compare works great for verifying that the rollback scripts work for objects, and SQL Data Compare works great for verifying that the data rolled back successfully.
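
    On top of that, a quick sanity check of the data rollback can also be done in plain T-SQL by comparing the rolled-back table against the pre-release snapshot copy (the table names here are hypothetical); both queries should return zero rows if the rollback restored the data:

        -- Rows in the live table that are missing from (or differ in) the snapshot
        SELECT * FROM dbo.Orders
        EXCEPT
        SELECT * FROM Backup.Orders_20240115;

        -- Rows in the snapshot that are missing from (or differ in) the live table
        SELECT * FROM Backup.Orders_20240115
        EXCEPT
        SELECT * FROM dbo.Orders;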

    Rolling back means removing features OR re-introducing bugs; both are highly undesirable and make the department look bad.  That being said, releasing a new version of a tool that takes down production is also highly undesirable and makes the department look bad.  That happened with a different department (not mine, thankfully) - they released an update to their tool that had a bug in it such that it couldn't communicate with the database. I'm not sure exactly what the bug was, but it took down the company for a day while they blamed the database and assured my team that they hadn't changed any code recently.  My team scoured the database to see what changed and where problems were, only to come up empty.  When we discovered it was their mistake, they corrected it and business continued as usual, but now any time they tell me the database is wrong, the first thing I check is whether their tools were recently updated.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I'm with David Poole on this.  Deployments should be small and frequent.  Don't like the result?  Do another "forward" deployment, never a backward one.  If deployments are automated and fairly easy, there's little value in planning for rollbacks in the day-to-day world.

    Large, waterfall type releases are another matter, though.

  • larry.blake wrote:

    Large, waterfall type releases are another matter, though.

    Yeah, I know this only too well.

    Kindest Regards, Rod. Connect with me on LinkedIn.
