Automating database changes and releases is essential to move to an iterative development process where we are continuously deploying changes into production. Although this goal is still some way off for most, it has to be the direction for any serious development team looking to maximise their effectiveness in today’s more agile business world.
When automating database changes and releases we need to think not only about the benefits and technical complexities of making the changes but also about what happens when something goes wrong. What do we do when we make an incremental change for the 4th time that day, disaster hits, and critical business data is lost?
The first thing to make clear is that we have to match the level of precautionary activity to the level of risk the business attaches to damage to the specific data. Compare, for example, changing a table that is used as a temporary store and cleared out every hour with changing a table that holds the history of which medications have been given to a patient. These two scenarios have very different requirements, and the level of risk to data should be well understood when doing any release, automated or manual.
Once we understand the level of risk that the business attaches to the data, we can plan for what happens when something goes wrong. This should be beautiful music to a DBA’s ears, because every DBA should be constantly concerned about and testing their disaster recovery plans, and in particular what their Recovery Point Objective (RPO) is and how they will meet it. This management of the RPO is effectively what we need to understand and bear in mind. When we understand the importance of the data and what the effective RPO is, we can plan and build our automated deployment pipelines.
Rollbacks take two general forms. The first is to take a backup before you deploy and, if you have any problems, restore that backup. In reality you should not even have to create this backup, because you should already have a backup strategy with a defined RPO, so you should be able to use that. It makes sense, of course, to verify that the strategy is correct and in place when you build your automated pipeline. The backup can be used to restore any lost data or to reverse the changes made to the code / schema.
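Verifying the backup strategy as part of the pipeline can be as simple as a pre-deployment gate: refuse to deploy unless restoring the most recent backup would lose no more data than the agreed RPO allows. The sketch below is illustrative, not any particular tool's API; the backup timestamp stands in for whatever your backup tooling actually reports.

```python
from datetime import datetime, timedelta

def backup_satisfies_rpo(last_backup: datetime, rpo: timedelta,
                         now: datetime) -> bool:
    """Return True if restoring the last backup would lose no more
    data than the business has agreed to via the RPO."""
    return (now - last_backup) <= rpo

now = datetime(2024, 1, 1, 12, 0)
rpo = timedelta(minutes=15)  # hypothetical agreed RPO

# Backup taken 10 minutes ago: within the RPO, safe to deploy.
print(backup_satisfies_rpo(datetime(2024, 1, 1, 11, 50), rpo, now))  # True
# Backup taken an hour ago: deploying now risks losing more than agreed.
print(backup_satisfies_rpo(datetime(2024, 1, 1, 11, 0), rpo, now))   # False
```

In a real pipeline this check would query the backup catalogue rather than take hard-coded timestamps, and a failing check would halt the deployment rather than print a boolean.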
The second form, which only reverses the changes to the code / schema, is to use the same tools that generate the deployment scripts to generate rollback scripts. Typically this is done by reversing the direction of the comparison that was used to generate the deployment script. For example, if you generate a deployment script by comparing source control to your live database, the rollback script can be generated by comparing the live database to source control before the deployment changes it.
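The reversed-comparison idea can be illustrated with a toy diff. Here schemas are modelled as simple dictionaries of tables to columns; real comparison tools work against live metadata, but the symmetry is the same: the deploy script and the rollback script come from the same compare run in opposite directions.

```python
def diff(source: dict, target: dict) -> list:
    """Produce the statements that make `target` look like `source`.
    A deliberately tiny model: only whole-table creates and drops."""
    statements = []
    for table, cols in source.items():
        if table not in target:
            statements.append(f"CREATE TABLE {table} ({', '.join(cols)})")
    for table in target:
        if table not in source:
            statements.append(f"DROP TABLE {table}")
    return statements

source_control = {"orders": ["id", "total"], "audit": ["id", "event"]}
live_db        = {"orders": ["id", "total"]}

deploy   = diff(source_control, live_db)   # source control -> live database
rollback = diff(live_db, source_control)   # same comparison, reversed

print(deploy)    # ['CREATE TABLE audit (id, event)']
print(rollback)  # ['DROP TABLE audit']
```

Note what the toy example also makes obvious: the generated rollback drops the new table, so any data written to it after the deployment is lost. That is exactly why schema-only rollback scripts do not replace a backup strategy.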
Personally, I used to generate rollback scripts for each deployment but rarely found they were needed, and when they were, the tooling generated scripts fast enough that it was simple to produce one at a moment’s notice. Nowadays I do not generate rollback scripts as a matter of course.
Rolling forward, or fixing forward, is actually the more likely approach, especially in an automated world where we can make fast changes. I have worked in a variety of industries, and for some organisations any deployment failure would have meant a rollback using a restore and a re-scheduling of the deployment for a later date. In other organisations with a higher appetite for risk this would have been seen as a waste of time, and any issues would simply have been fixed in place. Both of these approaches were correct for their respective organisations: the faster, more agile company got its code fixed and working quicker, but with an amount of data loss that the slower company would have been unable to accept.
Fixing forward is also sometimes forced on you, where something has gone wrong but remained undetected after the initial release. If a number of changes have been made since, the simpler option is often to fix the issue and address any data problems directly, as opposed to trying to extract the data changes, restoring an old backup, and re-applying them. There is of course no reason this only applies to automated deployments; mistakes can happen and must be planned for everywhere.
Fixing forward is exactly what it sounds like: if something goes wrong, you find out what the problem is and you fix it, pushing the change through the different environments and then into production. It is not perfect, and you may find you end up doing quite a few small releases before the problem is fixed, but with an automated pipeline you have the ability to do that.
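The "push the change through the different environments" step can be sketched as a simple promotion loop. The environment names and the `tests_pass` check here are illustrative assumptions, not any specific CI tool's API; the point is that a fix only reaches production by passing every earlier stage, and a failing fix stops where it fails.

```python
# Hypothetical promotion order for a fix-forward pipeline.
ENVIRONMENTS = ["dev", "test", "staging", "production"]

def promote_fix(tests_pass) -> list:
    """Push a fix through the pipeline in order; stop at the first
    failing stage and return the environments actually deployed to."""
    deployed = []
    for env in ENVIRONMENTS:
        if not tests_pass(env):
            break  # the fix itself needs fixing; later stages are untouched
        deployed.append(env)
    return deployed

# A fix that passes everywhere reaches production...
print(promote_fix(lambda env: True))
# -> ['dev', 'test', 'staging', 'production']

# ...while one that fails in staging never gets near production.
print(promote_fix(lambda env: env != "staging"))
# -> ['dev', 'test']
```

This is also where the "quite a few small releases" observation shows up in practice: each attempt is cheap because the loop is automated, so iterating until the fix sticks is viable.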
To reiterate: you absolutely must have a valid RPO for the data, it has to be agreed with the business, and the plan must be tested regularly. There is no replacement for this, whether you do automated deployments or not. Once you have the RPO you can then decide whether to roll back or fix forward.
No matter what you decide to do, these incidents should be reviewed once everything is up and running again, and actions agreed and carried out to stop the same or similar issues happening again. Part of building an automated deployment process is continuously improving it and making sure that, where there are shortfalls, these are removed over time.