The Case For DB CI
Like a proud dad I’ve watched the Tech department at my current firm grow from a three person exercise in cowboy coding to a 25+ person agile wonderland. Along the way a lot of engineering maturity had to be internalized – perhaps nothing as difficult as database CI.
The benefits are numerous:
- Source control provides both a change history for the evolution of our schemas and a framework for managing contributions by multiple engineers over a project’s lifecycle
- Automating deploys (and, just as importantly, rollbacks) removes the friction that normally accompanies the release of new database schema or procedures
- Aligning our data-layer best practices with those followed by our other code layers simplifies the problem space covering code and deploy management
It was a bit of journey to get where we are now, but, like most journeys, the destination is made that much sweeter by the adventure undergone to reach it.
Preparing the Team
For the team to be successful they were going to have to develop new engineering muscles, and this was not something that was going to happen overnight. New methodologies needed to be learned gradually so they would have time to sink in and become rote.
We started by switching from shared development databases to dedicates databases local to each engineer. Every engineer was furnished with SQL Server client tools and Redgate’s SQL Source Control. The next step was introducing Git as our source control system. While most of our app engineers were familiar with Git (or, at least, SVN), for the data engineers this was a new concept that came with a learning curve. However, without a shared competence in branching, committing and merging we wouldn’t have the foundation we’d need to build out the rest of our process. So, we drilled Git basics and established a modified version of GitFlow that we felt would work well with development and deployment nuances.
There was a lot to get used to here. For example, managing merge complexity by controlling local db drift; this required getting in the habit of fetching/pulling frequently from our source repos and using SQL Source Control to Get Latest. Also, committing frequently into feature branches and pushing changes up to origin. However, pushing into the base branches for each environment was only done according to well-defined definitions of done.
Putting it into Practice
Having established our source control best practices it was time to chase down the big prize: automated deployments. Prior to automating a process it’s always critical to understand the process in terms of the tools you’ll be using and, especially, the risks you might face at each step along the way.
For us the process was using SQL Compare’s (SC) CLI (command line interface) to generate the diff in state between a branch of our Git repo and a database in a specific environment (e.g. Dev, Test, UAT or Prod). We set up XML config files to manage our SC setting – one for each shared environment. At first, we would have SQL Compare output the reconciliation script to a file so we could study it and understand how it was trying to apply the changes.
This was a fairly bumpy road for us; there was a lot of nuance in how SQL Compare did its job that we had to work either through or around. For example, there were issues with deploying FileStream objects and NET CLR assemblies and procedures, which several of our database depended on. The solutions ranged from excluding certain classes of objects in our XML configs to resigning ourselves that some features would need to be managed manually (like FileStream) and their state not represented in source control (since these were extremely slow changing aspects of the system, this didn’t present major issues).
Perhaps the biggest hurdle involved databases that participate in transaction replication. Many aspects of replication will not play nice with using SQL Compare + Git to manage schema state. For example, the constraints that manage identities when you use SQL Server’s built-in identity pool management will almost always be out of synch with what’s in source control, but overriding them on deploy would be disastrous (note that our roadmap has us moving to bi-directional replication in the near future which should help resolve these issues).
In the early stages we would mitigate this by manually scrubbing the scripts produced by SQL Compare of any commands that would break our replication. Once we understood largely how to identify these commands, and under the guidance of an excellent Simple Talk article by Dave Ballantyne (https://www.simple-talk.com/sql/database-administration/cleaning-up-sql-server-deployment-scripts/), we constructed a Powershell script that would do this scrubbing for us. And then we drilled.
Starting a couple of weeks before our next scheduled Production release, we would hold regular rehearsals where we would run the latest changes through this process into our replication sandbox. Every time we came across a command that threw an error, we’d stop, figure out how fix the issue either by tweaking our XML configs or our Powershell script, and then start all over. Once we were consistently able to deploy changes into our sandbox without issue we knew we were ready for production deploys.
The final piece of the puzzle for us was wiring this process up to Bamboo, which is Atlassian’s build & deploy management software. Configuring Bamboo to talk to the SQL Compare CLI on our build server and making it aware of the paths to our XML config meant that that we could have any merges into specified Git branches trigger an automatic deploy of that branch to a database on a specific environment. For example, once an engineer had completed work locally on a feature branch (again, according to an established Definition of Done), he/she would merge it into the Develop branch, which would trigger a deploy of those code changes to the appropriate database on our Development server. Having passed integration testing and code review, the changes would be merged into Test and deployed to our Test environment for further QA.
And with that our database engineers had finally caught up to our app engineers in terms of the sophistication of their deployment strategies. No more manual generation of migration scripts (and their rollback counterparts!); no more inadvertent overwrites of one engineer’s changes by another who just happened to be working on the same stored procedure; no more missing source of truth for what the state of an environment’s db should be.
As I said above, we haven’t completely mastered the process. Some next steps are using the Redgate DLM tools to add automatic unit testing to our build processes; getting the next generation of our replication model to play nice; improving our flow process through source control. However, the work we put in provided significant improvements in what was a mission critical process. And the maturity that was gained by every engineer who participated serves as an investment into future endeavors.