A Version Control Strategy for Branch-based Database Development

Our existing database development and release process was holding back team productivity and causing frequent deployment delays. It was manual and error prone. Developers worked on a shared development database, with all changes committed to a 'monolithic', centralized version control repository. The dependencies this introduced, and likelihood of conflicting changes, meant database development was restricted to a 'serial' process, tackling their backlog one task at a time.

We set out on a "mission" to bring support for parallel development workstreams and process automation to the database. This article explains how we achieved the first milestones:

Migrating our centralized, 'monolithic' repos over to Git, a collaborative version control system (VCS) with strong support for topic-based development workstreams
Implementing a Release Flow branching strategy to support collaboration and parallel working, and a Pull Request workflow for automated build and release.

I'll focus on just how we implemented this new version control strategy, but these changes happened alongside a shift from using shared databases to dedicated, per-developer databases. We could not fully exploit Git's potential to help us split the development work into isolated, task-based branches unless we could also provision isolated databases, per branch, per-developer.

The migration to a different VCS sounds straightforward but should not be underestimated. As well as moving all the existing repos over Git, we also had to split them down into more manageable "per customer-per-project" repositories, with fewer dependencies.

Legacy process: centralized VCS, monolithic repos, shared dev databases

The development teams comprised both .NET and database developers. In the legacy system, all of the bespoke application code, for each customer, was stored in a single, monolithic repo. Within each customer's folder were multiple folders containing that customer's various projects (Visual Studio solutions). There was a separate repo containing all shared components.

Similarly, the database code was stored in a separate monolithic repo, with folders per customer and within each of customer folder was a folder per database containing the code (a SQL Source Control project).

legacy version control architecture

Initially, all these repos were hosted in Subversion (SVN), although later the team switched to Team Foundation Version Control (TFVC) for their application code. Both are so-called centralized VCSs, where all historical data is maintained on the server and developers only have one version (usually latest) locally. Before making changes, the developers check out files, and operations require a connection to the server.

All developers made changes to files in the same monolithic repo. There was also no branching. This caused maintenance and performance problems (long commit times, for example) as well as dependency problems. For example, when several developers needed to make changes to shared application components there was a high risk of altering code in a way that would disrupt the work of others. Often developers had to wait around for other developers to finish a task, when more than one user story was being developed in a repo.

Similar problems affected the database development work. A developer working on database A1 would see their own work mixed up with commits to other customer databases. The lack of branching and use the shared development model for database work meant there was no way for a developer to work in an isolated fashion, free from the fear of disrupting the work of others. Overall, it was very hard for developers to work in parallel; they were restricted to working on one "story" at a time from backlog.

When the time came to release the required changes, there were other problems. There was no concept of releasing a known and immutable version of a shared application component or of a database. In fact, with all developers working in one repo, with no branching, there was no real visibility into what code base was released to customer. To release, a developer simply copied what was built on their computer to the customer's environment. The same applied to the database changes, a manual, state-based comparison from Test to Prod was the source for a release of database code.

All this was the long way from the DevOps ideal of isolated and parallel development workflows that result in well-defined, immutable and thoroughly tested release artifacts.

DevOps: distributed VCS, dedicated repos, isolated dev databases

For all the above reasons, it was soon clear that no centralized VCS system, coupled with use of shared databases, was ever going to support the isolated, parallel development workflows that the team needed.

Git on the other hand is a distributed VCS. That means that you don't check out from a server, you clone the server to your local machine. Developers can commit each set of changes locally, in a branch, and perform version control operations such as history and compare without a network connection. And local is local until you decide to push your changes to the server. If you create a branch to experiment, commit code to it, but eventually realize it's not going to work, you can just delete the branch; nobody ever noticed it because it was all local!

Git supports very lightweight branching. Creating a branch is nearly instantaneous and switching back and forth between branches is a piece of cake. To the team, it seemed that if they could understand and master this feature it would unlock the door to a much more controlled and efficient development and release workflow.

Migrating to Git: breaking down the monoliths

Before the team could start to exploit Git's branching capabilities, they knew they needed to deal with the problem of the oversized repos, and so remove some of the inefficiencies and unnecessary dependencies that hampered their development work.

The application code was split into per-project repositories, with each shared component also in its own repo. Similarly, the databases repo was split down into one repo per customer, containing only that customer's databases, with each Database folder containing the code for that database (a Flyway Desktop project).

DevOps version control architecture with Git

Thanks to this info on GitHub the team achieved to split up dozens of repositories with the preservation of all commit history. Other useful tools for getting the migration done, technically, are:

Git-Svn is a tool that provides bidirectional flow of changes between a SVN and Git, so super easy for migration
Git-TFS does the same as above, but then for TFVC and Git, also bidirectional

Improved development and release processes

For the application code, the team had now split the codebase into much smaller functional pieces, where every project (Visual Studio solution) had its own Git repo. This had immediate benefits. It became much easier to maintain each repo, and performance improved. There was much less 'inference' between developers' changes as only developers working on a specific project or component were committing to that repo, instead of the big repo that everyone was using.

For the database code, there were multiple dependent databases per customer, so each repo now represented a "SQL instance" containing all the required databases. When a developer creates a branch, the 'isolated database' to support that branch actually consists of an "instance" of interconnected databases, so avoiding build problems caused by missing dependent objects.

For both application and database code, the use of dedicated repos gave much better visibility into exactly what was in a release, and what had change from the previous release, and made it much easier to write automation logic for the build and release process.

The team can now build, test and release each application component as an immutable build artifact. Each build is a new versioned artifact, so there's never any doubt exactly what is being released, and customers can use the latest or older versions of the same shared component, as required. Similarly, there is much better visibility into exactly what database changes are being released, which objects in which databases are affected and how, since the team can track these changes object-by-object in the version control system, via the schema model. These schema model changes are linked to the release and can therefore be used to see the changes between releases.

tracking database changes through the version control system

But it doesn't stop there, developers can also link work items while committing changes, or when running one of the checks that take place as part of the pull request workflow (covered shortly). This means that the version control system can also show what work items went into a specific release.

tracking work items through development and release process

Git branching Strategy

If you adopt a branching strategy for your team, you collaborate better and spend less time managing version control, and your team spends more time developing. You could say that adopting a Git branching strategy is a minimum requirement, for team-based development.

So, the team need to decide on a branching strategy, including a branch naming strategy, and then on the release workflow that will allow changes tested changes to be merged back to the parent branch, and deployed to production.

Release Flow

The pros and the cons of different strategies are discussed widely elsewhere, see this link for a comparison. Of course, you could also develop your own strategy with the help of this page on the Microsoft website.

The team adopted the Release Flow branching strategy, as used by the VSTS (now Azure DevOps) team at Microsoft. It is a scaled trunk-based branching strategy, with short lived 'topic' branches and a main branch that is always in a releasable state. The release flow branching strategy has a main branch plus three branch categories (topic, release, hotfix):

main: all topics start and end in the main branch. It must always be in a releasable state, and so represents the latest release candidate. All changes must be thoroughly tested before merging and any conflicts dealt with immediately.
topic: we create a new topic branch any and every time we want to write some code, whether to implement a feature or develop a fix a bug. A topic branch is always created from and merged back to main. The only exception to this rule is for hotfixes to production – see below.
release: the release branch is created from main' at the end of a sprint and is a frozen state of the main, at the point of branching, which is the deployed to production.
hotfix: a hotfix for a production-disrupting issue is always started from the 'release' that is in the Production environment. Hotfix changes are merged back into release but also cherry-picked into main.

git release flow strategy

Naming your branches

Another important agreement between the team members is how to name these branches. Although the branch categories are part of the branching strategy, the exact naming of each branch is not, and if the team can adopt a standard naming convention, then it makes working with branches so much easier.

branching naming standard

Branches names are constructed as follows:

<BranchCategory>/<RelaseId><->ext<ReferenceId>-PascalCasedDescription

Where….

BranchCategory – is the type of branch. In the release-flow strategy this will be topic, release or hotfix.
ReleaseId – is a unique identifier for each release, in this case comprised of the sprint and iteration of a release. For example, the second release of sprint 97 will be named s97.2. The ReleaseId is only included in Release and Hotfix branches, as follows:
- A Release branch is simply named as <BranchCategory>/<RelaseId>
- A Hotfix branch is always named according to the release from which it as created. So, the name of a hotfix branch to fix release s97.2 will start with s97.2-
ReferenceId – is an identifier from an external system, to identify the work item or bug to which the branch relates, for example ext10658 (all lower case).
PascalCasedDescription – is an optional description written in Pascal Casing (e.g., RebootTest)

Choose your release identifiers based on what works best for your team. For example, it might make sense for your team to have a ReleaseId that reflects a location, such as Atlanta, Brussels and so on. However, it's worth bearing in mind that an Azure DevOps repo sorts the branches alphabetically on name, so choose carefully – you won't want to see branches ordered illogically in your branch overview!

Of course, the same naming convention is applied to both the .NET repos and database repos. This also helps the team coordinate the release of application changes with the required database changes. If there is a user story with database and .NET changes, there will be a topic/a branch for both the database as well as the .NET repo. If a user story requires only database changes, then topic/a will only exist for the database repo, and in testing the team will just deploy the latest software release. Conversely, for a user story with only .NET changes there is a topic/a only for the .NET repo and the latest database release is used in dev and test.

The Pull Request workflow

The 'other half' of your branching strategy is understanding the process by which the code for a new topic, bug or hotfix is merged back into the main branch and eventually into a release to production.

To control the flow of code changes between branches, we issue a Pull Request (PR), which a formal administrative process in the version control system (in our case Azure DevOps-hosted Git). We can build review and test steps into the PR workflow such that code from a topic, for example, can only be merged into main if it passes the required tests and meets certain other criteria.

After a few iterations, each time incorporating feedback and improvements from retrospectives, the team arrived at a PR workflow based on used of Pull Request triggered artifacts, with the 'merge criteria' established in the branch policies of the target branch. The optimized PR workflow now looks as follows. You'll notice there are relatively few manual steps in the optimized workflow. Azure DevOps takes care of a lot of the process steps automatically:

Automated pull request workflow for releasing database changes

For a tester to do their job, the developer must first create a Pull request, essentially a request to ask if the topic branch changes are 'good enough' to get merged in to main. This is the point in the process where a formal code review must be made, build / test validation can be mandatory and linking to a work item from the board is possible for a better provenance. All these checks are done because the target branch, main, has them set up in its branching policies.

This PR is sent for code review. When the reviewer approves the changes, the developer generated a Flyway migration script, and the pull request is approved.

This approval one of the first steps of completion is met, but we still have a few boxes to check. The reviewer can enroll the PR for "auto complete" which means that the PR is allowed to complete automatically, and be merged into main, but only if all subsequent checks and tests "pass". However, to test the new release candidate, containing the latest changes, we must build the new version and release it to the Test environment.

To do this, we queue the release candidate for a pre-merging build. The key point about a "pre-merging build" is that it will include any changes committed to main in the meantime – so it's a build of the latest version in main, plus the changes from the topic branch (it roughly means "build as if the merge already is done"). The main purpose and advantage of this way of working is that it reveals merge conflicts immediately, rather than later when they can do more damage.

If the branch has been long running, it would be useful for the developer to review the changes made to main, since the topic branch was created, to check for possible conflicts, such as by generating a Flyway 'dry run' script containing all the changes to main between the version at time of branching and the latest version.

If the build of the new version is successful, it is released to Test, by deploying a clone of the latest database release in main, and then migrating it to the version available in the pre-merge build, using the Flyway engine (we'll return to those details on the provisioning of those clones in the next article). The new release is evaluated by the tester. If it all checks out, the tester accepts the release and, thanks to the auto-complete option, the code is now automatically merged into main. From there, a release branch is created for the deployment to production.

These steps are also highlighted in the diagram below depicting the end-to-end DevOps pipeline that we built using Flyway and Azure DevOps (which I describe in a previous article):

automated DevOp s pipeline with flyway enterprise

Summary

Our goal was to transform a 'serial' and manual database development and release process into one that supported isolated, branch-based development and automated releases.

The legacy process was characterized by monolithic, shared version control repos and shared development databases. Developers worked "serially", afraid of disrupting the work of others. There was little visibility into the changes that went into any release, and no concept of the "immutable build artifacts" required for effective deployment automation.

The new process involved a big change for the team but fixed all of these problems. Database developers could now work in isolated branches, free from the fear of disrupting the work of others. When work was complete, a pull request workflow would see it reviewed and tested, and on approval merged into the main branch to get deployed. This process is now fully automated, with built-in approval steps. Since its introduction, the team has both delivered a higher frequency of releases, and seen a significant reduction in the number of deployment issues reported.

Implementing a git branching strategy for a complex database development was a tough nut to crack, but quickly paid dividends. While the team works more efficiently and spends less time fighting deployment problems, the customer gets fixes and new features faster. A win / win result.

A Version Control Strategy for Branch-based Database Development

Legacy process: centralized VCS, monolithic repos, shared dev databases

DevOps: distributed VCS, dedicated repos, isolated dev databases

Migrating to Git: breaking down the monoliths

Improved development and release processes