The Challenges of Resetting Databases

I was working on a demo recently where we had a database in version control and a development database. This was a team environment, with a few of us making changes and syncing them across our dev systems using git. Our setup was fairly modern: dev environments in containers, Flyway for migrations, and GitHub for the repo. Once we had our scenarios working, someone wanted to reset our git repo and capture a new database image.

However, when we reset the repo, we had some issues with the database. There were changes in the database that no longer existed in the repo, giving us a mismatch. That's not a big problem in itself, but cleaning things out so the database matched the repo, without committing those changes to the repo, was a challenge.
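
One way to spot this kind of mismatch is sketched below in Python. It is only an illustration under a few assumptions: migrations follow Flyway's default naming for versioned scripts (V<version>__<description>.sql), and the list of applied versions has already been read from the database, for example from Flyway's schema history table. The function names are hypothetical, and ad hoc changes made outside of migrations would not show up in this comparison.

```python
import re

# Assumes Flyway's default naming for versioned migrations:
# V<version>__<description>.sql, e.g. V1_1__add_orders.sql
MIGRATION_NAME = re.compile(r"^V(?P<version>[0-9]+(?:[._][0-9]+)*)__.+\.sql$")


def repo_versions(filenames):
    """Extract migration versions from file names found in the repo."""
    versions = set()
    for name in filenames:
        match = MIGRATION_NAME.match(name)
        if match:
            # Treat 1_1 and 1.1 as the same version.
            versions.add(match.group("version").replace("_", "."))
    return versions


def find_drift(filenames, applied_versions):
    """Compare repo migrations with the versions recorded in the database."""
    in_repo = repo_versions(filenames)
    applied = set(applied_versions)
    return {
        "only_in_database": sorted(applied - in_repo),
        "only_in_repo": sorted(in_repo - applied),
    }


if __name__ == "__main__":
    # Hypothetical example: the repo was reset to two migrations,
    # but the dev database still has a third one applied.
    files = ["V1__init.sql", "V2__add_orders.sql", "README.md"]
    print(find_drift(files, ["1", "2", "3"]))
```

Anything listed under only_in_database is a change the database has that the reset repo does not, which is the situation we were in.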

A developer I was working with got a little frustrated, because C# code carries no state of its own outside the repo. If we reset the repo and sync our local copies, everything is ready to go. A database repo isn't the same, because there is usually a live database that exists separately from the files in version control.

This is the main challenge when working with databases, relational or otherwise, in a development environment. Experiments, bug fixes, and even data changes made during testing persist over time. Resetting data to repeat tests, or even to automate them, can be hard.

This is one reason I think containerization and subsetting of production datasets will become very important over time as we try to react to business requirements and keep our teams coordinated. These technologies give us a known starting point for our databases, at least at any particular moment. We still need to update this foundation as we deploy changes to databases.

Hopefully Microsoft, other vendors, and we as developers will help advance these technologies and help all developers build more skills. I'm grateful to Andrew Pruski, Anthony Nocentino, and others for the information they share about containers and databases. Hopefully we see more people engaging in these areas over time.