I was browsing online the other day during a break and stumbled upon an article on the modern architecture stack for a startup. It's not a bad read, and as someone who's worked in startups, I find it interesting to see what others think. As with many of these articles, it has a lot of practical advice, but it's also not relevant for many of us because we don't often work in greenfield development.
However, we could view adding something new to our application as greenfield. It's not completely the same, but there are similarities when we start a new feature that doesn't exist anywhere. Some of their advice, like Dockerizing everything, won't apply, but there is one thing that does apply: data.
There is a quote, which I really like: "...what is the point of an API running on top of an empty DB? Manually entering necessary data shortly leads to depression (and the risk of increasing the duration of development cycles). Hence, we prepared a curated dataset that was inserted into the local DB to be able to play with."
That's similar to the advice I give when speaking on DevOps. Invest in a curated dataset for developers. As this group learned, you can start to use this in testing, CI, etc., and it makes life better for developers. Heck, if you're using version control and you create a dataset for yourself, save the insert statements, put them in a folder, name each script for its table, and share it with others.
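To make the "folder of scripts, one per table" idea concrete, here is a minimal sketch of a loader. It is an illustration under assumptions, not a prescribed tool: it uses Python's built-in sqlite3 module as a stand-in for whatever local database you actually run, and the function name `load_seed_scripts`, the folder layout, and the `customer.sql` sample are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_seed_scripts(conn: sqlite3.Connection, folder: Path) -> list[str]:
    """Run every .sql file in `folder` in file-name order; return the names run."""
    applied = []
    # Sorting by name gives a predictable order. If tables depend on each
    # other, a numeric prefix (01_customer.sql, 02_orders.sql) is one way
    # to control which script runs first.
    for script in sorted(folder.glob("*.sql")):
        conn.executescript(script.read_text(encoding="utf-8"))
        applied.append(script.name)
    conn.commit()
    return applied


def demo() -> int:
    """Write one hypothetical per-table script, load it, and count the rows."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "customer.sql").write_text(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);\n"
            "INSERT INTO customer (id, name) VALUES (1, 'Ann');\n"
            "INSERT INTO customer (id, name) VALUES (2, 'O''Brien');\n",
            encoding="utf-8",
        )
        conn = sqlite3.connect(":memory:")
        load_seed_scripts(conn, folder)
        count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
        conn.close()
        return count
```

Because the scripts are plain files, the same folder can feed a developer's local database, a test run, or a CI job without any extra tooling.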
It's a small change, but it's one that pays big dividends over time. If others use your INSERT statements, and you use theirs, all of a sudden you have a good curated set. If you maintain this as you find bugs or strange things customers enter into production, you don't need a restore of production; you can add more data and run just those statements. Heck, if you add columns, add some data. You'll do that anyway to test your new column, so maintain this as a script. Or update your data, use SQL Prompt to create INSERT statements for all the data, and replace the entirety of your data script.
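If you don't have a tool handy to script out your data, the idea of regenerating the whole data script is simple enough to sketch by hand. This is a rough illustration only and makes no claim about how SQL Prompt does it: it assumes Python's built-in sqlite3 module as a stand-in database, the helper names `sql_literal` and `dump_inserts` are hypothetical, and it handles only NULL, numeric, and text values.

```python
import sqlite3


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal (NULL, number, or quoted text)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so names like O'Brien survive the round trip.
    return "'" + str(value).replace("'", "''") + "'"


def dump_inserts(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return one INSERT statement per row of `table`, ordered by first column.

    `table` is assumed to be a trusted identifier from your own schema.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY 1')
    columns = ", ".join(col[0] for col in cursor.description)
    statements = []
    for row in cursor:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


def demo() -> list[str]:
    """Build a tiny hypothetical table and script it back out."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO customer (id, name, note) VALUES (?, ?, ?)",
        [(1, "Ann", None), (2, "O'Brien", "late payer")],
    )
    statements = dump_inserts(conn, "customer")
    conn.close()
    return statements
```

Writing the returned statements over the old script file and committing it gives everyone else the new rows the next time they pull.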
A big part of development is maintaining data. While there are virtualization solutions, like Redgate Clone, those can be cumbersome and expensive. They do solve problems, and they might be a good fit for your situation. However, I really like smaller sets of data that duplicate our problem domain. For those of you dealing with time-based problems, include scripts that "update" dates and times to simulate problems from today or yesterday rather than last week. The limit is the creativity of your team, and as you maintain this dataset, everyone benefits from small changes made over time.
Version control has been a boon to sharing software projects between developers, but it is underutilized as a way of sharing data. Add some data scripts to your project and you might be surprised how much easier it is to work as a team.