The Data Model Matters

I ran across a statement that seems exciting to me as someone who has written a lot of code in my career. It said: "Many of the 'modern' software practices of the last decade were early adaptations to this shift, even if we didn't articulate them that way. Immutable infrastructure. Stateless services. Containers. Blue-green deployments. Infrastructure as code. These ideas all share a common premise: never fix a running thing. Replace it."

These are a few sentences from this piece on the death and rebirth of programming. That's how a lot of software developers have viewed the world during the last decade, and we've seen a lot of software advances in that time. The most successful developers and teams, who often speak at conferences and publish papers, have adopted many of these practices: serverless, containers, and lots of tests that allow continuous deployment of new objects into complex environments that scale to levels many of us never thought possible. These are the high performers talked about in the State of DevOps report every year.

At the same time, many people who read about these successes and try to emulate them struggle. Many customers I know want to use containers but have trouble doing so. Many teams lose control over serverless functions and stateless systems, or have issues with immutable infrastructure. They revert to older ways of building and deploying software, or more often combine those older ways with some of the techniques they read about.

If they struggle with stateless systems, it's no wonder they struggle with the really, really important stateful ones: the databases.

Databases are state machines. We evolve and grow them. NoSQL systems were developed to try to deal with some of the scale issues of relational systems, but they often push the immediate problems of concurrency and efficiency to the side, relying on eventual consistency and redundant data models that keep multiple copies of data around for quick access. They also defer one of the strengths of relational systems, aggregating lots of data, to another system, usually a data warehouse, data lake, or some other architecture.

That works great, though it comes at the cost of more compute, more latency to develop and produce those aggregations, and more cost to store all that data in yet another place. That's not to disparage those designs. They work well and handle workloads most relational systems couldn't manage.

However, that brings to mind two things. First, perhaps easy and instant aggregation isn't as important as we think. After all, companies at that scale often never have a complete view of all their data. It's changing too often, yet they are successful. Second, if you don't have the funding to manage that complexity (in both machine and human resources), perhaps you ought to focus on what is important in this age of cheap code that changes often.

Build a strong data model and write efficient SQL code.
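
As a rough illustration of that advice, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, index name, and sample rows (customer, customer_order, ix_order_customer) are all hypothetical and invented for this example. It only tries to show two things: constraints in the data model that keep bad state out no matter how often the application code changes, and a single set-based query that does the aggregation inside the database.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical schema: one fact stored once, with constraints guarding the state.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_order (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date   TEXT NOT NULL,
    amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
);
-- Index chosen to support the aggregation query below.
CREATE INDEX ix_order_customer ON customer_order (customer_id, amount_cents);
""")

conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO customer_order (customer_id, order_date, amount_cents) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 12000), (1, "2024-02-10", 8000), (2, "2024-01-20", 5000)],
)


def insert_order(customer_id, order_date, amount_cents):
    """Try to add an order; return False if the data model rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer_order (customer_id, order_date, amount_cents) VALUES (?, ?, ?)",
            (customer_id, order_date, amount_cents),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# One set-based query instead of looping over rows in application code.
totals = conn.execute("""
    SELECT c.name, COUNT(*) AS orders, SUM(o.amount_cents) AS total_cents
    FROM customer AS c
    JOIN customer_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
""").fetchall()

print(totals)  # [('Acme', 2, 20000), ('Globex', 1, 5000)]
```

Calling insert_order(99, "2024-03-01", 100) returns False because no customer 99 exists, and a negative amount is rejected by the CHECK constraint. The application code around this can be replaced as often as you like; the rules live with the data.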