SQLServerCentral Editorial

Don't Let Corner Cases Drive Your Design


If you graph compute/query cost against the size of data, you get four quadrants:

  1. small data, small compute (most CRUD app queries)
  2. small data, big compute (complex BI queries for this quarter, most reporting)
  3. big data, small compute (logs, audit data)
  4. big data, big compute (complex BI queries across all our data)

If you examine the costs here, 1 is the cheapest, with 2 and 3 having a similar cost. Number 4 is expensive, and it's why we often have big boxes running our database server software. However, where is most of our work? The majority is in quadrant 1, with 2 getting the second most action. Work in quadrant 3 might be rare, as is work in quadrant 4, but we often design for 4. We have to, as we don't want phone calls, ever. What we want is to provision a system large enough that we don't hear many complaints about performance. On premises, many of us have over-provisioned systems to handle the peak load and avoid those phone calls.

Can we handle the peaks, or the things someone decides are especially important? Everyone thinks their workload is important, and it is. To them. However, there are plenty of cases where someone could design for specific types of workloads rather than just aiming for quadrant 4. I've got an image of different types of workloads that I grabbed from the Small Data 2025 conference. For example, if I am working with things like time series data or streaming analytics, I might not need huge compute. I might be storing a lot of data, and I need space, but the compute is low. The analysis of that data, however, might be compute intensive.

This is one reason why we might separate analytic systems out, as they often fall into quadrants 2 and 4, and why we might want serverless or scale-up/down systems to handle the rare cases and get a real cost for them. I found it particularly interesting that the Bronze tier might be where we have big data and big compute, but once we've moved to Silver or Gold, we might have lower compute and data requirements. This makes sense, as Bronze is more of a staging layer, but it is a good reason why we might aim for a Gold layer in our organization and only keep that data for the long term; it's more cost-effective.

Often, for simplicity, we build a bigger system for all types of queries. In other words, we are letting corner cases drive our design. That might be required, but it might not be. When cost is a concern, especially in the cloud, designing systems with appropriate resource usage might override the analyst's desire for queries across all data to run as quickly as order lookups in an OLTP system. This is even more true if we can predict some patterns in our workloads during system design. We can't scale up or down instantly, but in a lot of places, I wish I had been able to scale financial or reporting systems up for a few days as we close out the period and scale them down for the rest of the month.
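As a rough sketch of what that could look like on Azure SQL Database (the database name and service objectives below are made up for illustration; the tiers available to you will vary), the scale-up and scale-down can each be a single T-SQL statement run on a schedule:

  -- Scale up ahead of the month-end close (hypothetical database and tiers)
  ALTER DATABASE [FinanceReporting]
      MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P4');

  -- Check that the new service objective has taken effect
  SELECT DATABASEPROPERTYEX('FinanceReporting', 'ServiceObjective');

  -- Scale back down once the close is finished
  ALTER DATABASE [FinanceReporting]
      MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');

The change is applied asynchronously, so you'd kick off the scale-up far enough ahead of the close that it completes before the heavy reporting starts.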

When building a system, think about the practical nature of your requirements and assign a cost to them. Let users know what workload you're building the system to handle and set expectations on performance and cost. If you do that, you can let others decide when to handle corner cases and when not to. That's often a much easier conversation when we have cost numbers to help customers understand the implications of their requests.
