SQLServerCentral Editorial

Data Modeling Information


Data modeling is something that we should all be doing when altering the schema in our databases. I'd like to think that most people spend time here, but I don't think that's the case. I think plenty of people think "I need to store a piece of data," pick a string or numeric datatype, and start stuffing in values, defaulting to a string when in doubt. That's why I think we have lots of dates stored in string columns: a string was someone's first thought.
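As a small illustration of the string-date trap, here is a sketch with hypothetical values stored as varchar in m/d/y format; ordering is lexical rather than chronological:

```sql
-- Hypothetical values: dates stuffed into a string column in m/d/y format
SELECT d
FROM (VALUES ('1/5/2024'), ('12/1/2023'), ('3/15/2024')) AS v(d)
ORDER BY d;
-- Under typical collations, '1/5/2024' sorts before '12/1/2023' because '/'
-- collates before '2', even though December 2023 is chronologically first.
```

A proper date column sorts, compares, and indexes correctly without any conversion.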

There was a post recently that talked about storing data in its highest form. It was interesting to me because these are the types of decisions I try to make when designing a table. What is the best form in which to store data? The authors talk about choosing not only a type that converts easily, but also the fields that make it easiest to work with the data in different ways.

I do think that the aggregations or calculations we need to perform should influence our choice of data type. If you are measuring something, use a numeric type. In fact, in their example of movie running times, an integer is probably the best type. While many databases and languages have time datatypes, some represent a duration (a timespan), while others represent a position on the clock (T-SQL's time). Either might work for movies, but in aggregations, the T-SQL time type has issues beyond 24 hours. An integer count of minutes is a better choice, assuming we don't care about seconds.
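A minimal sketch of that choice, using a hypothetical table that stores runtimes as integer minutes, totals past the 24-hour mark with plain arithmetic (a SUM over a time column isn't even a legal operation in T-SQL):

```sql
-- Hypothetical table: runtimes stored as integer minutes
CREATE TABLE #Movie (MovieId int PRIMARY KEY, Title varchar(100), RuntimeMinutes int);
INSERT INTO #Movie VALUES (1, 'First', 142), (2, 'Second', 95), (3, 'Third', 178);

-- The total exceeds 24 hours? No problem for an integer.
SELECT SUM(RuntimeMinutes)      AS TotalMinutes,
       SUM(RuntimeMinutes) / 60 AS Hours,
       SUM(RuntimeMinutes) % 60 AS Minutes
FROM #Movie;
```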

The second part of the post looks at multiple values, in this case customer loyalty points earned and redeemed. A simple running sum is what we might store in a database, though the application class might need two fields. Of course, modern software often totals these figures for a customer as part of gamification and an inducement to engage more, so maybe a data store would also want to store the total earned and the total redeemed, with a calculation to show the balance.
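One way to sketch that design in T-SQL, with column names of my own invention, is to persist the two totals and derive the balance as a computed column:

```sql
-- Hypothetical design: store what happened, compute the balance
CREATE TABLE CustomerLoyalty
(
    CustomerId     int NOT NULL PRIMARY KEY,
    PointsEarned   int NOT NULL DEFAULT 0,
    PointsRedeemed int NOT NULL DEFAULT 0,
    PointsBalance  AS (PointsEarned - PointsRedeemed)
);
```

Keeping earned and redeemed as separate columns leaves the gamification figures available, and the computed column means the balance can never drift out of sync with its parts.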

The one thing I might add to a post about modeling is the need for developers to consider operations at scale. While using a bit more or less storage often doesn't matter for any single row or any operation on a singleton set of data, when we scale across millions of rows, little things matter. Consider how your data might be aggregated and what happens if you have millions of rows to work on. There, a better design decision can outperform a poor one by orders of magnitude.
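To see how a modeling decision plays out at scale, compare two hypothetical tables holding the same sales data, one with the date in a varchar column and one with a proper date column, both indexed on that column:

```sql
-- Hypothetical tables: dbo.SalesAsString (SaleDateStr varchar(10))
--                      dbo.SalesTyped    (SaleDate date)

-- The conversion runs on every row and blocks an index seek:
-- a scan across millions of rows.
SELECT COUNT(*)
FROM dbo.SalesAsString
WHERE TRY_CONVERT(date, SaleDateStr) >= '2024-01-01';

-- The typed column is SARGable: the optimizer can seek straight to the range.
SELECT COUNT(*)
FROM dbo.SalesTyped
WHERE SaleDate >= '2024-01-01';
```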

That, and generate lots of data to test with. You ought to know how to quickly mock up a million rows to check your queries, because you might well have a million rows in production.
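One common way to mock up a million rows in T-SQL is to cross join a small numbers CTE with itself; the table and column names here are my own:

```sql
-- sys.all_objects usually has well over a thousand rows,
-- so TOP (1000) gives a reliable 1,000-row set.
WITH Thousand AS
(
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum,
       ABS(CHECKSUM(NEWID())) % 500 AS RuntimeMinutes  -- random-ish test values
INTO dbo.TestRows
FROM Thousand AS a
CROSS JOIN Thousand AS b;  -- 1,000 x 1,000 = 1,000,000 rows
```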
