• Thomas-282729 (1/1/2011)


    I could make the same argument against any data type. "Think of the savings you are wasting by using a 4-byte integer instead of a 2-byte integer for a primary key!" In the grander scheme of things, what matters is whether it affects actual real-world performance given actual data sizes. The vast majority of databases do not even contain a million rows, much less 10 million, in any one table, and thus tiny differences in datatypes will make no perceptible difference. What does make a significant difference are data integrity rules.

    All things being equal, there is no reason not to take advantage of methods to be as efficient as you can. We are not talking about data integrity; we are talking about data validity. If your system has good reason to use an INT instead of a TINYINT, or a DATETIME instead of a SMALLDATETIME, then by all means do that; you would be negligent not to. I am talking about not being wasteful. And I do not buy the argument that the "vast majority of databases do not even contain a million rows". To whatever extent that is true at the moment, it is a decreasing trend.
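    As a rough illustration of the kind of waste I mean (the table and column names below are hypothetical, just to make the arithmetic concrete):

        -- Hypothetical example: a status code that only ever holds values 1 - 5
        -- and a "last modified" value that needs neither sub-minute precision
        -- nor dates beyond 2079.
        CREATE TABLE dbo.OrderStatusWasteful
        (
            StatusID     INT      NOT NULL, -- 4 bytes per row
            LastModified DATETIME NOT NULL  -- 8 bytes per row
        );

        CREATE TABLE dbo.OrderStatusLean
        (
            StatusID     TINYINT       NOT NULL, -- 1 byte per row
            LastModified SMALLDATETIME NOT NULL  -- 4 bytes per row
        );
        -- 7 bytes saved per row; at 10 million rows that is roughly 67 MB,
        -- and the savings are repeated in every index carrying these columns.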

    This is more an argument about archive tables than it is about in-use tables. I've seen a few systems that had one or two tables in the 50-million-plus row range, and those were primarily archive tables, auditing tables, or data warehouses (where you can play tricks to cut down the pointer size). However, let's not forget that archive tables serve a very different purpose than in-use tables.

    Yes, they have different purposes, but there are benefits to be gained from some foresight for both archive and transactional tables, as has already been discussed.

    When modeling, the choice of implementation data type (vs. the data type in the model) is secondary. You do not care whether you use an int, smallint, tinyint, or bit when modeling a database. What matters is that the value is an integer with some given range. The DBA deciding to use a smallint instead of a 4-byte integer happens at the implementation stage. Still important, but not a part of modeling.

    Data modeling has two parts: logical, then physical. An implementation that is far removed from the reasoning behind the logical design is much more prone to error. And again, nobody is talking about constraining the logical model to fit the physical model. We are talking about making good implementation choices based on the logical model and the reasoning behind it.

    A few extra minutes of work now can save many hours of work in the future so why not do it?

    Because it can also cost you a significant amount in development time. If you choose a tinyint, for example, and after the system goes into production it is determined that this is too small, it is likely that quite a bit will have to be changed.

    This is why we are talking about making reasonable choices as opposed to arbitrarily using the smallest datatypes available. In my experience, I have spent more time reducing datatypes than increasing them because the vast majority of the time people choose INT over TINYINT to store values from 1 to 5.
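    For example (a hypothetical table, purely to illustrate), a value that can only ever be 1 through 5 fits comfortably in a TINYINT, and a CHECK constraint makes that valid range explicit at the same time:

        -- 1 byte per row instead of 4, and the valid range is enforced by the
        -- database rather than merely implied by the choice of data type.
        CREATE TABLE dbo.ProductReview
        (
            ReviewID INT     NOT NULL IDENTITY(1, 1) PRIMARY KEY,
            Rating   TINYINT NOT NULL
                     CONSTRAINT CK_ProductReview_Rating CHECK (Rating BETWEEN 1 AND 5)
        );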

    Completely disagree. I've run into many issues with that ridiculous max value. Example 1: you have a table that tracks when an entity is "valid". You have a start and end date. A null end date = currently available. The client-side code wants to use a max datetime value to represent that some entity is valid in perpetuity. In most programming languages, that date is 9999-12-31. Can't do it with smalldatetime. You have to store a mystery constant somewhere in the middle-tier code that is used to represent that value. Example 2: copyrights. Ten years ago, you might have thought it safe to assume that no copyright for something created today or in the past could extend beyond 2079. Surprise, surprise: thanks to Disney, it is now 100 years. Example 3: contracts. Yes, I've run into contracts that specify certain rights 20 and 50 years into the future. Saving those four bytes simply does not, in the long run, provide nearly enough benefit to justify it. If you are worried about storing time values, then add a check constraint which prevents time values from being stored in that datetime field.

    You only disagree because you missed the point I was making. The point is: be sensible. That means using the best datatype for the purpose. Using an INT to store values 1 - 5 is not sensible. Using a DATETIME to store future dates that might be 20 - 40 years out is entirely sensible, and nobody would begrudge you that choice. So your examples are all instances where the business case dictates that DATETIME is the correct choice. Great. This takes nothing away from what I (and others) have been saying. And that you found an edge case where copyright law was changed is just that: an edge case. And if there is some distinction between "currently available" and "always available", then again, that is either a good case for DATETIME or maybe a different modeling choice to begin with.
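    For what it's worth, the check constraint you mention (to keep time portions out of a DATETIME column) is simple enough; something along these lines, with hypothetical table and column names:

        -- Allow any date, but reject any value that carries a time component
        -- (i.e. anything other than midnight).
        ALTER TABLE dbo.LicenseAgreement
            ADD CONSTRAINT CK_LicenseAgreement_ExpirationDate_NoTime
            CHECK (ExpirationDate = DATEADD(DAY, DATEDIFF(DAY, 0, ExpirationDate), 0));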

    As for conversion, you had better hope that your ORM does not account for the difference between smalldatetime and datetime. If it does, that means additional unit testing to ensure that the change of the datatype will not break the existing code, or that the code accommodates the expanded values.

    1) People use ORMs because they provide an easy data-layer. It takes little to no effort to update the ORM model.

    2) Why would you even mention testing in the sense that it might not happen? Don't all changes get tested? Not only was the article about future planning, which implies up-front work that is already being tested, but even if we go back to make changes, the assumption is that regression testing is still being done. Regression (not unit) testing is assumed to be part of a project whether it is a new feature or an updated one (which includes changes to datatypes).

    SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
    Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
    Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
    Info sites: Collations     •     Module Signing     •     SQLCLR