Disk Is Cheap! ORLY?

  • This sort of thing does have its amusing side sometimes. An organization I used to work for hired a data warehousing "expert" to fix the mess their warehouse was in. After many months of non-delivery, reporting queries taking 30 minutes to execute, etc., he decided that all SQL-based performance options had been exhausted. What we needed, he explained, was a "hardware solution". He put together an extravagant wishlist and complained to his boss that his failure to deliver was down to his boss's failure to provide him with the right components. So his boss filled the wishlist down to the last screw and presented him with it. This effectively took away any excuses for ongoing failure. Lo and behold, the hardware made very little difference to performance and rendered his position in the organization almost untenable; it certainly destroyed his credibility. The take-home message for me was that while hardware may be continually falling in price, be careful what you wish for. You'd better be damn sure hardware IS the problem before you go blaming it.

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important... (Bertrand Russell)

  • SanDroid (12/31/2010)


    I only hear this naive argument from someone that has not seen the price of server class storage. When it comes to Enterprise class systems, nothing is cheap.

    I agree that the word “Enterprise” added on to either hardware or software adds a healthy amount to the price. But sadly, there are quite a few people that have never looked into these prices.

    gonsas (12/31/2010)


    The article is pretty sweet. I'd just like to mention an error in the part referring to SMALLDATETIME... this data type does not discard time data; it is just less precise and covers a smaller date range.

    Thanks. Regarding the discussion about SMALLDATETIME, I was not saying that the datatype itself discards the time component (although in a sense it does discard the seconds and milliseconds, since it is only accurate down to the minute). What I was saying is that for people who only want the DATE portion of the DATETIME and do not care about the hour and minute (and will usually set them to 00:00), it does not make sense to use the full DATETIME datatype, since SMALLDATETIME also stores the date and its less precise time is not an issue when the time is being ignored anyway. This is of course solved starting in SQL Server 2008 with the DATE datatype, but for those who are not there yet, there is no reason to waste the extra 4 bytes on DATETIME (8 bytes) when SMALLDATETIME (4 bytes) is functionally the same as far as the MM/DD/YYYY (or YYYY-MM-DD) portion goes (assuming, of course, that you do not need to handle dates outside of the January 1, 1900 through June 6, 2079 range).
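    To make the byte counts concrete, here is a minimal sketch (assuming SQL Server 2008 or later, which is needed for the DATE type and for the inline DECLARE initializers; the values are just illustrative):

    -- Per-value storage of the same calendar date in three types:
    -- DATETIME = 8 bytes, SMALLDATETIME = 4 bytes, DATE (2008+) = 3 bytes.
    DECLARE @dt  DATETIME      = '2011-01-01';
    DECLARE @sdt SMALLDATETIME = '2011-01-01';
    DECLARE @d   DATE          = '2011-01-01';

    SELECT DATALENGTH(@dt)  AS datetime_bytes,       -- 8
           DATALENGTH(@sdt) AS smalldatetime_bytes,  -- 4
           DATALENGTH(@d)   AS date_bytes;           -- 3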

    dforck (12/31/2010)


    I find it amusing that he bothered to mention using char instead of varchar, but never mentioned that using nvarchar, or nchar, effectively doubles how much space is being used for that column.

    I did not mention VARCHAR vs NVARCHAR because that is a trickier situation. I find it easier to predict how many rows a table might have, and hence to plan for one of the various INT types, than to predict whether the data in a given field will ever contain Unicode characters. Also, while I might be wrong about this, I find that people typically err on the side of using VARCHAR too often rather than using NVARCHAR too often (but maybe not). Lastly, the list I gave was not meant to be exhaustive but rather a good starting point for people to think about the data they are storing and WHY they need to think about it. However, this is still a good point to make, so I will include it if/when I update this article. Thanks for pointing this out.
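    For anyone who has not measured the doubling, here is a minimal sketch (again assuming SQL Server 2008+ syntax for the inline initializers; the values are just illustrative):

    -- NCHAR/NVARCHAR store 2 bytes per character; CHAR/VARCHAR store 1.
    DECLARE @v VARCHAR(50)  = 'Smith';
    DECLARE @n NVARCHAR(50) = N'Smith';

    SELECT DATALENGTH(@v) AS varchar_bytes,   -- 5
           DATALENGTH(@n) AS nvarchar_bytes;  -- 10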

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • Raw space is cheap, but IO and concurrency aren't.

    One point to watch out for is how well the records fit on a page of data. The first example I gave in http://www.sqlservercentral.com/articles/Database+Design/70314/ showed that a couple of bytes can make the difference between 2 records per page and 1 record per page, so 1 million rows with an extra 2 bytes per row can cost dramatically more than 2 million extra bytes.
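    If you want to see that boundary effect for yourself, here is a minimal repro sketch (the table names, CHAR widths, and row count are just illustrative, and the exact cutoff depends on per-row overhead):

    -- Two tables whose rows straddle the "2 rows per page vs 1 row per page" boundary.
    CREATE TABLE dbo.NarrowRows (id INT NOT NULL PRIMARY KEY CLUSTERED, filler CHAR(4020) NOT NULL);
    CREATE TABLE dbo.WideRows   (id INT NOT NULL PRIMARY KEY CLUSTERED, filler CHAR(4060) NOT NULL);

    INSERT INTO dbo.NarrowRows (id, filler)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    INSERT INTO dbo.WideRows (id, filler)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;

    -- Rebuild so the pages are packed, then compare leaf-level page counts:
    -- roughly 500 pages for NarrowRows vs roughly 1000 pages for WideRows.
    ALTER INDEX ALL ON dbo.NarrowRows REBUILD;
    ALTER INDEX ALL ON dbo.WideRows REBUILD;

    SELECT OBJECT_NAME(object_id) AS table_name, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.NarrowRows'), NULL, NULL, 'DETAILED')
    WHERE index_level = 0
    UNION ALL
    SELECT OBJECT_NAME(object_id), page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.WideRows'), NULL, NULL, 'DETAILED')
    WHERE index_level = 0;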

    If you have a table that suffers a lot of updates then it may actually be worth your while to pad your table so that lock escalation doesn't impact concurrency. Concurrency at the expense of storage!

    There is a wider topic here, and that is attempting to throw hardware at the problem. I started a discussion on LinkedIn and there are some interesting opinions on the subject.

    One point that was made was that communicating the need for such attention to detail is a tough sell to non-technical people. Even the most technically savvy fall prey to the pretty user interface and the mechanically functioning demo without being aware of, or caring about, the mechanicals underneath. This phenomenon is not limited to software development. There is a highly profitable roadside breakdown/recovery/repair industry that relies on people choosing their vehicles on non-pragmatic considerations.

  • mwenner (12/31/2010)


    Can someone clarify one sentence in the article:

    >>

    So you need to keep in mind things like (for SQL Server at least) NonClustered indexes typically contain the data from the Clustered index (exceptions are: Unique Indexes and NonClustered Indexes that contain the Cluster Key)

    <<

    I just reviewed the structures of non clustered indexes (http://msdn.microsoft.com/en-us/library/ms177484.aspx) and from what I see, it does not store the clustered index data. Instead of a RID (used when the table is a heap), a NonClustered index points to the clustered index key.

    Hello Mark. The confusion is partially due to the Microsoft documentation not explicitly defining the term "clustered index key". The "clustered index key" IS the data of the Clustered Index. So your reading of the MSDN article is correct in that the NC Index points to the Clustered Index Key, which is specifically the data for the fields defined as the Clustered Index.

    So looking back at Steve's example table in his description of this subject: if the Clustered Index were on (LastName, FirstName) and those values were unique so we could ignore the 4-byte uniquifier issue, you could create an NC Index on Status. In this case you could have a simple query like:

    SELECT FirstName
    FROM [TestTable]
    WHERE LastName = 'Smith'
    AND Status = 5

    which would treat the NC Index as a covering index since it has all three of those fields in it. If the NC index did not have this data then it would have to go back to the main table for it. But in this case, the actual key fields for the NC Index would be: Status, LastName, and FirstName.
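    Steve's actual table definition is not reproduced in this thread, so the following is only a rough sketch of the scenario being described (the column types, and everything other than LastName, FirstName, and Status, are assumptions):

    -- Clustered index on (LastName, FirstName); nonclustered index on Status.
    -- The clustering key travels with the NC index, so the query below can be
    -- answered entirely from IX_TestTable_Status (i.e. it acts as a covering index).
    CREATE TABLE dbo.TestTable
    (
        LastName  VARCHAR(50) NOT NULL,
        FirstName VARCHAR(50) NOT NULL,
        Status    TINYINT     NOT NULL,
        CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (LastName, FirstName)
    );

    CREATE NONCLUSTERED INDEX IX_TestTable_Status ON dbo.TestTable (Status);

    SELECT FirstName
    FROM dbo.TestTable
    WHERE LastName = 'Smith'
    AND Status = 5;
    -- The execution plan should show a seek on IX_TestTable_Status with no
    -- key lookup back to the clustered index.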

    Now, Steve mentioned "Regardless of what columns are in the nonclustered index, the clustering key is there" which is not always true. It is most often true, but not for the cases of a unique index or if the NC Index contains the Clustered Index Key. Craig Freedman does a good job of showing all of these cases in the following blog:

    http://blogs.msdn.com/b/craigfr/archive/2006/06/30/652639.aspx

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • James Stephens (12/31/2010)


    While I agree in general principle with the gist of the article, I don't think everyone needs to run into work this weekend and take their servers down and rebuild all their tables with the 'correct' data types.

    ...

    And let's look at the real savings (not talking about i/o yet--just disk cost).

    ...

    Really though, excellent info and by all means use it if you're in the modeling stage, but if you've got an existing system, you really need to do the math and see how much you're really going to save and if you have the resources to do this.

    Hi Jim. For that first part I certainly agree. The main thrust of the article was to get people to think about how they model tables more so than giving a laundry list of what to correct in existing systems (although for systems with performance problems it might be worthwhile making some changes).

    As far as that second part goes, I disagree, because what I was trying to get at with the article was not simply that there is a monetary savings here but that there is a performance savings that users feel. So I don't think it would be fair to say in this conversation "let's not talk about I/O", because we need to talk about ALL of it; it is not one isolated piece of the system that is affected by these decisions.

    For your last point, going back to the first point, I again agree that there is a cost-benefit analysis for refactoring any data structures, but hopefully by thinking of these issues up front as systems are being designed there will be less of that work to do in the future :-).

    Charles Kincaid (12/31/2010)


    Still the "disk is cheap" argument has one place where it must be stressed. Backup. Some of my customer complain at the database size and don't want to backup because of cost. I point out that here in the U.S. we have electronics retailers that have terabyte or larger USB drives one the shelf and they are cheap. Well less expensive than a day of business down time for certain.

    ...

    Had this been a busy day for me I might have skipped your article based on the title. Your observations should almost be required reading as prep for any design meeting.

    Hello Charles. Good point about the backups, since it would be sad for anyone to not have backups. But to play devil's advocate about the topic at hand: aren't the MTBF numbers on off-the-shelf drives much lower than those of SCSI drives? Storing backups on a device that might prove unreliable, and hence might not be restorable, is not really a backup at all. But maybe I am just being nit-picky here ;-). I will say, though, that a backup on cheap storage is certainly better than no backup at all (which I believe was your point :-)).
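    As an aside, whichever drive the backup lands on, it is worth confirming that the backup file is actually readable; a minimal sketch (the database name and path are hypothetical):

    -- Back up to an inexpensive external drive and then verify the file.
    -- WITH CHECKSUM validates page checksums as the backup is written;
    -- RESTORE VERIFYONLY re-reads the backup and checks it without restoring it.
    BACKUP DATABASE [SalesDB]
        TO DISK = N'F:\Backups\SalesDB_Full.bak'
        WITH CHECKSUM, INIT;

    RESTORE VERIFYONLY
        FROM DISK = N'F:\Backups\SalesDB_Full.bak'
        WITH CHECKSUM;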

    And I appreciate your very positive comment as well as you not having a busy day and hence not skipping the article :-).

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (12/31/2010)


    Now, Steve mentioned "Regardless of what columns are in the nonclustered index, the clustering key is there" which is not always true. It is most often true, but not for the cases of a unique index or if the NC Index contains the Clustered Index Key. Craig Freedman does a good job of showing all of these cases in the following blog:

    http://blogs.msdn.com/b/craigfr/archive/2006/06/30/652639.aspx

    I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.

  • Excellent article. This is how I have designed table fields for over ten years. I expected that everybody used the arguments for this parsimonious approach, until I found that associates used int for fields holding only two or three values. Indeed, "Disk Is Cheap"-style arguments were used to justify not following my minimal approach.

    Thanks for the article. Now I feel less alone.

    Kind regards, Leendert.

  • Steve Jones - SSC Editor (12/31/2010)


    Solomon Rutzky (12/31/2010)


    Now, Steve mentioned "Regardless of what columns are in the nonclustered index, the clustering key is there" which is not always true. It is most often true, but not for the cases of a unique index or if the NC Index contains the Clustered Index Key. Craig Freedman does a good job of showing all of these cases in the following blog:

    http://blogs.msdn.com/b/craigfr/archive/2006/06/30/652639.aspx

    I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.

    I agree, Steve. I read the referenced blog post (perhaps too quickly), but I didn't see any mention of a case where the clustering key isn't carried in nonclustered indexes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • GPO (12/31/2010)


    This sort of thing does have its amusing side sometimes. An organization I used to work for hired a data warehousing "expert" to fix the mess their warehouse was in. After many months of non-delivery, reporting queries taking 30 minutes to execute, etc., he decided that all SQL-based performance options had been exhausted. What we needed, he explained, was a "hardware solution". He put together an extravagant wishlist and complained to his boss that his failure to deliver was down to his boss's failure to provide him with the right components. So his boss filled the wishlist down to the last screw and presented him with it. This effectively took away any excuses for ongoing failure. Lo and behold, the hardware made very little difference to performance and rendered his position in the organization almost untenable; it certainly destroyed his credibility. The take-home message for me was that while hardware may be continually falling in price, be careful what you wish for. You'd better be damn sure hardware IS the problem before you go blaming it.

    There are definitely a WIDE range of consultants out there and I think that many of them have no business calling themselves consultants, and they give those of us who DO know what we are doing a bad rap. I have been called in to clean up the messes left behind by quite a few of them. I am "The Wolf" when it comes to the SQL Server relational engine. 😀

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/1/2011)


    Steve Jones - SSC Editor (12/31/2010)


    I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.

    I agree Steve. I read the referenced blog post (perhaps too quickly), but I didn't see anywhere a mention about any case where the clustering key isn't carried on nonclustered indexes.

    I think there was a misunderstanding. I was speaking in terms of additional disk space usage so to me the term "is there" meant "is there taking up additional space", such as would be the case if it were stored twice. I was just speaking in terms of how to calculate the size of the resulting index. So now it seems that we actually do agree. Sorry for the confusion.

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • You lost me at a few minutes vs. a couple of seconds. If that were truly the case you would have to add an index no matter what; changing a tinyint to a bigint is not going to cause that kind of slowdown.

    The one area you failed to mention is versatility. Most applications are tightly coupled with the underlying data. If I have a username field set to 20 characters to save space, when I need to increase this field, it's not simply a matter of altering the table. Every application using this data will have to be checked, and unit tests will need to be updated (not fun). Furthermore, the fragmentation caused by altering a column would more than cancel out the savings of making it too small to begin with. You can always add/update a column to help keep pages in order, but that's easier said than done. If the column is indexed, a primary key, a large table... it could bring a website down, and any maintenance window is too much for management.

    Save yourself a future headache; besides, disk is cheap.

  • dpersson-635827 (1/1/2011)


    You lost me at a few minutes vs. a couple of seconds. If that were truly the case you would have to add an index no matter what; changing a tinyint to a bigint is not going to cause that kind of slowdown.

    The one area you failed to mention is versatility. Most applications are tightly coupled with the underlying data. If I have a username field set to 20 characters to save space, when I need to increase this field, it's not simply a matter of altering the table. Every application using this data will have to be checked, and unit tests will need to be updated (not fun). Furthermore, the fragmentation caused by altering a column would more than cancel out the savings of making it too small to begin with. You can always add/update a column to help keep pages in order, but that's easier said than done. If the column is indexed, a primary key, a large table... it could bring a website down, and any maintenance window is too much for management.

    Please keep in mind that the article was more about planning how you approach new modeling than about going back and fixing existing issues. My examples were cases where I did go back to fix things because these ideas were not kept in mind, and that caused problems which warranted the time spent on the changes.

    Also keep in mind that we are not talking about an isolated table but instead the overall idea of how a system is modeled so it would be many cases of BIGINT vs. INT. And these decisions affect many areas which do indeed affect performance. Since all parts of the system are affected together the best approach to modeling is a holistic one.

    About your particular example:

    1) If a change needs to be made for performance then it doesn't matter how tightly coupled the app is since new features could also require the same amount of work and testing.

    2) There is no need to set a variable-length column to a particular length just to save space, since the actual bytes used are all that matters (in this case; I am ignoring the space left on the row for additional columns since we are not talking about that here). Meaning, a VARCHAR(20) field takes as much disk space as a VARCHAR(50) field, since a 15-byte value is 15 bytes no matter how large a value the field can accept.

    3) If you need to increase a variable-length field, then yes, it mostly is just a matter of altering the table, since increasing a variable-length field is a metadata-only operation (see the sketch after this list). The existing data doesn't change. The only other thing to change could be the UI, if there is JavaScript validation or a MaxSize option set on the form field. And you might need to update input parameters to any Stored Procs that accept data from the application so they don't truncate the data coming in.

    4) Unit tests do not need to be updated (unless the Business Layer code is doing field size validation) but this is not related to the database since Unit Tests do not hit a database or any external resource. If they do then they are not Unit Tests. Validation Tests might need to be updated, but this is part of any change to a system so no more work than any other new or changed feature.

    5) Fragmentation is a non-issue since again increasing the size of a variable width column is a meta-data only operation that happens instantly. And if any operation does cause fragmentation that would be handled by normal index maintenance that should already be happening.

    6) If table changes are necessary and would cause a table lock (decreasing a variable-size field or changing a fixed-width datatype) then you can: create a copy of the table with the ideal structure, create a job to pull the existing data over in small batches over a period of time, create a trigger to sync INSERTs and UPDATEs, and when done simply do an sp_rename on the objects and then an sp_recompile on the table to make sure everything touching that table is updated (sketched below). We do this all of the time where I work for tables with hundreds of millions of rows (50 - 100 GB) and we don't need more than a few seconds to make the swap (and drop the job), so we don't need any maintenance window.
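    To make points 3 and 6 a bit more concrete, here is a minimal sketch under assumed names (dbo.Users, its columns, and the batch size are all hypothetical; the sync trigger and the job scheduling are elided):

    -- Point 3: widening a variable-length column is a metadata-only change;
    -- the existing rows are not rewritten (restate the column's existing nullability).
    ALTER TABLE dbo.Users ALTER COLUMN UserName VARCHAR(100) NOT NULL;

    -- Point 6: for changes that WOULD rewrite the table (shrinking a column,
    -- changing a fixed-width datatype), swap in a shadow copy instead.
    CREATE TABLE dbo.Users_New
    (
        UserID   INT          NOT NULL PRIMARY KEY CLUSTERED,
        UserName VARCHAR(100) NOT NULL,
        Status   TINYINT      NOT NULL   -- the corrected datatype
    );

    -- A scheduled job copies the existing rows over in small batches, e.g.:
    INSERT INTO dbo.Users_New (UserID, UserName, Status)
    SELECT TOP (10000) u.UserID, u.UserName, u.Status
    FROM dbo.Users AS u
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Users_New AS n WHERE n.UserID = u.UserID);

    -- A trigger on dbo.Users keeps new INSERTs/UPDATEs in sync while the copy
    -- runs (not shown). Once the tables match, the swap itself takes seconds:
    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.Users',     'Users_Old';
        EXEC sp_rename 'dbo.Users_New', 'Users';
    COMMIT TRANSACTION;
    EXEC sp_recompile 'dbo.Users';  -- force cached plans/modules to pick up the new table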

    Again, the main point of the article was not that everyone needs to go back and reduce the datatype sizes of existing systems, but to plan future projects better so that there is less need for these avoidable conversations about the cost of going back to fix problems once you find yourself in the middle of one.

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • What is never mentioned in the article is "Scale to what point?". Let me demonstrate: You are using a bigint for a primary key on a table with a billion rows. Think of the savings if you used a 4-byte int instead! A billion rows? Really? Raise your hand if you have encountered a table with a billion rows. Raise the other if it has happened at more than a handful of companies. In your example, you have 100 million rows. Really? I've been working on many databases over the past two decades and in only a handful of situations have I ever run across a table with more than 50 million rows, and often they were archive tables. If you truly expect to scale out to those dimensions, then a 4-byte integer might be too tight and yes, you need to pay more attention to the sizes of the data types. However, most databases are not going to scale out to those dimensions.

    The arguments against guids were absolutely applicable 20 years ago, less applicable 10 years ago, nearly irrelevant now and will eventually be entirely irrelevant. Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x? At some point, that additional space becomes irrelevant. 30 years ago people fretted over using one byte to store the year instead of two. Today, such an optimization would be considered silly. I suspect that if MS had supported a datetime datatype with nanosecond resolution two decades ago, guids would never have been used.

    In general, the argument "space is cheap" is meant to oppose premature optimization. However, premature optimization is not the same thing as ignoring data integrity. You shouldn't use nvarchar(max) for a first name column, not because of the amount of space it consumes, but rather because a user will eventually stuff a copy of War and Peace in there. Similarly, using a smalldatetime to save four bytes is often a mistake because of its non-intuitive maximum value.

  • @Thomas-282729

    Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x?

    The argument that it's OK for data to take up more space than it should because we have the processing power and memory to handle it discounts the question of what we're actually going to do with the data we're collecting. Thirty years ago some types of analysis might simply have not been an option to use on a day-to-day basis. You would have been telling your manager, "If you want that question answered we're going to have to run it over a weekend. And we have a queue of people wanting stuff processed on weekends, so you'll need to fill out this form..." Now that we have the extra memory and processing power, we can get our hands on answers in milliseconds that we'd have wasted hours on years ago... provided we're efficient in how we set up our data. If you ask your manager whether she'd rather wait 10 minutes for her answer, or 10 seconds, she's going to go for the 10-second option. The other thing I've noticed over the years is that data doesn't really answer questions; it merely creates a framework for more detailed questions. As soon as you give the executive their dashboard of KPIs, they are going to ask "Why is indicator X falling?" We give ourselves more chance of quickly getting into the detail if we've been efficient with our data.

    Thirty years ago it might have taken an organization, say, 10 hours to answer question X. Question X was a basic, fundamental question that had to be answered. Thirty years later, the organization has 5 times the number of employees it had back then. It has hundreds and hundreds of times more data (perhaps billions of times more data). Back then very little was collected electronically; there were rooms full of paper files instead. Now data is collected on every imaginable aspect of the organization's life. Question X from 30 years ago is passé. There are bigger questions to answer. It doesn't matter what the hardware status quo is at any given point in time: if you're inefficient with your data, retrieval, maintenance and analysis will take longer than they should. The organization down the road that is competing with you, and who DOES take parsimony seriously, will have a competitive advantage (all else being equal).

    What my argument doesn't address here is the opportunity cost of parsimony. Does it cost more to hire competent DBAs than it does to get the same outcome through better hardware? You don't have an unlimited budget. How do you decide where the money gets spent?

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important... (Bertrand Russell)

  • Thomas-282729 (1/1/2011)


    What is never mentioned in the article is "Scale to what point?". Let me demonstrate: You are using a bigint for a primary key on a table with a billion rows. Think of the savings if you used a 4-byte int instead! A billion rows? Really? Raise your hand if you have encountered a table with a billion rows. Raise the other if it has happened at more than a handful of companies. In your example, you have 100 million rows. Really? I've been working on many databases over the past two decades and in only a handful of situations have I ever run across a table with more than 50 million rows, and often they were archive tables. If you truly expect to scale out to those dimensions, then a 4-byte integer might be too tight and yes, you need to pay more attention to the sizes of the data types. However, most databases are not going to scale out to those dimensions.

    The arguments against guids were absolutely applicable 20 years ago, less applicable 10 years ago, nearly irrelevant now and will eventually be entirely irrelevant. Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x? At some point, that additional space becomes irrelevant. 30 years ago people fretted over using one byte to store the year instead of two. Today, such an optimization would be considered silly. I suspect that if MS had supported a datetime datatype with nanosecond resolution two decades ago, guids would never have been used.

    In general, the argument "space is cheap" is meant to oppose premature optimization. However, premature optimization is not the same thing as ignoring data integrity. You shouldn't use nvarchar(max) for a first name column, not because of the amount of space it consumes, but rather because a user will eventually stuff a copy of War and Peace in there. Similarly, using a smalldatetime to save four bytes is often a mistake because of its non-intuitive maximum value.

    I hear what you are saying but I still disagree since I think we are using different assumptions.

    1) Please keep in mind that this issue is a compounded one since a field can show up in other tables as well as using memory, CPU, network, log space, etc. We are not talking about one field in one table here. If we were, then maybe I would agree even if we had 500 million rows (although I would still try to model it properly if it is a new project but I wouldn't worry about it for an existing system). When people don't pay attention to datatypes they generally carry that idea onto many fields in many tables so even if you don't have many large tables you still have many fields across many tables that could be taking up resources unnecessarily.

    2) Regarding the idea that most databases do not get to the hundreds of millions of rows, I think that is an issue of looking back 20 years as opposed to looking ahead 20 years. I certainly respect your experience in the field, but I think it is becoming clear that more and more applications are highly data intensive, and given that storing large volumes of data today is much more feasible and economical than it was 10 - 20 years ago, and that BI and data analysis are becoming more popular, I think we are already seeing a growing trend in data retention. And regardless of table size, if operations on a table take longer due to additional resource consumption, then that does affect the user experience. More applications these days are web-based, with users expecting instantaneous response times, and they don't care if there are 1000 other concurrent users on the system who might be hitting the same table(s).

    3) Regardless of how many rows a table might have over a 5 or 10 year period, if I have the chance to model something properly in the beginning then I will certainly do so because there is a very minimal extra cost in terms of me thinking about the problem and maybe asking a few more questions of the person requesting the feature. A few extra minutes of work now can save many hours of work in the future so why not do it? And who can predict that their project will never grow to such sizes? We don't know right now how many customers we will have and how many new features will require additional rows and/or tables, but if I can say that it is nearly impossible to have more than 5 status values, then it doesn't matter if it becomes 10 in the future because a TINYINT will always work and the resources saved in this instance can be used elsewhere as we acquire more customers and more data. And if the data size never grows then what was the cost? A few extra minutes in the beginning?

    4) I don't think it is entirely relevant to state that we now have 100 or even 1000 times more efficient hardware when the financial resources to get such hardware are not infinite. Yes, there will always be faster CPUs and more memory to add but that doesn't mean those budget items will be approved. That is a very short-sighted way of looking at this issue: a vendor selling the proper hardware in no way implies my ability to purchase it to solve my problems. And even if I did have the money to spend on a faster computer, why would I do that if I didn't need to? That is just throwing good money away and I would rather spend an extra 5 - 10 minutes at the beginning of a project to save the company $50k in the future that can instead be used for my bonus.

    5) The DATETIME2 datatype does not replace GUIDs, since GUIDs are used as record locators, and I cannot see that happening with datetime data even if it can point to a unique record.

    6) Using SMALLDATETIME when the time itself is not wanted is nearly always a good idea, and the limit of year 2079 is not much of an issue as a "non-intuitive max value". The reason is that date values, when looking into the future, are nearly always looking into the more immediate future. Most dates start out as current dates, which always work, plus some future dates for scheduled events or appointments that don't typically happen 50 years or more out. The current max value of a SMALLDATETIME is 68 years in the future, which is perfectly workable in 99% of situations (again, where only the date portion is relevant to begin with). If you consider that computer science as a career is not much more than 40 years old looking back, the max value for a SMALLDATETIME is more than that amount of time into the future. For any project that I am working on now, it is almost guaranteed that IF my company is still around in 68 years, they will have changed this data model and application many times over by then, especially given how many changes we have made in the past 5 years alone. And I cannot predict what will be available to use, or if SQL Server will even still be around, in 68 years, so that amount of planning is wasted effort. Given that I have been thinking along these lines for 8+ years now, the industry has proven me correct: SQL Server, starting with 2008, offers a DATE-only datatype that is perfect for this use-case, and everything I designed in the past 6 years that uses SMALLDATETIME can easily be converted over (IF that is ever even necessary; a conversion sketch follows below), and this entire time my projects have benefited from not wasting the extra 4 bytes of the full DATETIME.
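    A minimal conversion sketch, with a hypothetical table and column (note that, unlike widening a VARCHAR, this does change the stored bytes, so on a very large table it may call for the shadow-table swap described earlier in this thread):

    -- SQL Server 2008+: convert a date-only SMALLDATETIME column to DATE (4 bytes -> 3 bytes).
    -- Any index, constraint, or statistics referencing the column may need to be
    -- dropped and recreated first; restate the column's existing nullability.
    ALTER TABLE dbo.Orders ALTER COLUMN OrderDate DATE NOT NULL;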

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR
