﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Solomon Rutzky / Article Discussions / Article Discussions by Author  / Disk Is Cheap!  ORLY? / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 26 May 2013 00:16:43 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Jeff Moden (10/5/2012)[/b][hr]Solomon,I've recently had the need to refer to this article again and I thank you from the bottom of my heart.  This "right sizing on steroids" article should be required reading for anyone and everyone who has or will ever write the words CREATE TABLE.  My hat is off to you, good Sir.  Well done![/quote]Hi Jeff and thank you very much for such positive feedback. I apologize for the delayed response but new baby came along  less than 2 weeks before your comment and things have been, well, C-R-A-Z-Y!! :hehe: Also, I had thought of a few minor additions to make and wanted to get them in and published before replying.I have added:[ul][li]A reference to decreased Page Life Expectancy in the paragraph about memory usage[/li][li]Info about several other datatypes at the end of the main recommendations list, things like: SMALLMONEY, REAL, TIME, etc.[/li][li]Info about some newer features that can help save space [i]in addition to[/i] ([b]not[/b] in place of) a good model. Namely: SPARSE columns and Row / Page compression[/li][/ul]Hopefully the updates make for an even stronger article :-).Take care,Solomon...</description><pubDate>Mon, 05 Nov 2012 07:42:47 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Solomon,I've recently had the need to refer to this article again and I thank you from the bottom of my heart.  This "right sizing on steroids" article should be required reading for anyone and everyone who has or will ever write the words CREATE TABLE.  My hat is off to you, good Sir.  Well done!</description><pubDate>Fri, 05 Oct 2012 19:37:18 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote]  But a fixed-width column (numeric types, date/time types, and char/nchar/binary types) will always take up the same amount of space regardless if it is set to NULL or a value.  Meaning, if UpdateDate is a DATETIME field, it will always take up 8 bytes even if it is NULL.  And to make sure I did just test this out.[/quote]Hmm I wonder if that is true for Oracle, DB2, Teradata, et.al....</description><pubDate>Wed, 11 May 2011 19:52:12 GMT</pubDate><dc:creator>Todd M. Owens</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Todd M. Owens (4/27/2011)[/b][hr]And please think twice before setting the CreateDate and the UpdateDate to the same exact value on insert: this a clear violation of the limited redundancy principle.  If 90% or more of the columns are never updated, then there will be an extraneous 8 bytes on each row from the UpdateDate; multiply that by the number of rows where it should be null and in every table where it is misused, then the numbers get quite large quite quickly.[/quote]Hey Todd.  I did not notice until just a moment ago when I was working on something else that this statement is not entirely true.  While I still do agree that using NULL for UpdateDate when a record has not yet been updated is a better practice, that is merely just a logical consideration (outside of the potential indexing issue I mentioned before).  But a fixed-width column (numeric types, date/time types, and char/nchar/binary types) will always take up the same amount of space regardless if it is set to NULL or a value.  Meaning, if UpdateDate is a DATETIME field, it will always take up 8 bytes even if it is NULL.  And to make sure I did just test this out.Take care,Solomon...</description><pubDate>Wed, 11 May 2011 16:45:13 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Todd M. Owens (4/27/2011)[/b][hr]...if a table only allows inserts in a version type approach, there is no reason to add columns that capture UpdateDate and/or UpdateUserID.  If a table's use case(s) calls for a row to be updated, only then are the Update fields necessary.  And please think twice before setting the CreateDate and the UpdateDate to the same exact value on insert...If 90% or more of the columns are never updated, then there will be an extraneous 8 bytes on each row from the UpdateDate...[/quote]Hey there.  I definitely agree with both of these and practice them myself.There might be a circumstance, however, to have both CreateDate and UpdateDate be set upon initial INSERT (and hence UpdateDate will be the same value as CreateDate if the row is never updated).  I have seen cases where an ETL process looks at the UpdateDate field to know if it should grab that row and will have the UpdateDate field indexed for that purpose. In this case UpdateDate serves dual-duty by indicating both new and updated rows without having to scan both CreateDate and UpdateDate fields. And to only have a single field (assuming someone might suggest not having a CreateDate field in this case), it is still helpful to see the CreateDate since that information will be gone once the row is updated if you only have the single UpdateDate field. But outside of this specific case I tend to agree that the UpdateDate field should be NULL.Take care,Solomon...</description><pubDate>Fri, 29 Apr 2011 21:14:09 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Oh another thing that chews up space is [i]thoughtlessly [/i]adding or misusing a set of audit columns [u]to every single table[/u] regardless of how the table is used.  For example, if a table only allows inserts in a version type approach, there is no reason to add columns that capture UpdateDate and/or UpdateUserID.  If a table's use case(s) calls for a row to be updated, only then are the Update fields necessary.  And please think twice before setting the CreateDate and the UpdateDate to the same exact value on insert: this a clear violation of the limited redundancy principle.  If 90% or more of the columns are never updated, then there will be an extraneous 8 bytes on each row from the UpdateDate; multiply that by the number of rows where it should be null and in every table where it is misused, then the numbers get quite large quite quickly.</description><pubDate>Wed, 27 Apr 2011 11:31:14 GMT</pubDate><dc:creator>Todd M. Owens</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Todd M. Owens (4/26/2011)[/b][hr]Compliment: great article.Caveat: If you are in a Microsoft technology-only shop with .NET, C# and other such technologies then no worries. Otherwise be careful with using non-ANSI standard data types like "tinyint" and "bit".[/quote]Hello Todd. Thanks for the compliment :-). And thanks for mentioning the potential issue with non-standard datatypes. I do not specifically deal with that caveat for two reasons.  First, I had not thought of it since I have never run into any problems ;-). Second, I feel that by properly testing, unintended consequences from datatype decisions will be found. So if one starts out in their development environment with these changes, then any problems will not even make it to the QA environment.</description><pubDate>Tue, 26 Apr 2011 20:04:17 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>I know I am a few months late to this game, but I wanted to give you a compliment and issue a small caveat.Compliment: great article.  It is reassuring, for me anyway as an RDBMS-agnostic designer, that I wholeheartedly agree with your physical modeling approaches.  Sizing indexes by understanding column data types is critically important. Nulls matter, as does premature denormalization. I already apply 9 of the 10 suggestions in practice and all 10 in spirit, which leads me to the caveat.Caveat: If you are in a Microsoft technology-only shop with .NET, C# and other such technologies then no worries. Otherwise be careful with using non-ANSI standard data types like "tinyint" and "bit".  The issue isn't one of ANSI compliance for its own sake, but for the sake of compatibility with non-Microsoft external drivers, add-on reporting systems and integration tools.  Once again thanks for the confirmation on my physical data modeling approaches. - Todd</description><pubDate>Tue, 26 Apr 2011 14:08:07 GMT</pubDate><dc:creator>Todd M. Owens</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Fantastic article with some really neat real-world examples. Thanks for taking the time to write it! :-D</description><pubDate>Wed, 05 Jan 2011 18:52:58 GMT</pubDate><dc:creator>M. Kummerfeld</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Wish I had a few more databases of sub-billion row tables:-PThe thing that has made MPP so attractive in the BI world is the explosion in data volumes.  Not only are people wanting to analyse vast data sets but they are wanting to do it in near real time.Clickstream data for a fair sized e-commerce site will easily blast through the billion record table limit.  Anything that collects mechanical data is likely to have to deal with huge data volumes.  Joderell Bank generates 6TB/minute, or it did 3 years ago.</description><pubDate>Tue, 04 Jan 2011 12:00:10 GMT</pubDate><dc:creator>David.Poole</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]timothyawiseman (1/3/2011)[/b][hr]When I am thinking about the trade offs for creating an index, I will consider carefully how often the table is written to and the trade offs in write performance, but I won't give a second thought to the disk space because it is cheap relative to making my users wait.  I will create complicated and large indexed views in the same way.  Similarly, I tend to log procedures extensively.  This costs disk space, but that disk space is cheap compared to having a problem in production and not being able to track down in detail when, why, how, and who caused it along with enough information to know what the answer should have been.[/quote]I completely agree.  And that is the point of the article: since the table is the foundation for where that data will keep showing up (indexes, log tables, etc.) then making good choices at the beginning makes it that much easier to create whatever indexes and logging are needed.  Those are valid business reasons to use disk space and their impact to performance will be "appropriate" for the benefit of having the feature if the columns making them up were not arbitrarily wasting space to begin with.[quote]So, I agree, it was a good article with a good point and one that developers and DBAs should remember.  I just want to caution against taking it to an extreme.  Disk space should never be wasted, but at least in the situations I regularly see trading disk space for detailed logging or improved perofrmance is almost always a good trade.  Disk space is cheap enough to justify that.[/quote]Thanks.  And the intention of the article was not that anyone would go to the extreme but instead they would start thinking about things that they previously had not and in doing so make better choices.  It is not a matter of sacrificing something that is necessary; it is a matter of not doing things that are unnecessary.  I am not saying: do not use the disk.  I am saying: try to use disk wisely because it is far too easy to use it foolishly.</description><pubDate>Mon, 03 Jan 2011 12:34:34 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Charles Kincaid (1/3/2011)[/b][hr]You have to apply all of these through a filter of sense.[/quote]Of course.  That is what the article was about: making sensible/reasonable choices.</description><pubDate>Mon, 03 Jan 2011 12:23:03 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Nadrek (1/3/2011)[/b][hr]Essentially; know your system, know your data, know your bottlenecks, know the plateaus, know your growth patterns, and keep an eye on future technology.You have limited CPU, limited disk space, limited disk IO, limited memory (note: taking more space on disk also takes more space in RAM), limited time, limited concurrency, and limited humans.  All these factors must be balanced.[/quote]Yes, this is exactly the point I was making.[quote]Now, if SQL Server gave us a comprehensive range of integer datatypes, we'd be a lot better off; we should have both signed and unsigned 8, 16, 32, and 64 bit integers.  We don't.[/quote]I agree: UN-signed INTs (of the various sizes) would be great!</description><pubDate>Mon, 03 Jan 2011 12:19:49 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]rlobbe (1/1/2011)[/b][hr]What no one has mentioned is that you also have to consider the mechanics of the I/O.SQL Server is going to fetch a block (or several) at a time, not a arbitrary number of bytes.[/quote]Actually, this specifically was stated in the article.  I mentioned that smaller rows give a greater probability of more rows fitting onto a page.  However, I mentioned it from the forward-looking perspective in that using INT PKs for LookUp tables that have values that never even get close to 255 causes tables that have 5 or so FKs to various LookUp tables to have larger than necessary rows and hence fewer fit on a page.</description><pubDate>Mon, 03 Jan 2011 12:15:55 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Thomas-282729 (1/1/2011)[/b][hr]I could make the same argument against any data type. "Think of the savings you are wasting by using a 4-byte integer instead of a 2 byte integer for a primary key!" In the grander scheme of things, what matters is whether affects actual real world performance given actual data sizes. The vast majority of databases do not even contain a million rows must less 10 million in any one table and thus tiny differences in datatypes will make no perceptible difference. What does make a significant difference are data integrity rules.[/quote]All things being equal, there is no reason not to take advantage of methods to be as efficient as you can.  We are not talking about data integrity; we are talking about data validity.  If your system has good reason to make use of an INT instead of TINYINT or DATETIME instead of SMALLDATETIME then by all means do that as you would be negligent not to.  I am talking about not being wasteful.  And I do not buy the argument that the "vast majority of databases do not even contain a million rows".  To whatever extent that is true at the moment, it is a decreasing trend.[quote]This is more an argument about archive tables than it is in-use tables. I've seen a few systems that had a one or two tables in the 50 million row+ range and those were primarily archive, auditing tables or data warehouses (where you can play tricks to cut down the pointer size). However, let's not forget that archive tables serve a very different purpose than in-use tables.[/quote]Yes they have a different purposes but there are benefits to be gained by some foresight for both archive and transactional tables, as has already been discussed.[quote]When modeling, the choice of implementation data type (vs the data type in the model) is secondary. You do not care whether you use an int, smallint, tinyint or bit when modeling a database. What matters is that the value is an integer with some given range. That the DBA decides they can use a smallint instead of a 4-byte integer is done at the implementation stage. Still important, but not a part of modeling.[/quote]Data-modeling is two parts: logical then physical.  Implementation that is so far removed from the reasoning for the logical design is much more prone to error.  And again, nobody is talking about constraining the logical model to fit into the physical model.  We are talking about making good implementation choices based on the logical model and the reasoning behind the logical model.[quote][quote]A few extra minutes of work now can save many hours of work in the future so why not do it? [/quote]Because it can also cost you a significant amount in development time. If you chose a tinyint for example and after the system goes into production it is determined that this is too small, it is likely that quite a bit will have to be changed. [/quote]This is why we are talking about making reasonable choices as opposed to arbitrarily using the smallest datatypes available.  In my experience, I have spent more time reducing datatypes than increasing them because the vast majority of the time people choose INT over TINYINT to store values from 1 to 5.[quote]Completely disagree. I've run into many issues with that ridiculous max value. Example 1: you have a table that tracks when an entity is "valid". You have a start and end date. Null end date = currently available. The client side code wants to use a max datetime value to represent that some entity is valid in perpetuity. In most programming languages, that date is 9999-12-31. Can't do it with smalldatetime. You have to store a mystery constant somewhere in the middle tier code that is used to represent that value. Example 2: copyrights. 10 years ago, you might have thought it safe that no copyright could extend beyond 2079 for something today or in the past. Surprise, surprise thanks to Disney, now it is a 100 years. Example 3: contracts. Yes, I've run into contracts that specify certain rights 20 and 50 years into the future. Saving those four bytes, in the long run, simply do not provide nearly enough benefit to justify them. If you are worried about storing time values, then add a check constraint which prevents time values from being stored in that datetime field.[/quote]You only disagree because you missed the point I was making.  The point is: [b]be sensible[/b].  That means using the best datatype for the purpose.  Using an INT to store values 1 - 5 is not sensible.  Using a DATETIME to store future dates that might be 20 - 40 years out [i]is entirely sensible[/i] and nobody would begrudge you that choice.  So your examples are all instances where the business-case dictates that DATETIME is the correct choice.  Great.   This takes nothing away from what I (and others) have been saying.  And that you found an edge-case where copyright law was changed is just that: an edge-case.  And if there is some distinction between "currently available" and "always available", then again that is either a good case for DATETIME or maybe a different modeling choice to begin with.[quote]As for conversion, you better hope that your ORM does not account for the difference between smalldatetime and datetime. If so, that means additional unit testing to ensure that the change of the datatype will not break the existing code or that the code accommodates the expanded values.[/quote]1) People use ORMs because they provide an easy data-layer.  It takes little to no effort to update the ORM model.2) Why would you even mention testing in the sense that it might not happen?  Don't all changes get tested?  Not only was the article about future planning which implies that it is up-front work that is already being tested, even if we go back to make changes the assumption is that regression testing is still be done.  Regression (not Unit) testing is assumed to be part of a project whether it is a new feature or updated feature (which includes changes to datatypes).</description><pubDate>Mon, 03 Jan 2011 12:10:26 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>I have to agree with you.  Taking steps to ensure that disk space is not wasted is definitely beneficial.But there is a counterbalancing point: Disk space is normally cheaper than problems that can be solved by sacraficing disk space.  I hate wasting disk space and the increases in IO time that it brings, but I will trade vast swaths of disk space for even very small performance gains and normally say it is a good trade.  When I am thinking about the trade offs for creating an index, I will consider carefully how often the table is written to and the trade offs in write performance, but I won't give a second thought to the disk space because it is cheap relative to making my users wait.  I will create complicated and large indexed views in the same way.  Similarly, I tend to log procedures extensively.  This costs disk space, but that disk space is cheap compared to having a problem in production and not being able to track down in detail when, why, how, and who caused it along with enough information to know what the answer should have been.So, I agree, it was a good article with a good point and one that developers and DBAs should remember.  I just want to caution against taking it to an extreme.  Disk space should never be wasted, but at least in the situations I regularly see trading disk space for detailed logging or improved perofrmance is almost always a good trade.  Disk space is cheap enough to justify that.</description><pubDate>Mon, 03 Jan 2011 11:38:37 GMT</pubDate><dc:creator>timothyawiseman</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Points made by Thomas-282729  and others are valid for consideration.  You have to apply all of these through a filter of sense.  The other point that was good is "does this get me an extra row per page".The the other thing that has not been mentioned is those of us stuck on the [b][i][u]low[/u][/i][/b] end of the spectrum.  Mobile.  We have less space than a postage stamp and processors with less power than a one horse sleigh.  If we are not engineered to the max the user will throw the application (and the device) in the trash.  We will not have a billion rows in any table, that is true.  We don't have a billion of anything.  We do have compatibility constraints as often we are at the mercy of "how much code do you write for Sync Services".Scalability is a big consideration at both ends of the spectrum and in the middle too.</description><pubDate>Mon, 03 Jan 2011 10:53:18 GMT</pubDate><dc:creator>Charles Kincaid</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Amen!  I hear that "disk is cheap" all the time from developers. Laziness keeps prevailing where I work, and I'm doing my best to try and make developers aware of the reasons behind watching datatypes as well as nullable fields.  I believe I will forward this article to all developers.</description><pubDate>Mon, 03 Jan 2011 08:56:58 GMT</pubDate><dc:creator>amarshall-568002</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Essentially; know your system, know your data, know your bottlenecks, know the plateaus, know your growth patterns, and keep an eye on future technology.You have limited CPU, limited disk space, limited disk IO, limited memory (note: taking more space on disk also takes more space in RAM), limited time, limited concurrency, and limited humans.  All these factors must be balanced.Know your bottlenecks, but don't waste anything. If you want an 8k row tally table, use smallint, not int.  If you have a date dimension table that covers SMALLDATETIME, every single day can be represented by a unique SMALLINT value, starting at -32768 for Jan 1, 1900.Note: It's not uncommon for a 146GB 15k FC "upgrade" disk to cost right around $1000... each.  Not counting the tray cost.  Not counting the cabinet cost.  Not counting license costs for the SAN to be allowed to see it.  Not counting parity/mirror disk costs.  Not counting hot spare costs.  Not counting cold spare costs.  Not counting electricity use.  Not counting heat generation.  Plateaus are critical: Adding another couple drives is "easy" and "cheap"... until you're full, at which time you may need a new tray, which needs a new cabinet, which needs more floor space in the server room, and which needs a new electrical circuit, which needs a new enterprise UPS, which needs a new generator... and the additional heat load needs a new air conditioning unit.  Did we mention the fire suppression system, too?Now, if SQL Server gave us a comprehensive range of integer datatypes, we'd be a lot better off; we should have both signed and unsigned 8, 16, 32, and 64 bit integers.  We don't.</description><pubDate>Mon, 03 Jan 2011 08:42:18 GMT</pubDate><dc:creator>Nadrek</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Excellent!</description><pubDate>Mon, 03 Jan 2011 06:18:12 GMT</pubDate><dc:creator>anders-731262</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Yes, sorry Jeffrey, a heap index, not a hash.</description><pubDate>Sun, 02 Jan 2011 19:14:37 GMT</pubDate><dc:creator>JQAllen</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]pro-1019688 (1/2/2011)[/b][hr]Someone mentioned in an earlier post that all tables must have a cluster definition.  This is not true.  It's called a hash table.[/quote]Actually - it is called a heap, but that is just words. ;)</description><pubDate>Sun, 02 Jan 2011 15:22:17 GMT</pubDate><dc:creator>Jeffrey Williams 3188</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Someone mentioned in an earlier post that all tables must have a cluster definition.  This is not true.  It's called a hash table.</description><pubDate>Sun, 02 Jan 2011 13:33:41 GMT</pubDate><dc:creator>JQAllen</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>What no one has mentioned is that you also have to consider the mechanics of the I/O.SQL Server is going to fetch a block (or several) at a time, not a arbitrary number of bytes.What was discussed is valid, but if the cumulative byte savings, per row, don't get an extra row (or more) into a block. Your savings are Zero.A complete block will be read, whether it's full or not.The advice in indexes is still valid, you can almost certainly get extra index entries into an index block.</description><pubDate>Sat, 01 Jan 2011 16:24:38 GMT</pubDate><dc:creator>rlobbe</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote]1) Please keep in mind that this issue is a compounded one since a field can show up in other tables as well as using memory, CPU, network, log space, etc.  We are not talking about one field in one table here.  If we were, then maybe I would agree even if we had 500 million rows (although I would still try to model it properly if it is a new project but I wouldn't worry about it for an existing system).  When people don't pay attention to datatypes they generally carry that idea onto many fields in many tables so even if you don't have many large tables you still have many fields across many tables that could be taking up resources unnecessarily.[/quote]I could make the same argument against any data type. "Think of the savings you are wasting by using a 4-byte integer instead of a 2 byte integer for a primary key!" In the grander scheme of things, what matters is whether affects actual real world performance given actual data sizes. The vast majority of databases do not even contain a million rows must less 10 million in any one table and thus tiny differences in datatypes will make no perceptible difference. What does make a significant difference are data integrity rules. [quote]2) Regarding the idea that most database do not get to the hundreds of millions of rows, I think that is an issue of looking back 20 years as opposed to ahead 20 years. I certainly respect your experience in the field but I think it is becoming clear that more and more applications are becoming highly data intensive and given that storing large volumes of data today is much more feasible and economical than 10 - 20 years ago, and that BI and data analysis is becoming more popular, I think we are already seeing a growing trend in data retention.  And regardless of table size, if operations on a table take longer due to additional resource consumption then that does affect the user experience and more applications these days are web-based with users expecting instantaneous response times and they don’t care if there are 1000 other concurrent users on the system who might be hitting the same table(s).[/quote]This is more an argument about archive tables than it is in-use tables. I've seen a few systems that had a one or two tables in the 50 million row+ range and those were primarily archive, auditing tables or data warehouses (where you can play tricks to cut down the pointer size). However, let's not forget that archive tables serve a very different purpose than in-use tables. Again, I see medium size company databases regularly and rarely do I see tables even remotely close to a million rows and almost never if you exclude archive tables.  [quote]3) Regardless of how many rows a table might have over a 5 or 10 year period, if I have the chance to model something properly in the beginning then I will certainly do so because there is a very minimal extra cost in terms of me thinking about the problem and maybe asking a few more questions of the person requesting the feature. [/quote]When modeling, the choice of implementation data type (vs the data type in the model) is secondary. You do not care whether you use an int, smallint, tinyint or bit when modeling a database. What matters is that the value is an integer with some given range. That the DBA decides they can use a smallint instead of a 4-byte integer is done at the implementation stage. Still important, but not a part of modeling.[quote]A few extra minutes of work now can save many hours of work in the future so why not do it? [/quote]Because it can also cost you a significant amount in development time. If you chose a tinyint for example and after the system goes into production it is determined that this is too small, it is likely that quite a bit will have to be changed. [quote]And who can predict that their project will never grow to such sizes?[/quote]There is a cost to such an assumption. Why not assume your database could grow to 1 trillion exabytes? The obvious answer is that to build a system to scale to those dimensions would require an enormous amount of effort even though the probability that the system could grow to those levels is remote. Yes, we should make *reasonable* assumptions about growth, accounting for archival processes, and in most systems I have seem that is far less than billions or even hundreds of millions of rows.[quote]4) I don't think it is entirely relevant to state that we now have 100 or even 1000 times more efficient hardware when the financial resources to get such hardware are not infinite.  Yes, there will always be faster CPUs and more memory to add but that doesn't mean those budget items will be approved.  That is a very short-sighted way of looking at this issue: a vendor selling the proper hardware in no way implies my ability to purchase it to solve my problems.  [/quote]You are also discounting cost. In addition to more powerful computers, cost of that power generally drops. Thus, it is likely that there is a system that could be built today that would scale your database to 100 million rows but ten years from now that system might cost the same as workstation today. I'm not saying that developers should entirely discount performance related to data types but I am saying that I would prefer they focus more on data integrity and good relation design than on minor performance boosts from say using a smalldatetime vs a datetime.[quote]5) DATEIME2 datatype does not replace GUIDs since GUIDs are used as record-locators and I cannot see that happening with datatime data even if it can point to a unique record[/quote]A significant reason against using DateTime in the past is that its resolution was far too low (1/3 of a millisecond) to work in heavy transaction systems. However, at nanosecond resolution, you would be hard pressed to intentionally create dups. A big reason that guids are used over integers is the ability to generate them on the client instead of requiring a round trip that Identity values require. DateTime2 could work well enough for that purpose in half the size. I haven't personally built a system using a DateTime2 as a PK but I can see the potential.[quote]6) Using SMALLDATETIME when the time itself is not wanted is nearly always a good idea and the limit of year 2079 is not much of an issue as a "non-intuitive max value".  The reason being is that date values are nearly always, if looking into the future, looking into the more immediate future.  Most dates start as being current which always work and some future dates for scheduled events or appointments that don't typically happen 50 years or more out.  The current max value of a SMALLDATETIME is 68 years in the future which is perfectly workable in 99% of situations (again, where only the date portion is relevant to begin with).  If you consider that computer science as a career is not much more than 40 years looking back, the max value for a SMALLDATETIME is more than that amount of time in the future. For any project that I am working on now, it is almost guaranteed that IF my company is still around in 68 years, by then they will have made changes to this data model and application many times over, especially given how many changes we have done in the past 5 years alone. And I cannot predict what will be available to use or if SQL Server will even still be around in 68 years so that amount of planning is wasted effort.  And given that I have been thinking along these lines for 8+ years now, the industry has proven me to be correct since SQL Server, starting with 2008, offers a DATE only datatype that is perfect for this use-case and everything I designed in the past 6 years that uses SMALLDATETIME can easily be converted over (IF that is ever even necessary to begin with) and this entire time my projects have benefited from not being wasteful of the extra 4 bytes for the full DATETIME.[/quote]Completely disagree. I've run into many issues with that ridiculous max value. Example 1: you have a table that tracks when an entity is "valid". You have a start and end date. Null end date = currently available. The client side code wants to use a max datetime value to represent that some entity is valid in perpetuity. In most programming languages, that date is 9999-12-31. Can't do it with smalldatetime. You have to store a mystery constant somewhere in the middle tier code that is used to represent that value. Example 2: copyrights. 10 years ago, you might have thought it safe that no copyright could extend beyond 2079 for something today or in the past. Surprise, surprise thanks to Disney, now it is a 100 years. Example 3: contracts. Yes, I've run into contracts that specify certain rights 20 and 50 years into the future. Saving those four bytes, in the long run, simply do not provide nearly enough benefit to justify them. If you are worried about storing time values, then add a check constraint which prevents time values from being stored in that datetime field. As for conversion, you better hope that your ORM does not account for the difference between smalldatetime and datetime. If so, that means additional unit testing to ensure that the change of the datatype will not break the existing code or that the code accommodates the expanded values.</description><pubDate>Sat, 01 Jan 2011 15:41:07 GMT</pubDate><dc:creator>Thomas-282729</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Thomas-282729 (1/1/2011)[/b][hr]Inefficient with your data types is a relative problem. Is storing the year out to its full four digits inefficient? It was 40 years ago. You are discounting development time, maintenance costs and most importantly hardware scale.[/quote]Inefficiency is a relative problem only if we are talking about radically different system models.  A desktop application that holds data for one person is far different than the growing trend of web-based applications holding data for everyone in a shared setting.  You are missing the point in that I am advocating people take a few moments here and there to think more about what they are doing going forward, not looking for stuff to fix.  By spending a few extra minutes here and there a company can greatly reduce maintenance and hardware costs with a minimum of development time.  Using larger than necessary datatypes for everything is just plain lazy and expecting hardware to catch up to hasty decisions only works if you have the money to spend on new hardware.And your example of the Y2K issue: that could have been handled a lot better and was much less of an issue than was originally thought.  Also, not storing the full 4 digits produced a possibility of incorrect computations on seemingly correct data whereas using a SMALLDATETIME and not being able to go beyond 2079 is simply a matter of potentially not being able to enter in a valid piece of data as opposed to entering in something that looks correct but might someday produce an erroneous result when doing date arithmetic on it.  So this is moot point.</description><pubDate>Sat, 01 Jan 2011 15:22:48 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Steve Jones - SSC Editor (1/1/2011)[/b][hr]CPU power has increased, but a great talk from Dr. David Dewitt in 2009 at the PASS Summit showed that disk xfer rates have not. The capacity has, but the IOPS is not keeping pace withe everything else.So you consider that disk matters as a [i]piece[/i] of your design. Not necessarily the most important part, but a piece, which is what I think Solomon was getting at.[/quote]Yes, this is it exactly.  I am saying that we cannot [i]ignore[/i] disk space usage as a relevant factor to system design as so many people often do because of this misperception that "disk is cheap".  And I even said in the article that GUIDs sometimes are very valid, just 99% of the time NOT for Clustered Indexes and/or PKs.  Good design takes into account as much of the environment in which the system exists as possible.Thanks Steven for clarifying this.  I think I should add this into any revision of this article.</description><pubDate>Sat, 01 Jan 2011 15:08:43 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote]If you ask your manager whether she'd rather wait 10 minutes for her answer, or 10 seconds, she's going to go for the 10 second option. [/quote]Of course this is a misrepresentation. Here, I can do that too: "Would you rather wait only 10 seconds some time in the future for a query to complete and that will cost you 1000 hours of additional development time at X rate per hour now or would you rather take a small risk that your query might take 10 minutes in the future because the hardware has not caught up to the processing or the hardware currently in use at that time is insufficient but save that money now?" I bet you most managers would rather save money now that maybe save money in the future.Designed correctly you can get comparable performance to an integer using guids that will scale well in most cases. However, most opponents of guids discount a COMB guid in their analysis of performance.[quote]There are bigger questions to answer. It doesn't matter what the hardware status quo is at any given point in time, if you're inefficient with your data, retrieval, maintenance and analysis will take longer than they should. The organization down the road that is competing with you, and who DOES take parsimony seriously, will have a competitive advantage (all else being equal). [/quote]Inefficient with your data types is a relative problem. Is storing the year out to its full four digits inefficient? It was 40 years ago. You are discounting development time, maintenance costs and most importantly hardware scale.</description><pubDate>Sat, 01 Jan 2011 14:59:21 GMT</pubDate><dc:creator>Thomas-282729</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>CPU power has increased, but a great talk from Dr. David Dewitt in 2009 at the PASS Summit showed that disk xfer rates have not. The capacity has, but the IOPS is not keeping pace withe everything else.Therefore it's not that a GUID v int makes a difference everywhere, but it can add up. It is the micro-second differences across your whole system can add up.Does this mean you should never use GUIDs? No, it means that you shouldn't blindly assume a GUID is needed and your hardware will keep up. In most systems it can cover poor design and coding, but in quite a few, it doesn't. So you consider that disk matters as a [i]piece[/i] of your design. Not necessarily the most important part, but a piece, which is what I think Solomon was getting at.</description><pubDate>Sat, 01 Jan 2011 14:58:06 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]GPO (1/1/2011)[/b][hr]What my argument doesn't address here is the opportunity cost of parsimony. Does it cost more to hire competent DBAs than it does to get the same outcome through better hardware? You don't have an unlimited budget. How do you decide where the money gets spent?   [/quote]I would typically err on the side of the human over the machine.  Yes, the cost of the machine can be depreciated on tax returns for the company and won't take sick/vacation days but:1) an employee can usually be replaced at the same cost whereas inadequate hardware requires additional spending and trying to sell old hardware doesn't recoup much of the cost2) hardware has a very limited scope of what it can help.  making one machine faster only effects that one machine whereas an employee can help fix an entire system and even help in areas that are not system related3) hardware will never creatively contribute to solving problems: it can only ask for more hardware.  an employee can re-architect a project to work more efficiently on the same hardware and then go on to adding new features.Obviously there are limits.  At some point a problem will simply require more hardware and no amount of re-engineering will help.  But of course each situation needs to be evaluated on the merits of the particular situation as opposed to having a hard-and-fast rule.  Meaning: "it depends" ;-).</description><pubDate>Sat, 01 Jan 2011 14:41:14 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Thomas-282729 (1/1/2011)[/b][hr]What is never mentioned in the article is "Scale to what point?". Let me demonstrate: You are using a bigint for a primary key on a table with a billion rows. Think of the savings if you used a 4-byte int instead! A billion rows? Really? Raise your hand if you have encountered a table with a billion rows. Raise the other if it is has happened at more than handful of companies. In your example, you have 100 million rows. Really? I've been working on many databases over the past two decades and in only a handful of situations have I ever run across a table with more than 50 million rows and often they were archive tables. If you truly expect to scale out to those dimensions, then a 4-byte integer might be too tight and yes you need to pay more attention to the sizes of the data types. However, most databases are not going to scale out to those dimensions. The arguments against guids were absolutely applicable 20 years ago, less applicable 10 years ago, nearly irrelevant now and will eventually be entirely irrelevant. Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x? At some point, that additional space becomes irrelevant. 30 years ago people fretted over using a byte to store the year instead of two. Today, such an optimization would be considered silly. I suspect that if MS supported a datetime datatype with nanosecond resolution a two decades ago, that guids would never be used.In general, the argument "space is cheap" is to oppose premature optimization. However, premature optimization is not the same thing as ignoring data integrity. You shouldn't use a nvarchar(max) for a first name column, not because the amount of space it consumes, but rather because a user will eventually stuff a copy of War and Peace in there. Similarly, using a smalldatetime to save four bytes is often a mistake because of its non-intuitive maximum value.[/quote]I hear what you are saying but I still disagree since I think we are using different assumptions.1) Please keep in mind that this issue is a compounded one since a field can show up in other tables as well as using memory, CPU, network, log space, etc.  We are not talking about one field in one table here.  If we were, then maybe I would agree even if we had 500 million rows (although I would still try to model it properly if it is a new project but I wouldn't worry about it for an existing system).  When people don't pay attention to datatypes they generally carry that idea onto many fields in many tables so even if you don't have many large tables you still have many fields across many tables that could be taking up resources unnecessarily.2) Regarding the idea that most database do not get to the hundreds of millions of rows, I think that is an issue of looking back 20 years as opposed to ahead 20 years. I certainly respect your experience in the field but I think it is becoming clear that more and more applications are becoming highly data intensive and given that storing large volumes of data today is much more feasible and economical than 10 - 20 years ago, and that BI and data analysis is becoming more popular, I think we are already seeing a growing trend in data retention.  And regardless of table size, if operations on a table take longer due to additional resource consumption then that does affect the user experience and more applications these days are web-based with users expecting instantaneous response times and they don’t care if there are 1000 other concurrent users on the system who might be hitting the same table(s).3) Regardless of how many rows a table might have over a 5 or 10 year period, if I have the chance to model something properly in the beginning then I will certainly do so because there is a very minimal extra cost in terms of me thinking about the problem and maybe asking a few more questions of the person requesting the feature.  A few extra minutes of work now can save many hours of work in the future so why not do it?  And who can predict that their project will never grow to such sizes?  We don't know right now how many customers we will have and how many new features will require additional rows and/or tables, but if I can say that it is nearly impossible to have more than 5 status values, then it doesn't matter if it becomes 10 in the future because a TINYINT will always work and the resources saved in this instance can be used elsewhere as we acquire more customers and more data.  And if the data size never grows then what was the cost?  A few extra minutes in the beginning?4) I don't think it is entirely relevant to state that we now have 100 or even 1000 times more efficient hardware when the financial resources to get such hardware are not infinite.  Yes, there will always be faster CPUs and more memory to add but that doesn't mean those budget items will be approved.  That is a very short-sighted way of looking at this issue: a vendor selling the proper hardware in no way implies my ability to purchase it to solve my problems.  And even if I did have the money to spend on a faster computer, why would I do that if I didn't need to?  That is just throwing good money away and I would rather spend an extra 5 - 10 minutes at the beginning of a project to save the company $50k in the future that can instead be used for my bonus.5) DATEIME2 datatype does not replace GUIDs since GUIDs are used as record-locators and I cannot see that happening with datatime data even if it can point to a unique record6) Using SMALLDATETIME when the time itself is not wanted is nearly always a good idea and the limit of year 2079 is not much of an issue as a "non-intuitive max value".  The reason being is that date values are nearly always, if looking into the future, looking into the more immediate future.  Most dates start as being current which always work and some future dates for scheduled events or appointments that don't typically happen 50 years or more out.  The current max value of a SMALLDATETIME is 68 years in the future which is perfectly workable in 99% of situations (again, where only the date portion is relevant to begin with).  If you consider that computer science as a career is not much more than 40 years looking back, the max value for a SMALLDATETIME is more than that amount of time in the future. For any project that I am working on now, it is almost guaranteed that IF my company is still around in 68 years, by then they will have made changes to this data model and application many times over, especially given how many changes we have done in the past 5 years alone. And I cannot predict what will be available to use or if SQL Server will even still be around in 68 years so that amount of planning is wasted effort.  And given that I have been thinking along these lines for 8+ years now, the industry has proven me to be correct since SQL Server, starting with 2008, offers a DATE only datatype that is perfect for this use-case and everything I designed in the past 6 years that uses SMALLDATETIME can easily be converted over (IF that is ever even necessary to begin with) and this entire time my projects have benefited from not being wasteful of the extra 4 bytes for the full DATETIME.</description><pubDate>Sat, 01 Jan 2011 14:31:45 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>@Thomas-282729[quote]Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x?[/quote]The argument that it's OK for data to take up more space than it should because we have the processing power and memory to handle it, discounts the question of what we're actually going to do with the data we're collecting. Thirty years ago some types of analysis might simply have not been an option to use on a day-to-day basis. You would have been telling your manager "If you want that question answered we're going to have to run it over a weekend. And we have a queue of people wanting stuff processed on weekends so you'll need to fill out this form..." Now we have the extra memory and processing power we can get our hands on answers in milliseconds, that we'd waste hours on years ago... provided we're efficient in how we set up our data. If you ask your manager whether she'd rather wait 10 minutes for her answer, or 10 seconds, she's going to go for the 10 second option. The other thing I've noticed over the years is that data doesn't really answer questions, it merely creates a framework for more detailed questions. As soon as you give the executive their dashboard of KPIs, they are going to ask "Why is indicator X falling?" We give ourselves more chance of quickly getting into the detail if we've been efficient with our data.Thirty years ago it might have taken an organization say 10 hours to answer question x. Question x was a basic fundamental question that had to be answered. Thirty years later, the organisation has 5 times the number of employees it had back then. It has hundreds and hundreds of times more data (perhaps billions of times more data). Back then very little was collected electronically. There were rooms full of paper files instead. Now data is collected on every imaginable aspect of the organization's life. Question X from 30 years ago is passe.  There are bigger questions to answer. It doesn't matter what the hardware status quo is at any given point in time, if you're inefficient with your data, retrieval, maintenance and analysis will take longer than they should. The organization down the road that is competing with you, and who DOES take parsimony seriously, will have a competitive advantage (all else being equal). What my argument doesn't address here is the opportunity cost of parsimony. Does it cost more to hire competent DBAs than it does to get the same outcome through better hardware? You don't have an unlimited budget. How do you decide where the money gets spent?   </description><pubDate>Sat, 01 Jan 2011 13:52:47 GMT</pubDate><dc:creator>GPO</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>What is never mentioned in the article is "Scale to what point?". Let me demonstrate: You are using a bigint for a primary key on a table with a billion rows. Think of the savings if you used a 4-byte int instead! A billion rows? Really? Raise your hand if you have encountered a table with a billion rows. Raise the other if it is has happened at more than handful of companies. In your example, you have 100 million rows. Really? I've been working on many databases over the past two decades and in only a handful of situations have I ever run across a table with more than 50 million rows and often they were archive tables. If you truly expect to scale out to those dimensions, then a 4-byte integer might be too tight and yes you need to pay more attention to the sizes of the data types. However, most databases are not going to scale out to those dimensions. The arguments against guids were absolutely applicable 20 years ago, less applicable 10 years ago, nearly irrelevant now and will eventually be entirely irrelevant. Who cares that a guid takes up 4x the space if you have 100x the processing power and memory? What about 1000x? At some point, that additional space becomes irrelevant. 30 years ago people fretted over using a byte to store the year instead of two. Today, such an optimization would be considered silly. I suspect that if MS supported a datetime datatype with nanosecond resolution a two decades ago, that guids would never be used.In general, the argument "space is cheap" is to oppose premature optimization. However, premature optimization is not the same thing as ignoring data integrity. You shouldn't use a nvarchar(max) for a first name column, not because the amount of space it consumes, but rather because a user will eventually stuff a copy of War and Peace in there. Similarly, using a smalldatetime to save four bytes is often a mistake because of its non-intuitive maximum value.</description><pubDate>Sat, 01 Jan 2011 12:44:20 GMT</pubDate><dc:creator>Thomas-282729</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]dpersson-635827 (1/1/2011)[/b][hr]You lost me at a few minutes vs. a couple of seconds.  If that were truly the case you would have to add an index no matter what, a tinyint to a bigint is not going to cause that kind of slowdown.  The one area you failed to mention is versatility.  Most applications are tightly coupled with the underlying data.  If I have a username field set to 20 characters to save space, when I need to increase this field, it's not simply a matter of altering the table.  Every application using this data will have to be checked, and unit test will need to be updated(not fun).  Furthermore, the fragmentation caused by altering a column would more than cancel out the savings of making it too small to begin with.  You can always add/update/column to help keep pages in order, but that's easier said than done.  If the column is indexed, primary key, large table... it could bring a website down, and any maintenance window is too much for management.[/quote]Please keep in mind that the article was more so about planning how you approach new modeling as opposed to going back and fixing existing issues. My examples were cases where I did go back to fix issues because these ideas were not kept in mind and that did cause issues that warranted the time spent on the changes.Also keep in mind that we are not talking about an isolated table but instead the overall idea of how a system is modeled so it would be many cases of BIGINT vs. INT.  And these decisions affect many areas which do indeed affect performance.   Since all parts of the system are affected together the best approach to modeling is a holistic one.About your particular example:1) If a change needs to be made for performance then it doesn't matter how tightly coupled the app is since new features could also require the same amount of work and testing.2) There is no need to set a variable length column to any length just to save space since the actual bytes used are all that matters (in this case; I am ignoring the space left on the row for additional columns since we are not talking about that here).  Meaning, a VARCHAR(20) field takes as much disk space as a VARCHAR(50) field since a 15 byte value is 15 bytes no matter how large of a value can be accepted for the field.3) If you need to increase a variable length field, then yes, it is "mostly" a matter of altering the table as increasing a variable length field is a meta-data only operation.  The existing data doesn't change.  The only other thing to change could be the UI if there is JavaScript validation or a MaxSize option set on the form field.  And you might need to update input parameters to any Stored Procs that accept data from the application so they don't truncate the data coming in.4) Unit tests do not need to be updated (unless the Business Layer code is doing field size validation) but this is not related to the database since Unit Tests do not hit a database or any external resource.  If they do then they are not Unit Tests.  Validation Tests might need to be updated, but this is part of any change to a system so no more work than any other new or changed feature.5) Fragmentation is a non-issue since again increasing the size of a variable width column is a meta-data only operation that happens instantly.  And if any operation does cause fragmentation that would be handled by normal index maintenance that should already be happening.6) If table changes are necessary and would cause a table lock (decreasing a variable size field or changing a fixed-width datatype) then you can: create a copy of the table with the ideal structure, create a job to pull the existing data over in small batches over a period of time, create a trigger to sync INSERTs and UPDATEs, and when done you simply do an sp_rename on the objects and then a sp_recompile on the table to make sure everything touching that table is updated.  We do this all of the time where I work for tables with hundreds of millions of rows (50 - 100 GB) and we don't need more than a few seconds to make the swap (and drop the job) so we don't need any maintenance &amp;#119;indow.Again, the main point of the article was not that everyone needs to go back and reduce datatype sizes of existing systems but to better plan future projects to reduce the need for such avoidable conversations about the cost of going back to fix problems when you find yourself in the middle of one.</description><pubDate>Sat, 01 Jan 2011 12:12:13 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>You lost me at a few minutes vs a couple of seconds.  If that were truly the case you would have to add an index no matter what, a tinyint to a bigint is not going to cause that kind of slowdown.  The one area you failed to mention is versatility.  Most applications are tightly coupled with the underlying data.  If I have a username field set to 20 characters to save space, when I need to increase this field, it's not simply a matter of altering the table.  Every application using this data will have to be checked, and unit test will need to be updated(not fun).  Furthermore, the fragmentation caused by altering a column would more than cancel out the savings of making it too small to begin with.  You can always add/update/column to help keep pages in order, but that's easier said then done.  If the column is indexed, primary key, large table... it could bring a website down, and any maintenance window is too much for management.  Save yourself a future headache, besides disk is cheap.</description><pubDate>Sat, 01 Jan 2011 11:00:18 GMT</pubDate><dc:creator>dpersson-635827</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]TheSQLGuru (1/1/2011)[/b][hr][quote][b]Steve Jones - SSC Editor (12/31/2010)[/b][hr]I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.[/quote]I agree Steve.  I read the referenced blog post (perhaps too quickly), but I didn't see anywhere a mention about any case where the clustering key isn't carried on nonclustered indexes.[/quote]I think there was a misunderstanding.  I was speaking in terms of additional disk space usage so to me the term "is there" meant "is there taking up additional space", such as would be the case if it were stored twice.   I was just speaking in terms of how to calculate the size of the resulting index. So now it seems that we actually do agree.  Sorry for the confusion.</description><pubDate>Sat, 01 Jan 2011 09:39:50 GMT</pubDate><dc:creator>Solomon Rutzky</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]GPO (12/31/2010)[/b][hr]This sort of thing does have its amusing side sometimes. An organization I used to work for hired a data warehousing "expert" to fix the mess their warehouse was in. After many months of non-delivery and reporting queries taking 30 minutes to execute etc he decided that all SQL-based performance options had been exhausted. What we needed, he explained, was a "hardware solution". He put together an extravagant wishlist and complained to his boss that his failure to deliver was down to his boss's failure to provide him with the right components. So his boss filled the wishlist down to the last screw and presented him with it. This effectively took away any excuses for ongoing failure. Lo and behold, the hardware made very little difference to performance and rendered his position in the organization almost untenable. Certainly destroyed his credibility. The take-home message for me was that while hardware may be continually falling in price, be careful what you wish for. You'd better be damn sure hardware IS the problem before you go blaming it.[/quote]There are definitely a WIDE range of consultants out there and I think that many of them have no business calling themselves consultants, and they give those of us who DO know what we are doing a bad rap.  I have been called in to clean up the messes left behind by quite a few of them.  I am "The Wolf" when it comes to the SQL Server relational engine.  :-D</description><pubDate>Sat, 01 Jan 2011 08:11:38 GMT</pubDate><dc:creator>TheSQLGuru</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Steve Jones - SSC Editor (12/31/2010)[/b][hr][quote][b]Solomon Rutzky (12/31/2010)[/b][hr][quote][b]Now, Steve mentioned "Regardless of what columns are in the nonclustered index, the clustering key is there" which is not always true. It is most often true, but not for the cases of a unique index or if the NC Index contains the Clustered Index Key.  Craig Freedman does a good job of showing all of these cases in the following blog:[url]http://blogs.msdn.com/b/craigfr/archive/2006/06/30/652639.aspx[/url][/quote]I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.[/quote]I agree Steve.  I read the referenced blog post (perhaps too quickly), but I didn't see anywhere a mention about any case where the clustering key isn't carried on nonclustered indexes.</description><pubDate>Sat, 01 Jan 2011 08:07:17 GMT</pubDate><dc:creator>TheSQLGuru</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>Excellent article. This is how I design table fields for over ten years.  I did expect that everybody uses the arguments for this parsimonious approach, until I found that associates used int for fields having only two or three values. Indeed, "Disk Is Cheap"-like arguments were used not to follow my minimal approach.Thanks for the article. Now I feel less alone.M.vr. gr., Leendert.</description><pubDate>Sat, 01 Jan 2011 05:10:07 GMT</pubDate><dc:creator>Leendert van Staalduinen</dc:creator></item><item><title>RE: Disk Is Cheap!  ORLY?</title><link>http://www.sqlservercentral.com/Forums/Topic1041261-403-1.aspx</link><description>[quote][b]Solomon Rutzky (12/31/2010)[/b][hr][quote][b]Now, Steve mentioned "Regardless of what columns are in the nonclustered index, the clustering key is there" which is not always true. It is most often true, but not for the cases of a unique index or if the NC Index contains the Clustered Index Key.  Craig Freedman does a good job of showing all of these cases in the following blog:[url]http://blogs.msdn.com/b/craigfr/archive/2006/06/30/652639.aspx[/url][/quote]I believe the clustered key is always there. Even if you have a unique NC index, the clustering key must be there so that you can look up the actual data. If the clustered key column is a part of the NC columns, then the key is there. It's not stored twice, but it is in there.</description><pubDate>Fri, 31 Dec 2010 17:32:03 GMT</pubDate><dc:creator>Steve Jones - SSC Editor</dc:creator></item></channel></rss>