Disk Is Cheap! ORLY?
Posted Friday, December 31, 2010 6:42 AM
Ten Centuries

I only hear this naive argument from someone who has not seen the price of server-class storage. When it comes to enterprise-class systems, nothing is cheap.
Post #1041367
Posted Friday, December 31, 2010 7:34 AM
Forum Newbie

The article is pretty sweet; I'd just like to mention an error in the part referring to SMALLDATETIME. This data type does not discard time data; it is just less precise and covers a smaller date range:

http://msdn.microsoft.com/en-us/library/ms187819(v=SQL.90).aspx
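
For anyone who wants to see it for themselves, a quick sketch (the sample value is made up for illustration):

-- SMALLDATETIME keeps the time portion, merely rounding it to the
-- nearest minute (and using 4 bytes instead of DATETIME's 8).
DECLARE @dt DATETIME, @sdt SMALLDATETIME;
SET @dt = '2010-12-31 06:42:45';
SET @sdt = @dt;                                  -- implicit DATETIME -> SMALLDATETIME conversion

SELECT @dt              AS full_datetime,        -- 2010-12-31 06:42:45.000
       @sdt             AS small_datetime,       -- 2010-12-31 06:43:00 (time kept, rounded to the minute)
       DATALENGTH(@dt)  AS datetime_bytes,       -- 8
       DATALENGTH(@sdt) AS smalldatetime_bytes;  -- 4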

Post #1041389
Posted Friday, December 31, 2010 8:27 AM
SSC Journeyman

I find it amusing that he bothered to mention using char instead of varchar, but never mentioned that using nvarchar or nchar effectively doubles how much space is used for that column.

Also, using char is only ideal when you know you have a consistent data length; otherwise you're throwing bytes into the wind again by not using varchar, which only takes up as much space as the data being stored. So if you have a varchar(100) but only use 10 characters, only 12 bytes (length of string + 2) are used. If I had a char(100) column and only used 10 characters, the data type dictates that the remainder be padded with spaces, forcing it to 100 bytes.
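
A minimal sketch that makes the byte counts visible (the variable names and sample string are made up for illustration):

DECLARE @fixed CHAR(100), @variable VARCHAR(100), @unicode NVARCHAR(100);
SET @fixed    = 'ten chars!';
SET @variable = 'ten chars!';
SET @unicode  = 'ten chars!';

SELECT DATALENGTH(@fixed)    AS char_bytes,      -- 100: padded to full width with trailing spaces
       DATALENGTH(@variable) AS varchar_bytes,   -- 10: actual data only (+2 bytes of length overhead when stored in a row)
       DATALENGTH(@unicode)  AS nvarchar_bytes;  -- 20: two bytes per character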
Post #1041399
Posted Friday, December 31, 2010 8:46 AM
Forum Newbie

Great article. As someone who started out using packed data on old IBM systems to save space, even I can get lazy about selecting the right column type.

Can someone clarify one sentence in the article:

>>
So you need to keep in mind things like (for SQL Server at least) NonClustered indexes typically contain the data from the Clustered index (exceptions are: Unique Indexes and NonClustered Indexes that contain the Cluster Key)
<<

I just reviewed the structure of nonclustered indexes (http://msdn.microsoft.com/en-us/library/ms177484.aspx), and from what I see, a nonclustered index does not store the clustered index data. Instead of a RID (used when the table is a heap), it points to the clustered index key.

From the same article as above:

>>
If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
<<

Can someone show me how a nonclustered index's size (disk cost) is affected by the clustered index size?

Thanks,

Mark
Post #1041408
Posted Friday, December 31, 2010 8:47 AM


SSCarpal Tunnel

Nice article! I have been telling clients this stuff for 15+ years. Now I have a link I can refer them to. :)

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1041409
Posted Friday, December 31, 2010 9:13 AM
Grasshopper

Very well written article with great practical examples of the downstream effect of "lazy" design decisions.
Post #1041418
Posted Friday, December 31, 2010 10:00 AM
SSC Rookie

While I agree in principle with the gist of the article, I don't think everyone needs to run into work this weekend, take their servers down, and rebuild all their tables with the 'correct' data types. Again, I do agree in principle; it is a great article, and its points are things that data modelers should take to heart.

But for existing systems, I'm going to put out a slightly contrarian viewpoint here for argument's sake.

Everything is relative. Large organizations that have huge databases generally have larger budgets, and the reality is that even at enterprise disk-space cost, a savings of a few bytes per row--even if this savings could be achieved in every table in the system--is not going to make a CFO jump for joy and run out and buy that yacht.

And let's look at the real savings (not talking about I/O yet--just disk cost). Let's say that in every table in the system you find a couple of datetimes that could be date instead (4 bytes saved per field) and a few varchar(2) columns that could be char(2) (1 byte saved). Let's throw in a few more to get really generous and say you found a lot of little changes and saved 30 bytes per row in every table and index in the entire system.

Now let's say there are a few detail tables with 150 million rows, maybe some audit tables with another few hundred million, and hundreds of other smaller tables... let's say you have a total of a billion rows--no--how about 10 billion rows. I know that many organizations have even more than that, but again--everything is relative. So: 10 billion rows, and things were so bad that you saved 30 bytes for every row in every table in the entire system. What have you saved? 300 GB, if my math is right. How much is that in enterprise disk cost? And remember--this is an organization whose in-house system has 10 billion rows (and we're not talking about third-party systems like SAP, with over 100,000 tables, or things like MS ISA, which can generate many billions of rows in even an average company).
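
A quick back-of-the-envelope check of that math, using the illustrative numbers above (not measurements):

SELECT CAST(30 AS BIGINT) * 10000000000 AS bytes_saved,   -- 300,000,000,000 bytes
       CAST(30 AS BIGINT) * 10000000000
           / (1024.0 * 1024 * 1024)     AS gib_saved;     -- ~279 GiB, i.e. roughly 300 GB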

Are you going to run to the powers that be and tout how much you're going to save after you... what?... take down the servers for how long? Test the applications with the new data types--and what about those apps that now need to be rebuilt because, say, smalldatetime will now error when a VB app passes its version of null? Go through all that to rebuild your applications and tables for how much savings?

And yes, I get the part about I/O cost. Again, relativity: pick the low-hanging fruit. If there are butt-stupid models out there in which the modeler was so lazy he had a default type of char(50) and left it in for all his "status" fields, then of course I'd make the time to fix that. But as many WTFs as might be out there in data models, the reality is that most places won't have that much low-hanging fruit (if you beg to differ on that one, your company probably has more pressing IT issues than just this). If you're experiencing massive I/O delays on the server, then of course--if you see low-hanging fruit like that, you've probably found the cause.

But a more 'expensive' thing in terms of user productivity is going to be network bandwidth. No matter how efficient you are in the data model, when you return the rows over the pipe the application is going to get the same amount of data--whether the DB has to chug a few more I/O cycles to put it together or not, the network sees the same amount of data. That's where application-level WTFs can waste bandwidth that's much more costly than some extra spindle time, but that's another article.

Really though, excellent info, and by all means use it if you're in the modeling stage. But if you've got an existing system, you really need to do the math, see how much you're actually going to save, and decide whether you have the resources to do this.
--Jim

Post #1041431
Posted Friday, December 31, 2010 10:16 AM


SSC-Dedicated

mwenner (12/31/2010)

...

Can someone show me how a nonclustered index's size (disk cost) is affected by the clustered index size?

Thanks,

Mark


If you choose an identity for your clustering key, that identity is in every nonclustered index. Regardless of which columns are in the nonclustered index, the clustering key is there. So let's assume you have this:

CustomerID int
FirstName varchar(50)
LastName varchar(50)
Status char(5)
City varchar(100)

Now, if my CI is on CustomerID, all the data is ordered by CustomerID.

If I create a nonclustered index on Status, I will also have CustomerID in the nonclustered index. That is how I "look up" (row lookup/bookmark lookup) the data. So I have 5 bytes (char(5)) + 4 bytes (int) for every row in the nonclustered index.

If, however, I created my CI on LastName, then my same nonclustered index on Status now has 5 bytes for the Status, but it also has a varchar(50) for the LastName key. On top of that, you have 2 bytes in each row for the variable-column overhead (just like a table), and if LastName is not unique, then each "duplicate" of LastName will have a unique value added (2 bytes, I believe) to each duplicate row.

Any other nonclustered indexes also carry the larger key. So a CI key with variable columns can really affect your nonclustered index sizes.

Now, is 4 bytes vs. 25 (avg.) a big deal? Maybe, maybe not. It depends on the size of the data, the volume of queries, etc., but it is something to be aware of so you understand what impact it could have. You would have to test whether this is significant in any way for your application.
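
A rough sketch of how you could see this for yourself (the table and index names are made up, and you'd load identical data into both tables before comparing):

-- Narrow, unique clustering key: the nonclustered index leaf rows carry
-- only Status plus the 4-byte CustomerID.
CREATE TABLE dbo.CustomerNarrowKey (
    CustomerID INT IDENTITY PRIMARY KEY CLUSTERED,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    Status     CHAR(5),
    City       VARCHAR(100)
);
CREATE NONCLUSTERED INDEX IX_Narrow_Status ON dbo.CustomerNarrowKey (Status);

-- Wide, non-unique clustering key: the same nonclustered index now carries
-- Status plus the variable-length LastName key, its variable-column
-- overhead, and a uniqueifier for duplicate LastNames.
CREATE TABLE dbo.CustomerWideKey (
    CustomerID INT IDENTITY,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50),
    Status     CHAR(5),
    City       VARCHAR(100)
);
CREATE CLUSTERED INDEX CIX_Wide_LastName ON dbo.CustomerWideKey (LastName);
CREATE NONCLUSTERED INDEX IX_Wide_Status ON dbo.CustomerWideKey (Status);

-- After loading the same rows into both tables, compare the space the two
-- nonclustered indexes use (index_id > 1 filters to nonclustered indexes):
SELECT OBJECT_NAME(object_id) AS table_name, index_id, used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id IN (OBJECT_ID('dbo.CustomerNarrowKey'),
                    OBJECT_ID('dbo.CustomerWideKey'))
  AND index_id > 1;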







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1041438
Posted Friday, December 31, 2010 10:31 AM


SSCarpal Tunnel

Good description there, Steve. One minor point: the 'uniqueifier' for non-unique clustered indexes is 4 bytes, not 2.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1041441
Posted Friday, December 31, 2010 10:40 AM


SSC-Dedicated

Thanks, I couldn't remember that. I was thinking it was 2 + 2 for the variable portion. I was wrong.

http://msdn.microsoft.com/en-us/library/ms177484.aspx







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1041442