Disk Is Cheap! ORLY?

  • Comments posted to this topic are about the item Disk Is Cheap! ORLY?

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon,

    Great article. I used to get very frustrated with programmers AND DBA's that thought the normalization aspects of relational models would cause a degredation in performance. I know that's a long and not so simple topic, but you have alluded to it here.

    [font="System"]Jay Quincy Allen, Managing Partner, Relational Models LLC[/font]

  • AWESOME! I love this article. Solid, logical, intelligent.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • Brilliant - will circulate to my colleagues rightaway!

  • Thank you for a great article!

  • Columns that are too large also use more CPU time even if they are cached in memory. When you scan a table or index, it has to pass through the CPU, so the more data there is, the more CPU time it takes to process.

    I was just looking at the database for a vendor supplied application, and was shocked to see they were using GUIDs in the form of strings as the primary keys for all tables. Each PK is stored in a varchar(36) column, and they even store the dashes in the column. :crying:

  • Thank you for a great reminder that disk space isn't free! We installed a new SAN at the start of 2010 and it is already full - and I'm sure a lot of it is 'wasted' space. Some of us are old enough to remember when every byte (or even bit!) counted, and this is an excellent explanation.

  • This was removed by the editor as SPAM

  • Hi Solomon,

    Great article, clear and practical. Alot of common sense that usually never gets taken on board in the design phase.

    Just like to add, another potential problem of wasting space when incorrectly defining data types is the issue of clogging up precious network bandwidth.

    Many Thanks

  • Great Article and well explained. I hear 'disk space' is cheap all the time as well. The problem typically stems from the fact that the developers don't fully understand what they they are doing and don't want to clean up; so instead they keep all their intermediate results even though they role their intermediate results forward.

  • I only hear this naive argument from someone that has not seen the price of server class storage. When it comes to Enterprise class systems, nothing is cheap.

  • The article is pretty sweet, I'd just like to mention an error on the part refering to SMALLDATETIME... this data type does not discard time data; it's just more imprecise and covers a smaller date range:

    http://msdn.microsoft.com/en-us/library/ms187819(v=SQL.90).aspx

  • I find it amusing that he bothered to mention using char instead of varchar, but never mentioned that using nvarchar, or nchar, effectively doubles how much space is being used for that column.

    Also, using char is only ideal when you know you have a consistent data length, otherwise you're throwing bytes into the wind again by not using varchar (which only takes up as much space as the data being stored, so if you have varchar(100) but only using 10 characters, only 12 [length of string + 2] bytes is used). if i had a char column and only use 10 spaces, the datatype dictates that the end have spaces, thus forcing it to 100 bytes.

  • Great Article. As someone who started out using packed data on old IBM systems to save room, even I can get lazy about selecting the right column type.

    Can someone clarify one sentence in the article:

    >>

    So you need to keep in mind things like (for SQL Server at least) NonClustered indexes typically contain the data from the Clustered index (exceptions are: Unique Indexes and NonClustered Indexes that contain the Cluster Key)

    <<

    I just reviewed the structures of non clustered indexes (http://msdn.microsoft.com/en-us/library/ms177484.aspx) and from what I see, it does not store the clustered index data. Instead of a RID (used when the table is a heap), a NonClustered index points to the clustered index key.

    From same article as above.

    >>

    If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

    <<

    Can someone show me how a NonClustered index size (disk cost) is affected by the Clusterd Index Size?

    Thanks,

    Mark

  • Nice article! I have been telling clients this stuff for 15+ years. Now I have a link I can refer them to. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 1 through 15 (of 101 total)

You must be logged in to reply to this topic. Login to reply