Disk Is Cheap! ORLY?

  • Excellent article. I think the network bandwidth issue is really important and often overlooked. It is mentioned in the article, but I'd add that a WAN circuit of even just moderate bandwidth can cost as much per month to operate as some disk drives cost to buy. And can easily cost significantly more. So if you are syncing to a DR site, getting the backup image over there, and as mentioned, log shipping or replication, can take a big byte out of it. Bandwidth is also not something you can quickly address if you run out of it! Disks are usually a two week turnaround to buy more and rack them up. WAN circuits are running 120 days or more!
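
As a rough illustration of the bandwidth point above, the sketch below estimates how long a backup image takes to cross a WAN circuit. The function name, the 80% usable-throughput figure, and the example sizes are assumptions made for illustration; they do not come from the article or from any real circuit.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_gb over a link_mbps circuit.

    efficiency is the assumed fraction of the nominal bandwidth that the
    transfer actually gets (protocol overhead, other traffic on the link).
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000          # decimal GB -> megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a 500 GB backup image over a 100 Mbps circuit.
    print(f"{transfer_hours(500, 100):.1f} hours")
```

Under these assumptions a 500 GB image needs roughly 14 hours, so a nightly full backup may not fit the window no matter how cheap the disk at either end is.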

  • Excellent article. It's funny (in a bad way) that many people say "disk is cheap" until the day that tempdb or transaction log starts filling up and you actually put in a request for more disk storage. In a corporate enterprise environment, it's not as if the sysadmin can just drop what they're doing, run down to the local electronics store, pull a $200 drive off the shelf, and pop it in the server. You can blow through thousands of dollars worth of billable time and broken service level agreements just discussing the issue and waiting for it to happen.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Great article and information (glad it was reposted on SQLServerCentral today)!

    In regard to item #10 - while not denormalizing tables definitely applies to transaction systems (OLTP) - so order entry and other application transaction systems retain their integrity and update efficiencies - the presentation tables in a data warehouse (OLAP) are usually denormalized (as a star schema) to improve performance (and to a lesser extent understandability for the report creation users). This storage of redundant data in data warehouses is deemed worthy to reduce report query response time by limiting the depth of the table joins.

    A properly designed star schema for the data warehouse presentation fact and dimension tables makes sure the large fact tables use foreign keys (and the corresponding primary keys in the dimension tables) of an appropriate size, both for space-saving and performance reasons and to allow for expected growth, as you specified in your article for transaction tables. Numeric and string fields in both the fact tables (very important, because fact tables get very large) and the dimension tables would also benefit from the space-saving techniques you specified.

    Thanks again,

    Chris Reeve
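
To put rough numbers on the key-size point, here is a minimal sketch that compares the bytes spent on dimension keys in a fact table under two data type choices. The byte widths are the fixed storage sizes of the SQL Server integer types; the row count, the number of keys, and the function names are hypothetical, and the estimate ignores row overhead, compression, and any indexes that repeat the keys.

```python
# Fixed storage sizes, in bytes, of the SQL Server integer types.
KEY_BYTES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def key_bytes_per_row(key_types: list[str]) -> int:
    """Bytes used per fact row by the given list of key column types."""
    return sum(KEY_BYTES[t] for t in key_types)


def savings_bytes(rows: int, current: list[str], proposed: list[str]) -> int:
    """Bytes saved across the whole table by switching key types."""
    return rows * (key_bytes_per_row(current) - key_bytes_per_row(proposed))


if __name__ == "__main__":
    # Hypothetical fact table: 1 billion rows, 5 dimension keys.
    saved = savings_bytes(1_000_000_000, ["bigint"] * 5, ["int"] * 5)
    print(f"{saved / 1024**3:.1f} GiB saved in the base table alone")
```

Every index, backup, and replica that carries those key columns multiplies the figure, which is why the sizing choice matters most on the largest fact tables.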

  • Eric M Russell (10/25/2013)


    Excellent article. It's funny ... You can blow through thousands of dollars worth of billable time and broken service level agreements just discussing the issue and waiting ...

    We had an issue like this recently, but it was a cert, not disk. The time spent was more than the cert. The bloated cost of buying some things is amazing.

    Appreciate your work here!

    M.

    Not all gray hairs are Dinosaurs!

  • Miles Neale (10/25/2013)


    Eric M Russell (10/25/2013)


    Excellent article. It's funny ... You can blow through thousands of dollars worth of billable time and broken service level agreements just discussing the issue and waiting ...

    We had an issue like this recently, but it was a cert, not disk. The time spent was more than the cert. The bloated cost of buying some things is amazing.

    Appreciate your work here!

    M.

    What did you mean by a "cert"? Was some new hardware or software installed, but you had to wait for it to be certified?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Great article.

  • Sorry, we are running Secure Sockets Layer (SSL), which is the public-and-private key encryption strategy that includes a digital certificate. We are looking at TLS, the more recent strategy, but have not migrated yet.

    Not all gray hairs are Dinosaurs!

  • I came across the “Disk Is Cheap! ORLY?” post just now. Excellent, very comprehensive article!

    - Nataliya

  • Thanks for a great article. I am still very young in the industry so this will be a great guide for future designs and decisions.

  • Disk is about to become more expensive. We are on the edge of a transition to flash storage which will make traditional disk go the way of the floppy disk. I've always said that any technology becomes truly useful when version 4 is released (think Windows, Novell, IE and Netscape, TCP/IP, and, of course, SQL Server). The 3rd generation of flash arrays is on the way in. The cost associated with a huge Texas Memory Systems or Violin storage array is mostly gone thanks to the software layer of the 3rd generation systems. The cost can actually fall as low as $5/GB after deduplication, and that doesn't take into consideration the savings from energy, cooling, and licensing, which can be substantial when going from a full rack of SAN disk to a quarter rack or less of flash arrays. Even with 3rd generation hardware, the cost for storage can be cut in half while gaining 10 times the IOPS. Since we'll likely be at the 4th generation within the next 3 years or so, flash is about to get much less expensive (it is already starting to drop in price thanks to tablets and ultrabooks) and disk will start to become more expensive as it is used less. For anyone that cares to take a look, PureStorage is the vendor leading the charge on the 3rd generation systems and there is a decent deep dive on their technology...

    http://blog.nigelpoulton.com/pure-flasharray/

  • Joshua M Perry (10/28/2013)


    Disk is about to become more expensive. We are on the edge of a transition to flash storage which will make traditional disk go the way of the floppy disk. I've always said that any technology becomes truly useful when version 4 is released (think Windows, Novell, IE and Netscape, TCP/IP, and, of course, SQL Server). The 3rd generation of flash arrays is on the way in. The cost associated with a huge Texas Memory Systems or Violin storage array is mostly gone thanks to the software layer of the 3rd generation systems. The cost can actually fall as low as $5/GB after deduplication, and that doesn't take into consideration the savings from energy, cooling, and licensing, which can be substantial when going from a full rack of SAN disk to a quarter rack or less of flash arrays. Even with 3rd generation hardware, the cost for storage can be cut in half while gaining 10 times the IOPS. Since we'll likely be at the 4th generation within the next 3 years or so, flash is about to get much less expensive (it is already starting to drop in price thanks to tablets and ultrabooks) and disk will start to become more expensive as it is used less. For anyone that cares to take a look, PureStorage is the vendor leading the charge on the 3rd generation systems and there is a decent deep dive on their technology...

    http://blog.nigelpoulton.com/pure-flasharray/

    Good information. Thanks.

    Shifting gears, I don't care how cheap storage is or becomes, I hate wasting resources if they don't need to be wasted. "Right-sizing" still rules.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have tried explaining the same concept to Web Developers in my team before, without much success. This explanation is Solid. I will be happy to refer them to this post. Thanks Solomon.

    Bhaskar

  • I don't care how cheap storage is or becomes, I hate wasting resources if they don't need to be wasted.

    That's right. Laziness is looking at what the price used to be, and what it is now, and thinking "Hey, the pressure's off! I can coast." Creativity is thinking "Hey, I have so much at my disposal now. How can I make the world a better place with it?"

    ...One of the symptoms of an approaching nervous breakdown is the belief that one's work is terribly important.... Bertrand Russell

  • Context is everything!

    I work on a number of web-based applications and for our applications we rarely see tables with more than 100K rows. On the other hand, we have more than 10,000 source files being maintained by a team of 5 developers. In this situation, developer time is way more expensive than disk space. In 99% of all queries or stored procedures, response time is never even close to being an issue.

    To invest time re-factoring the first solution to be more disk-efficient is a bad investment. To make design choices that trade off ease of developer (maintenance) understanding for disk efficiency (space or time) is a bad investment.

    On those rare occasions that performance matters, we can take the time to "tune up" our design, but only when and as needed.

    Having said the above, I have worked on database systems where we used algorithms to strategically place data on disk in ways that would maximize performance based on disk rotation speed and head seek performance profiles.

    The point is to recognize and focus on the metrics that are important to your specific context, and not worry unduly about unimportant metrics, or make trade-offs that are sub-optimal for your situation in order to improve those unimportant metrics at the cost of the important ones.

  • dhubbard (10/25/2013)


    Excellent article.

    Thank you 🙂

    I think the network bandwidth issue is really important and often overlooked. It is mentioned in the article, but I'd add that a WAN circuit of even just moderate bandwidth can cost as much per month to operate as some disk drives cost to buy. And can easily cost significantly more. So if you are syncing to a DR site, getting the backup image over there, and as mentioned, log shipping or replication, can take a big byte out of it. Bandwidth is also not something you can quickly address if you run out of it! Disks are usually a two week turnaround to buy more and rack them up. WAN circuits are running 120 days or more!

    Interesting info. I am not as familiar with the intimate details of the network side of things so couldn't go into too much depth there. Definitely good to know about the monthly cost. But regarding the turnaround, that would fluctuate greatly, I would think, at least over time if not across geographical regions (maybe?). So that might not be a long-term issue (as in years from now), but more likely that the monthly cost issue would be a long-term issue. If I can get more detailed info on that aspect then I can probably include it in the article.

    Take care,

    Solomon...

    SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 Functions and Procedures)
    Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
    Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
    Info sites: Collations • Module Signing • SQLCLR
