Disk Is Cheap! ORLY?

  • Miles Neale

    SSChampion

    Points: 13147

    aalcala (10/29/2013)


    Context is everything!

    I work on a number of web-based applications and for our applications we rarely see tables with more than 100K rows. On the other hand, we have more than 10,000 source files being maintained by a team of 5 developers. In this situation, developer time is way more expensive than disk space. In 99% of all queries or stored procedures, response time is never even close to being an issue.

    To invest time re-factoring the first solution to be more disk-efficient is a bad investment. To make design choices that trade off ease of developer (maintenance) understanding for disk efficiency (space or time) is a bad investment.

    On those rare occasions that performance matters, we can take the time to "tune up" our design, but only when and as needed.

    Having said the above, I have worked on database systems where we used algorithms to strategically place data on disk in ways that would maximize performance based on disk rotation speed and head seek performance profiles.

    The point is to recognize and focus on the metrics that are important to your specific context, and not worry unduly about unimportant metrics, or make trade-offs that are sub-optimal for your situation in order to improve those unimportant metrics at the cost of the important ones.

    aalcala - Thank you for this thought.

The cost of storage is not just the purchase price of the hardware, or of the software needed to make the storage work; it is the cost of use plus those things. If something is already working, refactoring it to make the use of data and application processes more efficient is an additional cost with no benefit to the user. I know that this smacks of compromise, of things being acceptable when they are just good enough, but some of the time the number of tasks is so large that good enough has to stand until the number of tasks gets under control, if it ever does.

The question is: is it acceptable to have 20 processes running not so efficiently and stay in business, or to have only four processes running perfectly efficiently and go out of business because you cannot do all you need to do? πŸ™‚

    M.

    Not all gray hairs are Dinosaurs!

  • Solomon Rutzky

    SSCoach

    Points: 15964

    Eric M Russell (10/25/2013)


    Excellent article.

    Thanks :-).

In a corporate enterprise environment, it's not as if the sysadmin can just drop what they're doing, run down to the local electronics store, pull a $200 drive off the shelf, and pop it in the server. You can blow through thousands of dollars worth of billable time and broken service level agreements just discussing the issue and waiting for it to happen.

And yet I have heard senior management say that they can just go down to the store and grab a 1 TB drive for {some low amount of money} and "so what's the problem?" It would make more sense for the business folks to just ask the DBA and/or IT Director how much it would cost to get X amount of space, how long it would take to get it, and what is involved in installing it. That would be a real eye-opener. And of course, that conversation needs to happen at the development level as well when planning new projects: all too often a project goes to production requiring more space than is available, yet the budget has already been spent and nothing more can be added for 6 months or so. Those are always pleasant discussions ;-).

    Take care,

    Solomon...

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky

    SSCoach

    Points: 15964

    christopher reeve (10/25/2013)


    Great article and information (glad it was reposted on SQLServerCentral today)!

    Thank you :-).

    In regard to item #10 - while not denormalizing tables definitely applies to transaction systems (OLTP) - so order entry and other application transaction systems retain their integrity and update efficiencies - the presentation tables in a data warehouse (OLAP) are usually denormalized (as a star schema) to improve performance (and to a lesser extent understandability for the report creation users). This storage of redundant data in data warehouses is deemed worthy to reduce report query response time by limiting the depth of the table joins.

    I completely agree! And in my defense, I did qualify that recommendation by saying "do not prematurely denormalize". I would not consider denormalization in an OLAP system / Data Warehouse to be premature as that is, as you stated, an appropriate technique in that environment. I guess I can update that part to be more explicit in applying to OLTP systems. Thanks for pointing out the ambiguity and the explanation.
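For illustration, here is a minimal sketch of the distinction being discussed (all table and column names are hypothetical, not from any particular system): an OLTP design keeps descriptive attributes normalized into lookup tables, while an OLAP / star-schema presentation deliberately repeats them in wide dimension tables to cut join depth for reports.

```sql
-- OLTP (normalized): each attribute lives in exactly one place
CREATE TABLE dbo.ProductCategory
(
    CategoryID   INT IDENTITY(1,1) PRIMARY KEY,
    CategoryName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Product
(
    ProductID   INT IDENTITY(1,1) PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL,
    CategoryID  INT NOT NULL REFERENCES dbo.ProductCategory (CategoryID)
);

-- OLAP / star schema (denormalized): the dimension repeats descriptive
-- text on purpose, so reports join the fact table to one wide dimension
-- instead of walking a chain of lookup tables
CREATE TABLE dbo.DimProduct
(
    ProductKey   INT NOT NULL PRIMARY KEY,
    ProductName  VARCHAR(100) NOT NULL,
    CategoryName VARCHAR(50) NOT NULL  -- redundant by design
);
```

The redundancy in DimProduct is exactly the "worthy" trade-off described above: extra storage in exchange for shallower joins at report time.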

    Take care,

    Solomon...

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky

    SSCoach

    Points: 15964

    msmithson (10/25/2013)


    Great article.

    npodlesny (10/25/2013)


I came across the "Disk Is Cheap! ORLY?" post just now. Excellent, very comprehensive article!

    - Nataliya

    inuscz (10/28/2013)


    Thanks for a great article. I am still very young in the industry so this will be a great guide for future designs and decisions.

    avbhaskar (10/28/2013)


    I have tried explaining the same concept to Web Developers in my team before, without much success. This explanation is Solid. I will be happy to refer them to this post. Thanks Solomon.

    Bhaskar

    To each of you: thank you for the kind words and you are quite welcome. I am glad that you found the information helpful :-).

    Take care,

    Solomon...

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR

  • Jeff Moden

    SSC Guru

    Points: 994266

    aalcala (10/29/2013)


    ... we have more than 10,000 source files being maintained by a team of 5 developers.

    I'm just curious. what kind of files are those and what format are they in?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBA's job. πŸ˜‰

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • aalcala

    SSC Enthusiast

    Points: 157

    Jeff Moden (10/29/2013)


    aalcala (10/29/2013)


    ... we have more than 10,000 source files being maintained by a team of 5 developers.

    I'm just curious. what kind of files are those and what format are they in?

    A suite of related web applications that has evolved over the last dozen years or so. So... .css files, classic ASP with include files in various scripting languages (vbscript, javascript), ASP.NET, a small percentage of image files for banners, etc.

  • twin.devil

    SSC-Insane

    Points: 22208

excellent article i must say ... detailed enough for ppl who design databases with closed eyes 😎

    good work !!

  • Jeff Moden

    SSC Guru

    Points: 994266

    aalcala (10/29/2013)


    Jeff Moden (10/29/2013)


    aalcala (10/29/2013)


    ... we have more than 10,000 source files being maintained by a team of 5 developers.

    I'm just curious. what kind of files are those and what format are they in?

    A suite of related web applications that has evolved over the last dozen years or so. So... .css files, classic ASP with include files in various scripting languages (vbscript, javascript), ASP.NET, a small percentage of image files for banners, etc.

I guess it's mostly because I'm a hardcore data troll but when I think of "source" files, I usually think of "imports" to SQL Server and not files that are sources for screen display by a web-app. Thanks for the clarification.

    I haven't worked on front-end code for over a decade but, even with that, 10,000 such source files seems a bit excessive. It must be one hell of a suite to require so many source files.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBA's job. πŸ˜‰

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • Solomon Rutzky

    SSCoach

    Points: 15964

    Joshua M Perry (10/28/2013)


Disk is about to become more expensive. We are on the edge of a transition to flash storage which will make traditional disk go the way of the floppy disk...The cost can actually fall as low as $5/GB after deduplication, and that doesn't take into consideration the savings from energy, cooling, and licensing, which can be substantial when going from a full rack of SAN disk to a quarter rack or less of flash arrays.

    Hi Joshua.

    Thank you for sharing that great info! This is good for business as well as the environment / world-at-large so sounds like a huge win-win :-D.

    Even with 3rd generation hardware, the cost for storage can be cut in half while gaining 10 times the IOPS...

The intention of this information is a little ambiguous, but the point seems to be that with falling prices AND increased IOPS, physical storage will not be nearly as much of a bottleneck, so the importance of choosing "appropriate" datatypes will diminish. And even if not specifically intended, I am sure that some people will get that message. With that in mind, I would say two things:

    • In this article I am recommending that we take a holistic approach to all (or at least most) of the touch points for our datatype choices. So even if the physical storage component of the system becomes less of an issue over time -- due to things such as the flash storage -- that doesn’t imply a similar decrease in cost for the other components of the system. Will backups go onto this flash storage? What about memory, CPU, and network usage? Working under the assumption that physical storage is the only piece of the puzzle will inadvertently lead to slowing down queries due to decreasing Page Life Expectancy. So now you are back to trying to solve performance problems via hardware and simply shifting the expense to now adding more RAM. And again the backup size will be larger and cost more to store the same number of backups and/or reduce the number of days you can afford to have backups for.
• This mindset lends itself to bad habits: being too comfortable with "plentiful" resources leads to acting as if those resources are infinite, and no resources are. This happens all around us, just like with water and trees. And one could argue the same is happening in China and India with cheap and plentiful labor (ignoring why it is cheap and plentiful). And did we really need a flight simulator in Excel 97? Basically, if you convince yourself that a particular resource is not a concern due to being so abundant, over time you will eventually consume it at a high enough rate as to render it just as finite as the other moderately available resources. By being mindful not to be wasteful (i.e. use what you need or expect to need with optimistic growth), as opposed to being highly conservative (using so little as to leave no room to grow, such as with the Y2K issue or maybe using SMALLINT for UserID instead of INT), a system can operate as if those resources are still somewhat infinite (i.e. be scalable). By being pragmatic, the falling cost of storage combined with increased IOPS just makes our applications that much faster.

    Take care,

    Solomon...

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR

  • Adam Seniuk

    SSCrazy

    Points: 2171

    Excellent article, disk is cheap until you start factoring in the other costs and requirements.


    Over 12yrs in IT and 10yrs happily stuck with SQL.
    - SQL 2008/R2/2012/2014/2016/2017
    - Oracle 8/9/10/11
    - MySQL 4/5 and MariaDB

  • Solomon Rutzky

    SSCoach

    Points: 15964

    aalcala (10/29/2013)


    Context is everything!

    I completely agree.

    I work on a number of web-based applications and for our applications we rarely see tables with more than 100K rows. On the other hand, we have more than 10,000 source files being maintained by a team of 5 developers. In this situation, developer time is way more expensive than disk space. In 99% of all queries or stored procedures, response time is never even close to being an issue.

    Ok. As you said, context is everything, and the context of my article is dealing with much larger systems, so what you are getting at here is really not at odds with what I was saying. I also did state in the article to "be sensible, not wasteful" and it sounds like you are being sensible for your particular needs.

    To invest time re-factoring the first solution to be more disk-efficient is a bad investment. To make design choices that trade off ease of developer (maintenance) understanding for disk efficiency (space or time) is a bad investment.

    I specifically stated in the article to consider the recommendations for new projects, though there are certainly situations, even if not for you, when refactoring is a good investment.

Regarding design choice trade-offs, once again it all depends on the particular situation. For small-scale systems you are generally correct, but for larger systems (which are the main concern of this article) it is a good investment to err on the side of overall system efficiency. And for small to mid-size systems there is usually a decent compromise.

I was discussing this very point with a developer the other day who designed a series of tables that have a variable-width text field to JOIN on. He was being told that he needed to convert that to TINYINT and was unsure of the benefit, since the time it takes him to track down data feed problems would increase if he had to remember the 50 or so values rather than just immediately seeing the data he needs and understanding it without having to query an additional lookup table. Given that he, too, was working on a system that would never reach 100k rows in any table, I recommended using a CHAR(3) or CHAR(4) field, maybe even CHAR(5), to hold a code that represents the full text in some meaningful way. That is an efficient solution that not only lets him keep the ease-of-use factor, but is also more maintainable: the spelling of the full text of those values could change, and if that ever happened, storing the full text directly would mean finding and updating every table that has that field.
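A minimal sketch of that kind of compromise (all table and column names here are hypothetical, not from the actual system being described): a short fixed-width mnemonic code keeps the main table readable at a glance, while the full text lives in exactly one place.

```sql
-- One place to store (and, if need be, respell) the full text
CREATE TABLE dbo.FeedStatus
(
    StatusCode CHAR(4) NOT NULL PRIMARY KEY,  -- e.g. 'PEND', 'LOAD', 'FAIL'
    StatusName VARCHAR(50) NOT NULL           -- the full descriptive text
);

-- The main table carries the readable 4-byte code rather than the full
-- text, so a developer scanning rows can still understand them without
-- joining to the lookup table, and the JOIN key stays small and fixed-width
CREATE TABLE dbo.FeedRecord
(
    FeedRecordID INT IDENTITY(1,1) PRIMARY KEY,
    StatusCode   CHAR(4) NOT NULL REFERENCES dbo.FeedStatus (StatusCode),
    ReceivedAt   DATETIME NOT NULL
);
```

If the wording of a status ever changes, only dbo.FeedStatus needs an update; the codes embedded in every data row remain valid.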

    Just FYI, I have submitted an update to this article that adds an extra paragraph in the Conclusion section to more explicitly clarify these two points given that a couple of other people had similar misinterpretations of what I was saying. Now I clearly state that I am not talking about refactoring unless the situation calls for it, and that smaller-scale systems might benefit from erring on the side of developer ease-of-use. I am not sure when that change will be made active, but hopefully soon.

    Take care,

    Solomon.

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky

    SSCoach

    Points: 15964

    twin.devil (10/30/2013)


excellent article i must say ... detailed enough for ppl who design databases with closed eyes 😎

    good work !!

    Adam Seniuk (10/30/2013)


    Excellent article, disk is cheap until you start factoring in the other costs and requirements.

To both of you: thank you for the kind words and you are quite welcome. I am glad that you found the information helpful.

    Take care,

    Solomon...

SQL#: https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift: https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap: https://SqlQuantumLeap.com/ ( blog )
Info sites: Collations     •     Module Signing     •     SQLCLR
