This week at SQL University we are talking about capacity planning. I am the guest lecturer for this topic, and covered a broad overview of what capacity planning means for DBAs to start the week. In this installment, we will look specifically at disk requirements and how to plan for them.
The traditional methods of sizing a database has been to compute the row size for each row, multiply by the number of rows, and then add some growth factor for the next xx months along with a bit of pad. This is an easy formula for each table.
TS = P ( G ( RS * Re) )
If you sum this up for all tables, you can easily determine the size you need for your database.
However there is one factor in there that tends to throw the calculations off. The estimates you make, both in growth, and in the pad, can dramatically throw off your calculations.
There is also the fact that not all tables grow at the same rate. The growth rate for your Orders table probably is larger than that of your Customers table. You hope that each Customer places more than one order, so while each should grow, the Orders table should grow at a faster rate. Determining the growth rate for each table can be a cumbersome task.
I used to have a system in place that counted the rows in every table in my database every night. We stored this number and calculated a monthly average and growth rate, and this allowed me to plan for disk usage, as well as understand the different growth rates of tables. Popular software packages like SAP or Dynamics have a fairly well known rate of growth for each table in their system and that can help you to determine how quickly each table will grow.
This formula is only as good as your estimates, and while it is a very accurate way to determine disk capacity needs in the future, it is also overly complex in my opinion.
One of the guiding principles that I have tried to use in my career is to keep things as simple as you can. Not that everything is simple, but I try to keep it as simple as it can be. Don’t add unnecessary complexity if you don’t need it.
Over time I observed a few things in the databases I managed. One was that relatively few tables in a database contribute heavily to the disk usage of that system. There are often a few tables like order tables, fact tables (in warehouses) auditing tables, that use up most of the disk space. Lookup tables tend to use trivial amounts of space (relatively) and other tables tend to get dwarfed in size relative to your few large tables.
I also noticed another correlation in my database disk usage. The larger the backup size, the more disk space I needed for my database. Since a backup contains mostly data (and some log records), it represents a fairly good estimate of the size of you data on disk. Since you also need to plan for backup space on your server, I found this formula to be very accurate for me.
TS = d ( BS) + BS + p (BS)
Here I calculate my total data size on disk as being the number of full backups I keep multiplied by the backup size. I then add in my database by itself (the single BS in the middle) and finally a pad percentage of this size.
What is even better is that each SQL Server already tracks backup size in msdb.dbo.backupset in the backup_size column. If you have been running backups on your server, you can calculate the average and the growth of this number over time. That will easily allow you to plan for future space needs.
It’s simple, and an easy way to track disk requirements over time. I would recommend storing this data elsewhere as the msdb table periodically get cleaned out, and in the event of a DR situation, you still need this data to be restored on your new server.
I would also recommend tracking this on a daily basis, and if it grows an extraordinary amount over one day, you can investigate to see if something has fundamentally changed in the way your database is used, or if this was an unusual event. My rule of thumb was if a backup grew more than 15% of what I expected, I should investigate.
This has alerted me to new development changes, the addition of new customer or partner data being loaded regularly, along with various other changes in my systems. This has allowed me to better manage growth and plan for new disk needs in advance of actually needing them.
This is 15% in unexpected growth, not 15% growth. If I expect 2% growth in my backup file each data, then if I received 17% (15% unplanned + 2% planned growth) in one day, I would investigate to determine what had happened on that database.
There are a fwe issues with this method that you need to be aware of, and perhaps plan for separately. The first is dealing with partitions/filegroups, the second multiple disks, and the third is log space.
I have typically worked on databases that had one filegroup. In this situation. That is a very simple server, but as data sizes grow, and more people take advantage of partition features in SQL Server, you might easily end up with very different sized partitions or filegoups on different disks. If your data tends to distribute fairly evenly, you can still use disk backup sizes to guesstimate future needs. However if you have tables growing at different rates, you will need to track the size of each partition or filegroup separately.
There is also the issue of planning across disks. A best practice dictates that backups are stored separate from the data, on separate physical disks. This means that while the calculations above are relatively simple, you need to be sure that each disk you use has enough space for its needs. Don’t assume that backups can use the space on you data drive or vice versa. As you track space used over time, it needs to be broken out by logical disk drive.
The last issue, which varies from server to server, is the capacity needed for your logs. The amount of log space needed each day by a database varies with the actual load on that server. If you have very busy servers, you will generate more log records, and larger log backup sizes, than if you have less load. Unlike a full backup, whose size is similar on days when there is little activity (such as weekends for many businesses), log backup sizes can vary tremendously during low activity periods.
My approach with logs has been to guess at log size, or leave it at the current size, schedule log backups, and then calculate the total log backup size each day. This tells me roughly how much disk space I need for log backups each day, and I can multiply this by the number of days I am keeping to determine log backup space needed.
For the actual log size, there are numerous articles, but you essentially need space for your largest log backup plus some padding. This might be the space needed for a single log backup at 5:00pm that is twice the size of other log backups. You cannot take the total day’s size and divide it by the number of log backups. You need to account for the largest log backup, based on whatever interval you choose, and add some pad.
Note that your needs can change dramatically if you change the frequency of your log backups.
I have mentioned padding numerous times in this lecture, however I have not given concrete guidance. That is because the amount of pad will vary, depending on your tolerance for issues, and the time you want to devote to this topic.
If you only want to manage space once a year, you need a larger pad for your estimates than if you are willing to manage weekly. I prefer to proactively grow space on servers once a quarter. So I estimate (based on backup size) the usage for the next quarter, and then add in a 10% pad to that amount. If this is a critical server, or one with little history, I would increase that pad to 15% or 20% to ensure I do not have issues.
Note that while I don’t want to grow space more than once a quarter, I have processes in place to track space daily and I investigate any unusual growths immediately.
Planning for the disk capacity needed can be a complex and confusing task. It involves a lot of variables, and ultimately is a bit of a guess on the part of the administrator. As a result, I try to keep it as simple as I can and track at the highest (most gross) level possible, which has been the data size in full backups.
However you feel to track space, you should be doing it in some way and reviewing it periodically. I tend to manage space once a quarter, for all servers, proactively growing space as needed for each database. This requires that I have some idea of the past use so I can plan for the next quarter’s use, as well as be ready to order/allocate more as needed.
If you have comments, or would like to add your blog to this lecture, please leave a note and link below. I will add interesting blogs to this lecture as a separate section. Next time we will conclude this week with a look at other issues to consider when capacity planning.