If you are deploying SQL Server in Azure, or any Cloud platform for that matter, instead of just provisioning storage like you did for your on-premises deployments for many years, you may consider that storage in the Azure isn’t exactly like the storage you may have had access to on-premises. Some traditional “best practices” may wind up costing you additional money and give you less than optimal performance, all while not providing you any of the intended benefits. Much of what I am about to discuss is also described in Performance Guidelines for Azure in SQL Server Virtual Machines.
I’m not here to tell you that you must use UltraSSD, Premium Storage, or any other disk type. You just need to be aware that you have options, and what each disk type brings to the table. Of course, like anything else in the cloud, the more money you spend, the more power, speed, throughput, etc., you will achieve. The trick is finding the optimal configuration so that you spend just enough to achieve the desired results.
Size DOES Matters
Like many things in the cloud, certain specs are tied together. For servers if you want more RAM you often get more CPU, even if you didn’t NEED more CPU. For storage, IOPS, throughput and size are all tied together. If you want more IOPS, you need a bigger disk. If you need more space, you also get more IOPS. Of course you can jump between storage classes to circumvent that to some extent, but it still holds true that if you need more IOPS, you also get more space on any of the different storage types.
The size of your virtual machine instance also matters. Regardless of what storage configuration you eventually go with, the overall throughput will be capped at whatever the instance size allows. So once again, you may need to pay for more RAM and CPU than you need, just to achieve your desired storage performance. Make sure you understand what your instance size can support in terms of max IOPS and MBps throughput. Many times the instance size will turn out to be the bottleneck in a perceived storage performance problem in Azure.
Use RAID 0
RAID 0 is traditionally the 3rd rail of storage configuration options. Although it provides the best combination of performance and storage utilization of any RAID option, it does so at the risk of a catastrophic failure. If just a single disk in a RAID 0 stripe set should fail, the entire stripe set fails. So traditionally RAID 0 is only used in scenarios where data loss is acceptable and high performance is desirable.
However, in Azure software RAID 0 is desirable and even recommended in many situations. How can we get away with RAID 0 in Azure? The answer is easy. Each disk you present to an Azure virtual machine instance already has triple redundancy on the backend, meaning you would need to have multiple failures before you would lose your stripe set. By using RAID 0, you can combine multiple disks and the overall performance of the combined stripe set will increase by 100% for each additional disk you add to the stripe set.
So for example, if you had a requirement of 10,000 IOPS, you might think that you need UltraSSD since Premium Storage maxes out at 7,500 IOPS with a P50. However, if you put two P50s in a RAID 0, you now have the potential to achieve up to 15,000 IOPS, assuming you are running a Standard_F16s_v2 or similarly large instance size that supports that many IOPS.
In Windows 2012 and later, RAID 0 is achieved by creating a Simple Storage Space. In Windows Server 2008 R2 you can use Dynamic Disks to create a RAID 0 Striped Volume. Just a word of caution, if you are going to use a local Storage Space and also configure Availability Groups or a SANless Failover Cluster Instance with DataKeeper, it is best to configure your storage BEFORE you create a cluster.
Just a reminder, you only have about two more months to move your SQL Server 2008 R2 instances to Azure. Check out my post on how to deploy a SQL Server 2008 R2 FCI on Azure to ensure high availability.
Don’t bother separating log and data files
Traditionally log and data files would reside on different physical disks. Log files tend to have a lot of write activity and data files tend to have more read activity, so sometimes storage would be optimized based on those characteristics. It was also desirable to keep log and data files on different disks for recovery purposes. If you should lose one or the other, with a proper backup strategy in place you could recover your database with no data loss.
With cloud based storage, the likelihood of losing just a single volume is EXTREMELY low. If by chance you lose storage, it is likely your entire storage cluster, along with the triple redundancy, went to lunch. So while it may feel right to put logs in E: logs and data in F:data, you really are doing yourself a disservice. For example, if you provision a P20 for logs and a P20 for data, each volume will be 512 GiB in size and capped at 2,300 IOPS. And just think, you may not need all that size for log files, but it might not give you much room to grow for your data files, which will eventually require moving to a more expensive P30 just for the extra space.
Wouldn’t it be much nicer to simply stripe those two volumes together into a nice large 1 TB volume that supports 4,600 IOPS? By doing that both the log and data files can take advantage of the increased IOPS and you have also just optimized your storage utilization and decreased your cloud storage cost by putting off the move to a P30 disk for your data file.
The same holds true files and filegroups. Really think hard about what you are doing and whether it still makes sense once you move to the cloud. What makes sense might be counter intuitive to what you have done in the past. When in doubt, follow the KISS rule, Keep It Simple Stupid! The beauty of the cloud is you can always add more storage, increase instance size, or do whatever it takes to optimize performance vs. cost.
What to do about TempDB
Use the local SSD, aka, the D: drive. The D drive is going to be the best location for your tempdb. Because it is a local drive the data is considered “temporary”, meaning it can be lost if a server is moved, rebooted, etc. That’s okay, tempdb is recreated each time SQL starts anyway. The local SSD is going to be fast and have low latency, but because it is local the reads and writes to it do not contribute to the overall storage IOPS limit of the instance size, so effectively it is FREE IOPS, so why not take advantage? If you are building a SANless SQL Server FCI with SIOS DataKeeper, be sure to create a non-mirrored volume resource of the D drive so you don’t needlessly replicate TempDB.
Mount Points Become Obsolete
Mount Points are commonly used in SQL Server FCI configurations when multiple instances of SQL Server are installed on the same Windows Cluster. This reduces the overall cost of SQL Server licenses and can help save cost by driving higher server utilization. As we discussed in the past, typically there might be five or more drives associated with each SQL Server instance. If each of those drives had to consume a drive letter you would run out of letters in just about three to four instances. So instead of giving each drive a letter, mount points were used so that each instance could just be serviced by a single drive letter, the root drive. The root drive has mount points that map to separate physical disks that don’t have drive letters.
However, as we discussed above, the concept of using a bunch of individual disks really doesn’t make a lot of sense in the cloud, hence mount points become obsolete in the cloud. Instead, create a RAID 0 stripe we as described and each clustered instance SQL Server will simply have its own individual volume that is optimised for space, performance and cost. This solves the problem of running out of drive letters and gives you much better storage utilization and performance while also reducing the cost of your cloud storage.
This post is meant as a jumping off point, not a definitive guide. The main point of the post is to get you thinking differently about cloud and storage as it pertains to running SQL Server. Don’t simply take what you did on-premise and recreate it in the cloud, that will almost always result in less than optimal performance and a much larger storage bill than necessary.