“One Gerbil, Two Gerbils or Three Gerbils?” is a common DBA joke about server and storage performance. No matter how many gerbils power your storage, you need to know what type they are and the power that they provide. Storage is not about gerbils it is about IOPs, bandwidth, latency, and tiers.
As a DBA it is important for you to understand and know what kind of storage is attached to your servers and how it is handling your data. It is not important to master everything about it, but it is very advantageous to be able to talk to your storage admins or “Gerbil CoLo, LLC” provider intelligently especially when you experience performance issues. Here is a list of things to I encourage you to know and ask.
IOPs– IOPS stands for I/O (single read/write request) Operations Per Second. This is a performance metric that is dependent on the type of storage being used and can vary widely. It is important to understand how fast your storage can process data by knowing the expected IOPs and the actual IOPs once the array is processing workloads.
Bandwidth or Throughput – This is the measure of the size of the data in the I/O request. You can figure out throughput by taking I/O request size multiplied by the IOPs the Measure will be in Megabytes and Gigabytes per second.
Latency– In my opinion this is the most important metric to understand. It’s the time it takes to process that I/O request. Its an indicator of a possible storage bottleneck. You measure this time from when the request is issued to when the request is completed. This determines the responsiveness of your storage.
Storage Tier & Automatic Storage Tiers– A modern day array can be divided into tiers some of those tiers can be slower spinning disks while others can be fast flash or a hybrid of both. I think of these in terms of gerbils. You can get a small gerbil who has little legs that can run a marathon, a medium one that runs at a moderate speed 5k and a large gerbil that’s a speed racer. These together can work separately (pinned) or merged into a team like in a relay. Your data can be pass like a baton through each tier (automatic). Another words your data can be demoted or promoted between tiers of the storage device when needed for performance and compacity.
Note these apply to the Guest OS, there are metrics for the Hypervisor/Storage Stack that DBA’s do not normally have access to. The important part is that the different parts of the stack should mainly be in agreement about those numbers. If latency at the array side > latency at the Guest OS level, there is a big issue somewhere
Avg. Disk sec/Read – Shows the average read latency.
Avg. Disk sec/Write – Shows the average write latency.
Avg. Disk sec/Transfer – Shows the combined averages for both read and writes.
Disk Transfers/sec – is the rate of read and write operations on the disk.
Disk Reads/sec – is the rate of read operations on the disk.
Disk Writes/sec – is the rate of write operations on the disk.
Avg. Disk Queue Length – is the average number of both read and write requests that were queued for the selected disk during the sample interval.
Current Disk Queue Length – is the number of requests outstanding on the disk at the time the performance data is collected.
Here is a great session by my friend Argenis Fernandez (T) to better educate yourself. I highly recommend you start here.
RAID (redundant array of independent disks) –RAID is a solution that protects your data from a disk failure. You tend to hear administrators talk in terms of RAID 0,1,5,6 and 10. As database administrators you need to know what RAID type your data is on. For Tempdb you want it on the fastest RAID as possible RAID 1 or 10 while maintaining disk fault tolerance. This is usually old SANs and no longer a concern with modern storage arrays. Modern arrays take a different approach with object-based storage models, which is more like the cloud.
FLASH –High speed storage based on non-volatile memory, you may see it referred to as NVMe, Non-volatile Memory Express. These are SSD, solid state drives. One thing to keep in mind is that NVMe’s are SSDs but not all SSDs are NVMe’s there are different types of SSDs. Not matter what type of SSD it is these are really great for Tempdb workloads.
Hyper-converged– This is referred to as HCI. Both the storage, networking and compute are bundled into one. This is a newest all in one hardware that claims to save money and creates ease of use. Keep in mind that these now means the HCI processing power is now handling everything (networking, storage, IOPS, etc).
Snapshots– A capture of the state of your data is taken at a point in time. These snapshots can be used as restores or backup copies. These are usually snapshot copies of your mdf and ldf files. Note: Uncommitted transactions are not captured, and snapshots are not necessarily a replacement for backups. If your sysadmin asks about doing snapshots in lieu of backups, it’s your job ask a lot of hard questions to backup or storage vendor who is doing the snapshots and test both the back and more importantly the recovery. You need to be asking about to point in time recovery and how to handle page level restores for corruption just to name a couple.
Clones– A volume copy of your data, think of this a disk drive copy. It takes the files a makes a replica from snapshots creating a database copy.
Disk Replication (sync and async) –The replication of logical disk volumes from one array to another in real time (synchronous) or asynchronous for disaster recovery and continuity.
If you can educate yourself on these it will go a long way to being able to make sure you can have intelligent conversations with your storage admins or providers. This will enable you to better ensure you can advocate for your SQL environment when you experience performance issues related to storage. If your data is hosted elsewhere, like Gerbil Colo, LLC or even in a public cloud like Azure, make sure they can provide the above metrics to you. If they can’t, it might be time to host your data elsewhere.