SQL Server Disk Latency

  • Hi Folks,

    We have a curious situation that I can only partially explain.

    A SAN that hosts a large, highly transactional database reports a latency of around 6-8 ms at the SAN level, but when I measure latency on the SQL Server I get a result of between 35 and 50 ms.  Obviously this results in a very slow response time from the application, and that is what I need to improve.
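
    Just for reference, here is a minimal sketch of how I read the latency from the SQL Server side via sys.dm_io_virtual_file_stats (the figures are cumulative since the last service restart, so they are an average rather than a current snapshot):

    -- Per-file read/write latency as SQL Server sees it
    SELECT DB_NAME(vfs.database_id)                             AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
           vfs.num_of_writes,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_read_ms DESC;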

    The database comprises around 50 tables, all of which are heaps, and there are no relationships defined between these objects.  The only indexes defined are nonclustered, and there are a great many of them, most of which are not used.  New indexes need to be created to fulfil the requirements of the application, and redundant, duplicated and unused indexes will be removed in the near future.
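
    As a first step towards the index clean-up, here is a minimal sketch of how the unused nonclustered indexes can be identified from sys.dm_db_index_usage_stats (the counters reset on a restart, so they are indicative only):

    -- Nonclustered indexes in the current database with no seeks, scans or lookups recorded
    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name                   AS index_name,
           us.user_updates          AS writes_maintained
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id   = i.object_id
          AND us.index_id    = i.index_id
          AND us.database_id = DB_ID()
    WHERE i.type_desc = 'NONCLUSTERED'
      AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
      AND ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0
    ORDER BY us.user_updates DESC;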

    My understanding of disk latency as seen from SQL Server is that it is the time measured from the moment a request for data is issued to the delivery of that data.  Is that right?

    I also believe this latency can be dramatically reduced with a clustered index on the affected tables, because in my opinion the distance the read/write head must move is a contributory factor in this latency (as long as it is a sensible index!).
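
    What I have in mind is something along these lines, with a purely hypothetical table and key for illustration:

    -- Illustrative only: dbo.Orders and OrderID are made-up names
    -- Building a clustered index on a heap also rebuilds every nonclustered index on the table
    CREATE CLUSTERED INDEX CIX_Orders_OrderID
        ON dbo.Orders (OrderID)
        WITH (SORT_IN_TEMPDB = ON, ONLINE = ON);  -- ONLINE = ON requires Enterprise Edition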

    Would you consider my initial diagnosis to be accurate or is there something I could have missed?

    I look forward to your comments!

    Regards Kev

  • Below 50 ms for reads is generally considered good for SQL Server (I'm assuming you're referring to the sec/Read counter), so I'm not really seeing a problem there.

    "Obviously this results in a very slow response time from the application" is not the conclusion I'd jump to.

    Sort your indexing out first and see how things change.  Your application's slow response is much more likely to be due to poor query performance resulting from sub-optimal indexing, i.e. poor plans and having to read much more data than it needs to.
    I'd be surprised if the latency figure changes much even after those changes, as that's not really what latency measures, unless the indexing is so bad that you're actually saturating the SAN.
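
    If it helps, a quick way to see how much a given query actually reads is to wrap it like this and check the Messages tab:

    SET STATISTICS IO, TIME ON;
    -- run the suspect query here; the output shows logical/physical reads per table
    SET STATISTICS IO, TIME OFF;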

  • According to Microsoft, 20-50 ms is already considered bad.  Our read latency is 80 ms and our write latency is 28 ms, and it is closing this gap between what is seen on the storage and what SQL Server sees that I believe is the key to the performance issues.

    I do consider SAN saturation to be a problem, most likely because of far too many index operations.  That is why I believe removing unnecessary indexes, tuning queries and indexing sensibly is the way to reduce latency when retrieving data from disk.

    Additionally, this software is in use in other locations as well, but on a smaller scale.  It is clear that the database doesn't scale, and a knock-on effect of reducing this latency will be an application that scales better.

  • All our DBs are relatively small but we do have a couple of servers with highish latency.
    I used the following wait statistics script, from Paul Randal, to identify the problem:

    http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

    In my case the top waits are ASYNC_NETWORK_IO and PREEMPTIVE_OS_WAITFORSINGLEOBJECT, which seem to be caused by old RBAR-type COM services running on slow application server VMs.

    I am not really an expert but Paul's script, or something similar, might help you narrow down the problem.
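
    For a very rough first look (Paul's script above is far more thorough), something like this shows the top waits since the last restart:

    -- Top waits by total wait time, skipping a few benign system waits
    SELECT TOP (10)
           wait_type,
           waiting_tasks_count,
           wait_time_ms,
           wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'CHECKPOINT_QUEUE',
                            N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT',
                            N'BROKER_TASK_STOP', N'DIRTY_PAGE_POLL')
    ORDER BY wait_time_ms DESC;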

  • Ken McKelvey - Thursday, April 6, 2017 7:11 AM

    Happy days.....thanks a lot Ken.  I'll run that and post the results back here.

  • kevaburg - Thursday, April 6, 2017 4:19 AM

    A SAN that hosts a large, highly transactional database reports a latency of around 6-8 ms at the SAN level, but when I measure latency on the SQL Server I get a result of between 35 and 50 ms.  Obviously this results in a very slow response time from the application, and that is what I need to improve.

    The response time at the array is going to be vastly different to the transit time for the I/O request from SQL Server.
    When SQL Server issues an I/O it takes a series of paths.  SQLOS first has to send the request out to Windows; from there it is passed to the appropriate storage controller card and follows its path across the storage network before reaching the storage processor.  The I/O is processed and flows back, and once that is confirmed to SQLOS the I/O is complete.
    Now, on a badly configured SAN this can take some time.  Saturation can be a big factor, so monitor the storage network to see what response times are being experienced on the switches.
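
    If you want to see whether individual I/Os are sitting in the OS/storage stack at a given moment, a rough sketch along these lines (joining the pending-I/O DMV back to the file stats) can help:

    -- Currently outstanding I/O requests; io_pending = 1 means Windows/the storage stack
    -- has not yet completed the request
    SELECT ipir.io_type,
           ipir.io_pending,
           ipir.io_pending_ms_ticks,
           mf.physical_name
    FROM sys.dm_io_pending_io_requests AS ipir
    JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      ON vfs.file_handle = ipir.io_handle
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id;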

    kevaburg - Thursday, April 6, 2017 4:19 AM


    My understanding of disk latency as seen from SQL Server is that it is the time measured from the moment a request for data is issued to the delivery of that data.  Is that right?

    See above

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

    I think a quick set of definitions is in order.  Folks tend to throw around terms like latency, and its meaning depends on which element of your performance you're referring to.  Disk latency is a measure of the average time between the I/O request arriving at the disk drive and the moment the data starts to pass under the read/write head.  It's largely a reflection of the rotational speed of the disk drive.

    However, disk drives and computers are now connected by a SAN, and that complicates matters considerably.  It adds several layers of complexity to an I/O stream that is already the slowest part of a computer system and, by virtue of the SAN connection, becomes slower still.  Once the computer's I/O controller issues a read request, that has to travel to the SAN fabric network card, so there's latency associated with the arrival of that request at the network card and then the arrival of the I/O request at the SAN's disk controller.  Thus you have SAN fabric latency as well as actual disk latency and I/O controller latency on both sides to worry about.

    A typical SATA or SAS disk drive has a hard interface limit of 6 Gb/sec, the I/O controller in a typical computer is probably going to max out at around 70 to 80 percent of that figure, and a typical SAN fabric is just one gigaBIT per second, so the I/O for JUST ONE PHYSICAL SPINDLE is enough to drive the SAN fabric into saturation across multiple paths.  If you want to reduce the overall I/O latency, you have to seriously up the speed of the SAN fabric to 10 Gb/sec on all of its paths.  Getting 30 to 50 ms I/O response times over a SAN isn't very good, but it's not likely to improve without a SAN fabric speed up-tick.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Thursday, April 6, 2017 9:37 AM

    Hi Steve,

    Thanks for that.  You are right, of course: the word latency is easy to throw around.  In my case I use it simply to describe the amount of waiting between the start and finish of one or more consecutive operations.

    Your definition has given me another point to investigate and I will have a chat with the SAN team to see what they can add.  The more I read the comments here, though, the more I am convinced we need to rework a lot of the database model, from normalisation through to indexing.

    It will be a long journey to get there I think..... 🙂

  • sgmunson - Thursday, April 6, 2017 9:37 AM

    The SAN fabric is just a small part; close attention should also be paid to the HBA configuration, the storage processor, the cache, etc.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks for all your input.  It would seem I forgot more than a couple of points....
