Calculating average read/write latency across instance using sys.dm_os_performance_counters

  • Hello,

    I'm trying to figure out how you would calculate the average read/write latency experienced by a SQL Server instance during a specific time window in order to monitor this for multiple instances. From this MSDN blog, I know that you have to take multiple samples and do some calculations to get the correct latency.

    http://blogs.msdn.com/b/psssql/archive/2013/09/23/interpreting-the-counter-values-from-sys-dm-os-performance-counters.aspx

    However, the SQLServer:Resource Pool Stats object tracks these numbers per resource pool and we want to get one number for the whole server. Since there can be a different base value for each resource pool, you can't simply sum the numerator values together. Here's some sample data from a server that illustrates the problem.

    object_name counter_name instance_name cntr_value cntr_type

    SQLServer:Resource Pool Stats Avg Disk Read IO (ms) default 307318919 1073874176

    SQLServer:Resource Pool Stats Avg Disk Read IO (ms) Base default 25546724 1073939712

    SQLServer:Resource Pool Stats Avg Disk Read IO (ms) internal 2045730 1073874176

    SQLServer:Resource Pool Stats Avg Disk Read IO (ms) Base internal 208270 1073939712

    I'm thinking I would need to do some sort of weighted average, but I'm not sure if that will result in the correct value. Here's the formula I am thinking about using currently before doing the calculation over time

    ((default * default[base]) + (internal * internal[base]))/(default[base] + internal[base])

    Then to do the calculation over time, I'd use the changes in the calculated numerator and denominator to get the average.

    Does this sound like to correct way to get this value? Is there a good way to verify?

  • You are right that you would have to do a weighted average, and that combined with taking snapshots at some interval so that you get a more granular view for specific times should get you were you want to be.

    However, you'd have to weight the average by the number of IO operations, which I don't believe is available from the Resource Pool Stats object.

    I generally go a different route and pull from sys.dm_io_virtual_file_stats(NULL,NULL) at some interval (my default is every 5 minutes, but that can obviously be tweaked as necessary) and load that into a table.

    That lets me look at specific databases and files, in case some are on different storage than others.

    If you then wanted to just aggregate across all files to get an overall number for the instance, you could do that simply enough, while maintaining the ability to look at a more granular level as well..

    This gives the MS documentation about that function, and this explains a little more about its use in this sort of situation.

    Cheers!

  • I was hoping the 'Base' value that comes up was tracking the IOs or something equivalent so that this formula would work but there doesn't seem to be good enough documentation on what the value actually is to confidently use it.

    Looking at the columns returned by sys.dm_io_virtual_file_stats, it looks a lot easier. You'd have to snapshot the #reads/writes, and the total ms of waits.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply