SAN related performance issue, though low disk queue length

  • Hello,

    At a customer we are experiencing severe performance issues. The performance issues seem to be storage related, since it matters a whole bunch on which drives we place the database. When the database is placed on SAN 1, certain query's take about 4 minutes to complete, while when the exact database is places on SAN 2 the same query's take less than a minute to complete.

    But what I just cannot comprehend is the low average disk queue length, while monitoring the performance on both SAN's. On the fast SAN the disk queue length is '1' for about a minute, while ons the slow SAN the disk queue length is '1' as well but for about 4 minutes.

    Now, a disk queue length of 1 is not very high. Especially on the SAN from which the query's run for about 4 minutes, it doesn't make sence to me that dis average (read) disk queue length does not exceed 1. Why on earth does SQL Server (2005) not through a whole bunch of those I/O requests in the queue? Am I missing something here?

    Thanks for thinking with me.

    Bram

  • Ignore disk queue length. It's a meaningless counter, especially with a SAN.

    What's your disk latencies? (avg sec/read, avg sec/write)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Disk latency (read) on the fast SAN is an average 7 ms, on the slower SAN average 32 ms.

    Now why are you saying that disk queue length is a meaningless counter? I would just like to be able to explain this to the customer, before having to tell them to invest xx thousand euro's in a new SAN. It it obvious that the one SAN outperforms the other, and thus has a big impact on application performance. But it seems like there is something else limiting the performance of SQL Server.

    Because what I don't get, is that when I perform an SQLIO test with 64 KB random read IO's the slower SAN manages 580 IO's/s, 36 MB/sec with an average latency of 26 ms. But when I perform a select query on a table with SQL Management Studio, this SAN manages only an average of 41 IO's/s, 2,5 MB/sec with an average latency of 32 ms.

    Now when I compare the results from SQLIO with SQL Management Studio, the latency doesn't differ that much. But the IO's/s and throughput differ a lot. I see that (with 2 threads and 8 IO/s per thread) SQLIO causes a disk queue length of 16, while SQL Management Studio causes only 1. That's why I think that SQL Server seems to limit the speed of the query even more than the SAN does.

    If the disk queue length is a meaningless counter, how then can I explain what causes the SAN to perform slow slow from withing SQL Server, compared to SQLIO? Why doesn't SQL Server 'work a little bit harder' and doesn't it at least request more IO/s per second?

    Thanks again.

  • The problem with disk queue length on a san is that you don't have any idea how many disks are involved, so the queue count isn't a number that allows for apples to apples comparisons, especially if you're comparing one SAN system to another.

    Disk Sec/read & Disk Sec/write are great measures for performance on the disk sub-system. Also check out sys.dm_io_virtual_file_stats and look for ii_stall_read_ms values. These show how long processes are waiting for reads. There's one for writes too. You have to combine this with the sample_ms which shows the time measured. You can use this down to a particular file to see where slowdowns are occuring. Also look to sys.dm_os_wait_stats and look for any of the wait types that start with pageiolatch.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • bram.krol (2/5/2012)


    Disk latency (read) on the fast SAN is an average 7 ms, on the slower SAN average 32 ms.

    Yup, definitely a disk latency problem. Where and why is a whole nother matter. Something's causing that SAN to respond slowly, I suspect the investigation needs to be done at the SAN level, find out what is causing one SAN to take 4x the time for an IO operation.

    Now why are you saying that disk queue length is a meaningless counter?

    It's not a personal opinion. Anyone involved in performance (especially storage performance) for the last several years has been saying that. Firstly because there's so much between the disks and the OS these days that the old '2/spindle is max' rule is totally meaningless. The other reason is that SQL uses async IOs. It doesn't sit waiting for the IO, it goes on and does other stuff (especially if that's a read-ahead IO), it's also quite capable of issuing thousands of IOs for data that it does not need right now but will need later and letting them process in the background.

    Why doesn't SQL Server 'work a little bit harder' and doesn't it at least request more IO/s per second?

    In short, because it doesn't think it needs to. From SQL's perspective, the fewer IO requests the better. It would rather issue 1 IO request for say 128 pages than to issue 8 separate IO requests for those 128 pages. One of the features of enterprise edition is larger IOs, up to (I believe) 1000 pages at a time.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hey Bram - do your customer a favor and hire a professional performance tuning consultant to find and fix the cause(s) of bad performance at this client. You can also benefit from mentoring from said person to enable you to more effectively find and fix the problems - and know WHY they are occurring instead of focusing on things that are red herrings. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply