SAN IOPS up 3x after upgrade to 2008R2

  • We've noticed reduced performance on our ERP in recent weeks. Infrastructure have just shown me a graph of SAN. read/write/total iop activity over the past 12 months on the SAN. There is a slow, steady increase (to be expected), and then at the end of April the total IOPS jump from 400 to 1200.

    For the rest of may (ie, to current) the IOPS graph is extremely spikey, oscillating between 500 and1200 IOPS, in an approximately 50% share.

    We upgraded from SQL 2005 to SQL 2008R2 in the last week of April.

    I'm wondering whether there is some huge switch I've missed when doing the upgrade. Something that changes behaviour in 2008R2. Some big-*** switch or set of switches that affect overall performance characteristics (ie, we're not talking about specific query tuning or anything like that here, this is a 3x increase in average SAN activity).

  • Sounds silly but you have checked you have the same indexes on the new SQL system....

    Dean.

  • The server is (almost but not quite exclusively) for the hosting of an ERP system (third party). The upgrade was an in-place upgrade of the cluster. So yep, all objects on disk and all objects hosted by SQL Server are identical (with the possible exception of the changed full text index structure but as it happens we do not have full text indexes).

    I have considered the new "optimize for ad hoc workloads" option as one possibility.

    I have also noticed that the nightly maintenance plan was taking far longer to complete (up into business hours). This plan was not "smart" with the index optimization step, it just reorganized essentially everything on the system every night. So I would be interested if I heard that the reorg process was different in 2008R2 compared to 2005.

  • Did you do any maintenance on the indexes or statistics after the upgrade?

  • Yes, and maintenance plans run every night (reorg and update stats)

  • 1) not sure if it is required when migrating from 2005 to 2008+, but for earlier upgrades one MUST update ALL statistics with FULL_SCAN. maybe you can try that.

    2) Reverify configuration settings on all hardware and software stack. Ensure all firmware/drivers patched up?

    3) Have you shrunk the database by chance?

    4) do a fileIO stall and waitstats analysis to see where the slowness lies.

    5) check for massive-read plans and tune

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ooh, I like number 4 (but I doubt I can get away with it 😉 )

  • allmhuran (6/2/2011)


    Ooh, I like number 4 (but I doubt I can get away with it 😉 )

    Does that mean you are afraid it will harm performance of server (it won't) or your company won't hire a consultant to help you out with it? :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I'm sure you have this but - a blanket value for iops on a san is not helpful - you should break that down by lun/database and read writes. if you know the value has increased you should know where? the values you quote are very low anyway, my t-log drive does over 400 iops.

    Anyway you should work out if this is reads or writes - technically you should have as few reads as possible as these would indicate physical reads whereas you really want reads to be from cache.

    I have monitoring on each lun, it records reads, writes, write latency & read latency at 5 sec granularity, I store 6 months data and have summarised data ( 20 sec granularity ) for ever ( well since start ) this means I can track anything like this without too much issue. To be honest unless you have this type of monitoring you'll be blind and subject to every possible performance tuning suggestion in the world!

    So points: - did you do any tests before the upgrade?

    did you set a series of benchmarks to compare?

    do you have historical performance data?

    I can't say I've noticed any issues 2005 to 2008 / R2. As always the optimiser will be different so you should evaluate queries and indexes for chnages.

    Have you checked ( here's the start of the list !! )

    parallelism vs waits

    page life

    memory settings

    any non out of the box settings

    finally you say iops are up but how about performance? have you checked for suggested missing indexes and so on and so on.

    I think you'll need to be a little more specific on your problems.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • TheSQLGuru (6/3/2011)


    allmhuran (6/2/2011)


    Ooh, I like number 4 (but I doubt I can get away with it 😉 )

    Does that mean you are afraid it will harm performance of server (it won't) or your company won't hire a consultant to help you out with it? :hehe:

    Sorry, when I first read that I misread it as a suggestion to actually create an artificial IO interrupt and see what happens to various processes. 😀

  • colin.Leversuch-Roberts (6/3/2011)


    I'm sure you have this but - a blanket value for iops on a san is not helpful - you should break that down by lun/database and read writes[/qote]

    Yeah, this was the value for the data drive used by the database in question. Sorry if that wasn't implied 🙂 . if you know the value has increased you should know where? the values you quote are very low anyway, my t-log drive does over 400 iops.

    Anyway you should work out if this is reads or writes - technically you should have as few reads as possible as these would indicate physical reads whereas you really want reads to be from cache.

    Read is very high, this is why I was wondering about differences in the ad hoc query or buffer engine components.

    In any case, I've found the cause by turning on performance data collection, but not the underlying cause (although I can make guesses). There was a particularly poorly written, long executing query that executed via an automation process. What's odd is that prior to the 2k8R2 upgrade it didn't result in such heavy disk read. I can only guess that the upgrade has done something to the buffers/plan cache/statistics such that this ticking time bomb of a query has finally gone off. I sent the vendor a rewrite which takes logical reads down to about 1/4 of what they were according to statistics IO, and physical down slightly more (at the time of execution). We will see what happens to the SAN stats over the next few days.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply