Performance issue between two "almost" identical SQL Servers - query speed

  • I have the situation where users are complaining more and more about poor performance of SQL Server, timeouts, etc.

    In order to try and discover what the issue was I took the table that was causing the issue (together with a few others) complete with data and indexes and made copies in a different database on both servers. This, I hoped, would eliminate differences in page and index fragmentation and also ensured that both servers had the same data to work with. Indexes are defragmented on a monthly basis to help keep them in check.

    The production server is a quad core 32-bit 2.5Ghz processor with 4Gb RAM running Windows 2003 server R2 SP2

    The second server is a quad core 32-bit 3Ghz processor with 4Gb RAM running Windows 2003 server R2 SP2

    The main two difference between the two setups is that the production server is on a virtual machine with drives on a high-speed SAN, the second machine is on a physical box using RAID drives, also the production server is running SQL Server 2008 standard, whilst the second server is running SQL Server 2008 developer.

    The tests I ran were simple enough, but highlighted an issue.

    I ran a series of the same selects and updates against both systems and monitored the CPU time the query took, the read and writes, the duration of the request and the delay between processing each batch.

    As the second server has no users, I simulated a medium load to match the average user throughput, I also ran the tests out of hours when only bare minimum user traffic was being processed (maybe a request every half hour or so).

    In each case the processor time was fractionally less on the second server (I expected this due to the faster processor), reads and writes were the same, however each batch was taking far longer to complete on the production server than on the second server, even with minimal user traffic). In fact one test I ran took around 20 minutes on the second server, yet took over 4 hours on the production server. The delay between the batches was also higher on the production server but nothing that would give me cause for concern.

    We have even tested all the applications that we use against both servers, and in these the applications fly at all times when connected to the second server.

    One thing I did notice was when I ran an SSIS package which reads and processes a number of pipes. On the production server I was getting a lot of AWAITING RESOURCE issues in the processes which I was not getting on the second server. the package was merely reading from one database, transforming the layout of the fields and writing it out to another database.

    I suspect that the databases are probably as optimized as they can be or else I would see similar issues on both boxes (the second server has copies of the production databases), and that the issue lies somewhere within the infrastructure - but not being a network guy and to be honest, having very little knowledge of how to monitor SQL Server fully, I have no idea where to start narrowing the problem down.

    Can anyone offer any advice?

    Thanks in advance.


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • i would say before moving to conclusion that database is optimized at its best level. we should try few things.like

    database settings.

    network traffic on both servers.

    data volume (same in your case )

    on query level wait stats see. http://www.sqlservercentral.com/articles/Administration/wait_types/856/

    execution plan , whether you are getting same on bother servers or not ?

    Some DMV might be helpful for thud case

    like select * from sys.dm_os_wait_stats

    Sys.dm_os_sys_info,Sys.dm_tran_database_transactions :

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Rayven (8/16/2010)


    The main two difference between the two setups is that the production server is on a virtual machine...... with drives on a high-speed SAN.

    That made be sit up.

    Have you done any IO performance monitoring? I would as, it's often the slowest link in the whole setup.

    Is the SAN being shared with other applications or is it dedicated for the sole use of the SQL Server? My guess is that, seeing it's a virtual machine there are likely to be other virtual (abd/or physical) machines connected to the same SAN.

    I've often seen raid 5 sata drives outperform a SAN because the SAN wasn't configured optimally or other applications were hammering it to the point of rendering it useless.

    Anyway, I'd start by looking at the SAN and the IO performance. Try and eliminate the network too by running statements directly on the boxes.

  • it's probably the SAN. i've seen supposedly high speed SAN's misconfigured where they perform a lot slower than regular server disks.

    in the end the hard drives in the SAN are the same ones as you buy from Dell/HP and you need to control which apps and servers hit which spindles.

    one time years ago we had a SAN that was misconfiguration and every last big of space was provisioned instead of buying new storage for it. this caused a huge performance hit. a year or so before we got rid of it we bought a batch of new storage for it and i hooked up a QA server to it for some testing. i ran alter index on a few large indexes and the new storage performed 3 times faster. same SAN similar server configuration, but the production volumes were misconfigured and performance suffered

  • Yeah SAN can be one of the reasons... but I'll go with what Bhuvnesh suggested..... i.e first look at the execution plan on both servers and make a note of the difference....

    Then you can start looking as why that diff is there......

    Rohit

  • It's going to be hard to say what the issue is based on the information supplied. First, I'd look at the production wait states and queues. Figure out what is causing stuff to run slow. I think contention on the production system is certainly a possibility.

    Next, I agree that looking at the execution plans is in order. You copied the tables & indexes and are running the same queries, but since you didn't do a straight backup of the database to the other server, you have different statistics on those tables and indexes. This can result in different execution plans. You may have all sorts of issues around statistics and plan compiles/recompiles, but when you're not comparing identical databases with identical settings, it's hard to figure out that's where the issues lie until you look at the execution plans.

    Finally, I'd want to see a more complete baseline, showing CPU, IO, etc., over a fairly long period of time to determine when and where the production system is running slowly.

    Based on the information provided, I would not assume that you don't have code or structure issues within the database.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Many thanks for the comments and postings so far.

    I've spent the day avalysing various things, and at least now I can eliminate the network traffic (thanks Karl - it never even crossed my mind).

    I'm slowly whittling down what it isn't, the execution plans are the same, wait stats seem okay (though I'm still trying to crash course myself on what they all mean).

    I'll forward the information about the SAN onto our network manager to see what he can find out (unfortunately, I just look after the database and what I can get to through SSMS, anything to do with the network, hardware, traffic, drives, etc., is out of my hands :crazy: )

    Grant, after what you said I'm in the process of transferring the database backup onto the other server and I'll run my tests again - thanks for this.

    Thanks again fort the advice so far.


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply