SQL 2008 R2 performance monitoring and troubleshooting

  • Hi folks,

    I'm a Network Admin that has been assigned the task of evaluating our SQL server for performance issues...and when I say assigned I mean my manager came to me and said "here its your problem now...fix it".

    Background:

    We are a Microsoft shop, the system in question runs on windows 2008 R2, 4 CPUs, 16GB of memory and its visualized with Microsoft HyperV 3.0 (Windows 2012), SQL 2008 R2 using windows authentication.

    The host server this virtual server sits on is a Quad CPU 64 core (total), 256GB of RAM with 14 NIC ports (2 10GBit), the virtual disk sit on a DELL Equalogic iSCSI based storage system.

    The problem:

    The primary program that utilizes the SQL server in question is Financial Edge, created by Blackbaud. This program is accessed through Citrix (Terminal Services).

    Users have been complaining extensively about slow query processing times, and even claiming to have queries take 6-8 hours (cant verify this as they never call IT when it happens). Our department has TRIED the obvious things have them recreate the query, which does work in some cases. unfortunately management is now saying there end users have no time to troubleshoot....:crazy: ....and they think the problem is with our systems and not a end user issue...typical

    What to do:

    Unfortunately I know very little on SQL in terms of performance tweaking, as I mainly focus on Visualization. So I did a little digging around and found PAL, it has performance counters i can use with SQL 2k8 R2. In my first phase of testing I ran it 4 times a day at 30 intervals for 1 hour. Doing this i was able to narrow down peak times when the SQL server gets hit the most as well as possible DISK I/O issues, unfortunately this is the most Ive been able to "decode" from the performance report.

    Among the alerts where:

    ◦SQLServer:Access Methods FreeSpace Scans/sec (Alerts: 2)

    ◦SQLServer:Access Methods Page Splits/sec (Alerts: 1)

    ◦SQLServer:Access Methods Index Searches/sec (Alerts: 2)

    ◦SQLServer:Access Methods Worktables From Cache Ratio (Alerts: 30)

    ◦SQLServer:Buffer Manager Page life expectancy (Alerts: 30)

    ◦SQLServer:Buffer Manager Page lookups/sec (Alerts: 7)

    ◦SQLServer:Buffer Node Page life expectancy (Alerts: 30)

    ◦SQLServer:Databases Log Flush Wait Time (Alerts: 91)

    ◦SQLServer:Databases Log Flush Waits/sec (Alerts: 2)

    ◦SQLServer:Databases Log Growths (Alerts: 60)

    ◦SQLServer:Deprecated Features Usage (Alerts: 960)

    ◦SQLServer:Latches Latch Waits/sec (Alerts: 30)

    ◦SQLServer:Locks Lock Requests/sec (Alerts: 5)

    Not sure if this alone is useful or i need to attach the entire report (which I will in a few)

  • http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-1/

    http://www.simple-talk.com/sql/performance/finding-the-causes-of-poor-performance-in-sql-server,-part-2/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Have a look at the wait statistics on the server

    SELECT TOP 15

    wait_type ,

    wait_time_ms ,

    signal_wait_time_ms ,

    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,

    100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )

    AS percent_total_waits ,

    100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )

    AS percent_total_signal_waits ,

    100.0 * ( wait_time_ms - signal_wait_time_ms )

    / SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits

    FROM sys.dm_os_wait_stats

    WHERE wait_time_ms > 0 -- remove zero wait_time

    AND wait_type NOT IN -- filter out additional irrelevant waits

    ( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',

    'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',

    'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',

    'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',

    'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',

    'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',

    'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',

    'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',

    'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',

    'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',

    'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',

    'RESOURCE_QUEUE' )

    ORDER BY wait_time_ms DESC

  • SQLSACT (5/21/2013)


    Have a look at the wait statistics on the server

    Thanks I will try that now.

    I've read through the links, thanks. It seems in the first step its pulling the same statistics the PAL program i used disk I/O, CPU, memory ect, the problem is this seems geared to isolating and indexing specific queries, in my case there could be hundreds of user created queries that in them selves could be a problem. Ill go through the info a few more times to see if there is something i can apply to my scenario.

  • And after working through Gail's excellent articles, I'd suggest getting a copy of my book. The 2012 one will be mostly applicable to 2008R2.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • jeremy.jamieson (5/21/2013)


    It seems in the first step its pulling the same statistics the PAL program i used disk I/O, CPU, memory ect,

    No, it's not.

    PAL pulls the disk, CPU, memory statistics. That article pulls the queries that are running against SQL Server and figures out which of the queries used the most CPU, the most reads.

    Tuning a SQL server involves identifying the problematic queries and tuning them, there's no magic buttons or settings, there's no quick fix, you need to identify the slow queries (or the ones using lots of CPU or doing lots of reads depending on your problem) and you need to fix them to run faster/use less CPU, etc.

    It's something that takes practice and experience, maybe suggest to your manager you get in a good SQL consultant who can teach you this stuff and fix the worst of the problems.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/21/2013)


    jeremy.jamieson (5/21/2013)


    It seems in the first step its pulling the same statistics the PAL program i used disk I/O, CPU, memory ect,

    No, it's not.

    PAL pulls the disk, CPU, memory statistics. That article pulls the queries that are running against SQL Server and figures out which of the queries used the most CPU, the most reads.

    Tuning a SQL server involves identifying the problematic queries and tuning them, there's no magic buttons or settings, there's no quick fix, you need to identify the slow queries (or the ones using lots of CPU or doing lots of reads depending on your problem) and you need to fix them to run faster/use less CPU, etc.

    It's something that takes practice and experience, maybe suggest to your manager you get in a good SQL consultant who can teach you this stuff and fix the worst of the problems.

    ahh k I got you .... funny enough the first thing I suggested when I got this was a SQL consultant as I really don't like "toying" around stuff I'm not very familiar with.

    The big problem with isolating the trouble query is that there are so many and we have found that for some reason older queries which ran fine in the past have issues now. if the end user takes some data out of the query it runs in a timely manner. Obviously this is an issue with the query, and the end user needs to refine or recreate it; unfortunately this is not an acceptable solution to my manager (CFO), so you can understand how frustrating this is. I am trying to be proactive and rule out a back end issue.

    We have already talked about moving the SQL logs and database directly onto the iSCSI storage rather than using cluster storage, and even moving the same files onto SSD. Im going to push for the consultant, but in the mean time I will go through and try some of the suggestions in those two links and see what i can come up with.

  • What is among your top waits?

  • SQLSACT (5/21/2013)


    What is among your top waits?

    This is the result of the query, keeping in mine this is not the peak hours of use with our systems

  • jeremy.jamieson (5/21/2013)


    The big problem with isolating the trouble query is that there are so many and we have found that for some reason older queries which ran fine in the past have issues now.

    Increasing data volumes and resultant increasing resource usage by the queries. Hardly unusual or unexpected.

    If you read through that article, it's not about identifying the single problematic query. It's about identifying the queries (plural) that are currently using the most resources and tuning them to use less, then repeating with the next highest resource using query and repeat until performance is acceptable. This isn't something you do once and then everything's fine. It's something you do regularly.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • jeremy.jamieson (5/21/2013)


    SQLSACT (5/21/2013)


    What is among your top waits?

    This is the result of the query, keeping in mine this is not the peak hours of use with our systems

    The waits stats there aren't all that useful as they're cumulative since SQL started. As as result it's very hard to see from that what the problem currently is.

    Can you run that query twice during the peak time, maybe an hour apart and take the difference of the wait times and waits and post that. I'll give a more focused view of what's actually happening.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Will do, that's coming up in about 2 hrs, Ill let you know, thanks for the help. I'm going through the articles again and it actually is making sense to me, thanks again

  • So heres what i have, this was taken at 3:50pm

    4:30pm (only chance i got to get back on the server)

  • Those results are not conclusive enough to point you in a decent direction as to where your performance problems are.

    You could look at your worst performing queries that are still in cache, might be some tuning opportunities there.

    SELECT top 20

    SUBSTRING(t.text, ( s.statement_start_offset / 2 ) + 1,

    ( ( CASE statement_end_offset

    WHEN -1 THEN DATALENGTH(t.text)

    ELSE s.statement_end_offset

    END - s.statement_start_offset ) / 2 ) + 1)

    AS statement_text,

    text,

    objtype,

    cacheobjtype,

    usecounts,

    last_execution_time,

    total_worker_time,

    total_worker_time / execution_count AS [Avg CPU Time],

    execution_count ,

    qp.query_plan

    FROM sys.dm_exec_query_stats AS s

    inner join sys.dm_exec_cached_plans cp

    on s.plan_handle = cp.plan_handle

    CROSS APPLY sys.dm_exec_sql_text(s.sql_handle) AS t

    CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) qp

    order by total_worker_time desc

  • While the plan cache is a decent rough idea, it is not guaranteed to have all the queries that executed, nor accurate execution characteristics for them, as the stats are cleared when the plan is removed from cache. As such, I still usually recommend SQL Trace or extended events to get a complete picture of what was running on the server over a period of time.

    As for the waits, can you copy the results to a text file and attach the text file. I can't query a picture...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply