CPU's @ 100%

  • Looking for some advice. Every now and again our SQL box flatlines at 100%. The unfortunate thing is that i don't have much time to trouble shoot it due to the fact that users are either being timed out or cannot connect to the app at all. If i fail the servers over it clears it straight away but i cannot troubleshoot it when its fixed.....

    I've tried all techniques that i've found on google and various blogs / forums but cannot find one single (or possibly multiple) processes causing the issue. I've tried killing all spids in each of the databases (dangerous i know) to see if i can narrow down the problem but even this does not reduce the CPU usage.

    Using profiler is difficult to interpret due to the fact that the issue is not an instant hit, it seems to build up over a short period and everything starts taking longer and longer and increases locking / blocking until it pretty much grinds to a halt. I've checked sysprocesses by cpu and physical_io but nothing is standing out as an issue (even after killing relevant database spids). There are tons of opentrans due to the fact that everything is taking much longer to run.

    Could it be a memory leak (or some form of memory issue...) or an internal process which i cannot kill? When we get this issue we get an error with sql backup (log backups every 30mins) - "SQL Backup job failed with exitcode: 300 SQL error code: 0 [SQLSTATE 42000] (Error 50000)" which from researching this leads to an issue with contiguous free memory (http://apps2.red-gate.com/messageboard/viewtopic.php?t=4150&highlight=&sid=b94d9ff93b1a03844bef10c27f413f6f).

    We do occaisionally have high paging but that lasts for a few seconds and the most.

    Our Dell monitoring software is not showing up any hardware faults.

    Rapidly running out of ideas. I was looking to create a script to capture cpu usage over a short period of time to see which processes are utilising cpu when this situation occurs.

    any help / info much appreciated.

    SQL Spec:

    4 x Intel Pentium 4 Xeon MP - 3GHz

    16GB RAM

    Clustered (Active / Passive) on a SAN

    SQL Server 2005 (Enterprise)

    Windows Server 2003 (Enterprise)

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Do you have any open transaction?

    Try with DBCC OPENTRAN.

  • Also this query will show you CPU pressure:

    SELECT TOP 50 (a.total_worker_time/a.execution_count) as [Avg_CPU_Time],

    Convert(Varchar,Last_Execution_Time) as 'Last_execution_Time',

    Total_Physical_Reads,

    SUBSTRING(b.text,a.statement_start_offset/2,

    (case when a.statement_end_offset = -1 then len(convert(nvarchar(max), b.text)) * 2

    else

    a.statement_end_offset end - a.statement_start_offset)/2) as Query_Text,

    dbname=Upper(db_name(b.dbid)),

    b.objectid as 'Object_ID'

    FROM sys.dm_exec_query_stats a

    cross apply

    sys.dm_exec_sql_text(a.sql_handle) as b

    ORDER BY

    [Avg_CPU_Time] DESC

  • You can run this script to see how much your SQL server utilizing CPU:

    DECLARE @CPU_BUSY int, @IDLE int

    SELECT @CPU_BUSY = @@CPU_BUSY, @IDLE = @@IDLE WAITFOR DELAY '000:00:01'

    SELECT (@@CPU_BUSY - @CPU_BUSY)/((@@IDLE - @IDLE + @@CPU_BUSY - @CPU_BUSY) *1.00) *100 AS 'CPU Utilization by sqlsrvr.exe'

  • cheers for the scripts guys. I've been doing a bit more digging and thinking it could very well be the issue described here: http://www.red-gate.com/messageboard/viewtopic.php?t=4240&postdays=0&postorder=asc&start=0

    The only common theme i can find when we get 100% cpu is this error on our log backups. Whether this error is caused by the maxed out system or potentially the cause of it is yet to be determined....

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • If you have not already tried using it, the SP_WHO2 stored proc can also reveal a lot about what processes have locks or are waiting on locks, and also how much CPU each of them has consumed since it started.

    I was fighting a similar issue a few weeks ago, the ghost_cleanup process would just suddenly start using huge amounts of cpu. We ended up tracing it to some generated code that was using transactions when it didn't need to (e.g. primarily simple read operations) and was also leaving those transactions open.. the SP_WHO2 stored proc is what allowed me to see where all the cpu was going (it might also have revealed deadlock conditions if that had been the problem), although it took getting my developers involved to get it down to the issue of open transactions. (you can bet I'm adding the query for that, which was given above, to my bag of tricks)

    good luck getting to the bottom of it.. 100% cpu can be a 'near death' experience for any server, at least from the perspective of the people trying to utilize that server.

  • also try to run this query to know exactly what queries are running on ther server

    SELECT sqltext.TEXT,

    req.session_id,

    req.status,

    req.command,

    req.cpu_time,

    req.total_elapsed_time

    FROM sys.dm_exec_requests req

    CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext

  • SQAPro (2/17/2009)


    If you have not already tried using it, the SP_WHO2 stored proc can also reveal a lot about what processes have locks or are waiting on locks, and also how much CPU each of them has consumed since it started.

    i've used sp_who2 and queried sysprocesses but when it maxes out the cpu everything starts blocking each other (guessing due to the fact processes are taking approximately 50times longer to run) and causes all sorts of problems 🙁

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Run profiler (or preferably a server-side trace) for an hour or so during the server's busy time. Find the 3 or so queries that are running the longest or using the most CPU and see if you can optimise them. This may be code changes it may be index changes, it'll probably be both.

    If server performance isn't improved after that, repeat the process with the next 3 or so procedures.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster, the general performance of the system is fine, we monitor it daily and look to tune the top 10 worst performing procs constantly. The issue we have is as mentioned above, every now and again (10-14days) the system just locks up. Checking our profile records at around this time is useless because ALL procesdures ran whilst it is in this state are massively inflated and pretty much everything is a poorly performing query.

    Last time it happened we first got reports of the system running a little slow at around 11am (log backups / big import process runs at this time so didn't think too much of it). Checked the cpu's (around 30%) / processes running and all seemed fine, no big issues with locks/blocks. Monitored if over the next half hour and then just after 11:30 (log backups ran again) it ground to a halt. However, this time it didn't flatline like normal, it would spike to 100% for up to 10-15secs then drop to 50% and then spike again.

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • I wonder if you could add an alert for 100 cpu and then fire this off to get the SQL code

    SELECT sqltext.TEXT,

    req.session_id,

    req.status,

    req.command,

    req.cpu_time,

    req.total_elapsed_time

    FROM sys.dm_exec_requests req

    CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext

    in a email ........just a thought.

    I have users who forget to do filters on the tables and decide to do select * on our largest tables...three notices after that it is right click disable acct 🙂

  • Swirl80 (2/18/2009)


    SQAPro (2/17/2009)


    If you have not already tried using it, the SP_WHO2 stored proc can also reveal a lot about what processes have locks or are waiting on locks, and also how much CPU each of them has consumed since it started.

    i've used sp_who2 and queried sysprocesses but when it maxes out the cpu everything starts blocking each other (guessing due to the fact processes are taking approximately 50times longer to run) and causes all sorts of problems 🙁

    Apologies if some of this seems basic or pedantic, but I've no idea of your experience level (or of others who might have similar problems and be reading this now or later)

    'everything blocking each other' is obviously not helping.. but of course you have to ask, is that case of collateral damage, or the real source of the issue..

    Does anything in the results stand out as having used a LOT of cpu time? or are they all about even?. If a handful it items show high counts for cpu usage, that's a good place to start scrutinizing, especially if there are blocks associated with them

    Are their any loops in the blocking? trace all of the 'blkby' stuff and make sure there are no loops, e.g. is A blocked by B and B blocked by A ? or A waiting on B which is waiting on C which is waiting on A.. those could represent deadlocks, and that can be a cause of problems like you are seeing. Sometimes these things happen as a result as a 'race condition' so you can be fine most of the time, but every now and then, the timing is just right and you get the deadlock. (troubleshooting and reproducing race conditions is a PITA)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply