SQL Server Hits 100% CPU and nothing works

  • We are recently seeing an issue with our SQL server (2008R2) instance hitting almost 100% and locking out everything. The servers console is responsive enough to get in a an restart the services but we are a at loss as to what is causing this.

    I have been thrown in at the deep end on this and have no experience managing SQL server apart from a bit of add users / databases and some SQL experience. So I am relying on google to try to fix this but am not getting very far.

    I have read some articles on logging the processes that are running and sending emails when the load average gets to high. I currently have the trigger set to 20% but some time I get nothing at all so it suggests that the load is just getting to high very quickly before it has a chance to send emails.

    Is there anywhere I can look before I restart the services that does not need SQL Management Studio (I can't connect or do anything inside an active session once the server gets to this state) or after I have restarted to track down where the problem lies.

    The latest logs show that everything worked until 02:00 today and then the first error after that is at 02:01 - [298] SQLServer Error: 258, Shared Memory Provider: Timeout error [258]. [SQLSTATE 08001]

    Around that time I also have a BugCheck dump that reads

    Memory

    MemoryLoad = 95%

    Total Physical = 24565 MB

    Available Physical = 1070 MB

    Total Page File = 49129 MB

    Available Page File = 20455 MB

    Total Virtual = 8388607 MB

    Available Virtual = 8362927 MB

    **Dump thread - spid = 0, EC = 0x0000000000000000

    ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\LOG\SQLDump0011.tx

    t

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    * 12/16/15 02:00:50 spid 3280

    *

    * Non-yielding IOCP Listener

    *

    * *******************************************************************************

    * -------------------------------------------------------------------------------

    * Short Stack Dump

    There are further logs but they are too big to post.

    Any help is much appreciated.

    Thanks

    Alastair

  • The trick is going to be identifying what is causing the so much CPU to be used. Since you're just getting started and you're on 2008, the best approach is to set up a server-side trace. I'd suggest reading Gail Shaw's articles. Part 1[/url] and Part 2[/url]. That's going to be the easiest way to get going identifying the root cause.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I would look at the basics first.

    1. Is there anything else other than your SQL2008R2 instance on the server?

    2. Is the server a VM?

    3. Does the maximum memory in your instance allow enough memory for Windows?

    etc.

  • Don't stop/restart the SQL Server service or you will lose cached diagnostic metrics which is essential for diagnosing this type of issue using DMV queries.

    This step is optional, but if the server is currently maxed out on CPU thus causing you trouble connecting, then connect using the Dedicated Administrator Connection.

    How to: Use the Dedicated Administrator Connection with SQL Server Management Studio

    https://technet.microsoft.com/en-us/library/ms178068(v=sql.105).aspx

    This step is also optional. If the server is currently maxed out and this is interfering without you running diagnostics, then set the database to RESTRICTED_USER mode, which will drop all non-admin connections.

    Understanding Restrict Access to Restricted_User Database Property

    http://blog.sqlauthority.com/2013/08/04/sql-server-understanding-restrict-access-to-restricted_user-database-property/

    While logged into SQL Server instance as SYSADMIN, execute the DMV query described in this article. What it will do is return the TOP 10 cached execution plans by order of CPU utilization. It's not just looking at running queries, it's looking at cached execution plans since the last time the server was restarted. Included are some metrics along with text of the offending SQL statements.

    Top 10 CPU-Consuming Queries in SQL Server

    https://seniordba.wordpress.com/2015/07/26/top-10-cpu-consuming-queries-in-sql-server/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (12/16/2015)


    It's not just looking at running queries, it's looking at cached execution plans since the last time the server was restarted.

    It's looking at queries whose plans are still in cache, not plans that have been cached (and potentially discarded) since the last time SQL started. It's a decent start, but it can easily miss things. It'll never show queries that have a RECOMPILE hint, queries in a proc created WITH RECOMPILE and anything that's had its plan discarded and compiled recently (due to memory pressure, stats, index changes, etc)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks,

    A lot of reading to be done, but looks helpful. I'll need to be able to get a connection to the DB to run this but it looks like that has been covered in a later reply.

    Alastair

  • It is a dedicated physical machine with 24GB of RAM. Apart from SQL / SQL Agent and reporting services (used by one application) nothing else runs on the server at all.

    I have discovered the max memory setting and will be changing that later today once it has been approved. I'll be setting it to 21GB and monitoring the free RAM. I got the info from a 2009 post by GlennBerry.

    The max memory is currently still set to the default values - no one here has any SQL training so things are installed with defaults everywhere. Hopefully I'll get some courses in the new year and be able to understand this a bit better.

    Alastair

  • Eric

    Thanks for the tips on not stopping and being able to get a connection, it is just what I needed. I'll test this on our dev server and use it if / when we get the problem again. I'm hoping the max memory does the trick.

    A combination of the max memory, being able to login and getting the info about current cachedsee what is happening and getting info about current execution plans should get to the root of the problem.

    I have a lot to learn!

    Thanks

  • Just be aware that SQL Server will use up every bit of memory you give it and that's not a bad thing. The real concern is if you're seeing wait stats around memory, not simply the amount used.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I suspect 18GBytes for the maximum memory would be a better place to start although heuristics for this tend to vary quite a bit.

    I have only ever used trial and error to get the best memory settings; I would be interested if anyone can suggest a more scientific approach.

    p.s. I do not have much experience of Reporting Services as it is not used much here. Where it is used, I have always ensured that it is not installed on the DB server as I suspect it may use a fair bit of memory and CPU to render images.

  • Also, confirm if SSIS / SSRS / SSAS or something like an AntiVirus service may be running on the box alongside SQL Server. Except for a high volume OLTP database, it's unusual to see SQL Server max out the CPU at 100 and then stay in a holding pattern for any significant length of time.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • This happened to us. We were running SQL Server on a 16GB TS server (don't ask) and one day SQL Server just pegged the CPU. Impressive, considering this machine has dual Xeons and an otherwise light load. Did it every single day after that too.

    Our fix was to add another 64GB of memory. 🙂 Worked like a champ. SQL Server 2008 R2 typically uses 6GB of memory on our server, but can spike up to 40GB for backups. Don't know why it solved the problem, and frankly don't care.

    Don't know if you can add memory to the box or not, but it's bound to help.

  • Is your system in use 24 hours a day? Do you have any scheduled jobs starting at 2am, specifically, any new or recently changed jobs? I find the 02:01 time of the error message a little suggestive. Also, is this happening throughout the day? A log of when it happens could be useful to you, especially if it ties the CPU pegging to a scheduled job.

    You have a lot of good advice in this thread, it'll just take some digging. Just because it's a good idea as a normal practice, I would make sure that backups are running and can be restored, just in case you have to dig in to some risky areas.

    And remember that logs are your friend. 🙂

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Thanks for all the help.

    I changed the memory settings last night and all looks good so far although tomorrow morning will be the test. I have it set to 21GB but we have more RAM on order so I can increase this as soon as possible and also have more free available. The OS is only standard so I'm limited to 32GB, something else planned for a future upgrade.

    The reporting services does not appear to be an issue, I have spoken to the vendor who had this set up and it is only used during the day, i.e. no scheduled jobs running. I've had this tested today and the CPU never went above about 20% with everything else running as normal.

    There is Antivirus running on the server and I'm working on getting the configuration adjusted so it isn't scanning the database files themselves. I can't get it remove completely. Looking back at logs this has only been happening since a security change on the network, I have stopped a couple of the scanning processes that have caused issues elsewhere but it made no difference, I believe there was some auditing change made which I am looking into. I can't believe auditing would this effect so I am also trying to find out about any other changes we have not been told about.

    The issue generally only occurs between 02:00 and 08:00 and I can't tie it to any scheduled job or other event, nothing around the jobs has changed recently apart from the addition of some monitoring I add after the problem started.

    Thanks again for all you help, it has given me lots more places to look and identify issues. I'm also starting to look at what the individual databases do and how badly some are optimized in terms of indexing or lack of in many cases. Lots of work for me in the future!

    Alastair

  • You definitely don't want antivirus scanning any *.?df files, I don't know if there's any other file extensions for SQL Server that should be excluded. You can also find out when the antivirus is updating its definitions, I've seen AV update at inconsistent times. You could also have the AV configured to skip the database and log drives/directories.

    Sounds like you have some job security sitting there! I'd recommend you take a look at the Redgate Software site, they have several useful free ebooks that will help you out long-term as a DBA, I'd specifically recommend the book Troubleshooting SQL Server: A Guide for the Accidental DBA by Jonathan Kehayias and Ted Krueger and SQL Server Tacklebox by Rodney Landrum. There's also the Stairways series of articles here on this site, just look at the left margin links.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply