SQLServerCentral Article

How to Monitor for High CPU utilization in SQL Server

,

How to Monitor CPU usage over time

Have you ever had a SQL Server become unresponsive? Did your ever take a phone call from a user telling you that their application is down? Of course you have! We are DBA's and we are always on the front line of application troubleshooting. Most companies or clients accept what I like to call "reactive troubleshooting". Reactive troubleshooting is figuring out what is wrong after the application is down. However, I prefer to use "proactive troubleshooting". It is always better to know what is going to happen before it actually happens. Unless your company has shelled out big bucks for 3rd party tools that will help you with your "proactive troubleshooting", you as a DBA are going to be on your own to create process that does this for you.

Having a server crash because the CPU is hung at 100% is sometimes difficult if not impossible to troubleshoot. Once the server is down, the counters are usually gone. Of course, you could set up Performance Counters to export their data overtime, but that means you are going to have to wait for another crash to be able to troubleshoot. Not something I like to explain to management.

I would like to share with you how I monitor CPU usaage overtime and alert me to trouble when it is occuring. I have done this for over 10 years on hundreds of SQL Servers from Windows 2000 and SQL 2000 to Windows 2008 R2 x64 and SQL 2008 R2 x64.

High Level Overview

The way I monitor CPU usage over time involves 3 seperate steps.

  1. We place a small Visual Basic Script file on the server to monitor.
  2. We create a SQL Server Agent job to run that small VB script program.
  3. We create a SQL Server Agent Alert to listen to what the VB script does.

Assumption

There are two assumptions I will make. One is that you have DB Mail setup and working with a valid Operator. The other is that your account that runs the SQL Server Agent has the ability to use SQLCMD to connect to your instance with Windows authentication. If your SQL Agent cannot do this you will need to modify the VB script. If you need help with that, drop me an email.

Step 1

I have always used VB scripts to help me do things SQL Server cannot do alone. I understand there are many other ways to do things, but I am a bit long in the tooth and this is what I know. The VB script that we use looks like this;

On Error Resume Next
HighMark = 90 ' 90 Percent
Violation = 0 ' counter for the number of times CPU hits the high mark
Set objService = GetObject("Winmgmts:{impersonationlevel=impersonate}!\Root\Cimv2")
' Check it 10 times within 1 minutes and if it consistently above the high mark, raise alert
i = 0
 Do While i < 1
 Set objInstance1 = objService.Get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N1 = objInstance1.PercentProcessorTime
 D1 = objInstance1.TimeStamp_Sys100NS
 ' Sleep for 2 seconds = 2000 ms
 ' This sleep is to capture the two time stamps
 WScript.Sleep(2000)
 Set perf_instance2 = objService.get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N2 = perf_instance2.PercentProcessorTime
 D2 = perf_instance2.TimeStamp_Sys100NS
 PercentProcessorTime = (1 - ((N2 - N1)/(D2-D1)))*100
 If PercentProcessorTime > HighMark Then
 Violation = Violation + 1
 If Violation = 10 Then
 RaiseError "CPU utilization is higher than 90 Pct for the last minute", "1"
 Violation = 0
 End If
 Else
 Violation = 0
 End If
 ' Sleep for 4 seconds = 4000 ms
 ' This sleep is to raise alert only if the CPU high is constant, all 10 times during the check peariod
 WScript.Sleep(4000)
Loop
Sub RaiseError (msg, sev)
Set objShell = CreateObject("WScript.Shell")
query = "raiserror(" + "'" + msg + "'" + "," + sev + "," + "1) with log"
progStr = "sqlcmd -E -q " + """" + query + """"
objShell.exec(progStr)
End Sub

In order to use that script, you just have copy and paste the code into a file with a .vbs extension. Remember that in Windows, in order to save that file with something like notepad, you have to choose "Save as type: All Files". The default behavior is to use a .txt extension. That of course will not work.

I have always named mine CPUpct.vbs

Hey, what is this VBS script doing?

Let's break it down.

On Error Resume Next
HighMark = 90 ' 90 Percent
Violation = 0 ' counter for the number of times CPU hits the high mark
Set objService = GetObject("Winmgmts:{impersonationlevel=impersonate}!\Root\Cimv2")
' Check it 10 times within 1 minutes and if it consistantly above the high mark, raise alert

HighMark = 90 means the highest you want to see your CPU run at is 90%. You can set that number to anything you would like. If you feel 65% is too high, set the 90 to 65.

Violation = 0 is a starting point for the number of violation that occur during the while loop. You do not need to edit this value.

i = 0
 Do While i < 1
 Set objInstance1 = objService.Get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N1 = objInstance1.PercentProcessorTime
 D1 = objInstance1.TimeStamp_Sys100NS
 ' Sleep for 2 seconds = 2000 ms
 ' This sleep is to capture the two time stamps
 WScript.Sleep(2000)
 Set perf_instance2 = objService.get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N2 = perf_instance2.PercentProcessorTime
 D2 = perf_instance2.TimeStamp_Sys100NS
 PercentProcessorTime = (1 - ((N2 - N1)/(D2-D1)))*100
 If PercentProcessorTime > HighMark Then
 Violation = Violation + 1
 If Violation = 10 Then
 RaiseError "CPU utilization is higher than 90 Pct for the last minute", "1"
 Violation = 0
 End If
 Else
 Violation = 0
 End If
 ' Sleep for 4 seconds = 4000 ms
 ' This sleep is to raise alert only if the CPU high is constant, all 10 times during the check peariod
 WScript.Sleep(4000)
Loop

Above is where all the fun happens. We need to first set 4 variables to run a formula. The variables are;

  • N1 = the first percentage of processor time. D1 = is the first time stamp.
  • N2 = the second percentage of processor time. D2 = this is the second time stamp.

If at 8:42:01 the CPU was at 94% and at 8:42:03 the CPU was at 97%, the variable would look like this.

N1=4879911796875

N2=4879912578125

D1=129315337212597240

D2=129315337233378623

Now we run the formula,

PercentProcessorTime =(1-((4879912578125-4879911796875)/(129315337233378000-129315337212597000)))*100

That means our PercentProcessorTime = 96.24056 which is greater than the HighMark. Since it is greater than our HighMark, we set the variable Violation = 1. Once the variable Violation = 10, we move onto the next step.

If the variable PercentProcessorTime falls below the HighMark at anytime during the while loop, the variable Violation will drop back down to 0. Since 1 loop takes 6 seconds and the PercentProcessorTime variable must be greater than the HighMark 10 times in a row we know that the CPU was above the HighMark for at least 1 minute.

OK, I am a little lost on the math up there???

Microsoft does not provide a lot of documentation on how or what the values really represent. If you are really interested in what is going on, let me show you. Copy and paste the code below into a seperate file. To keep it simple, lets just name it test.vbs. Remember to make sure it has the proper vbs extension. If you are not intereseted in what is going on, you can just skip ahead to Step 2.

On Error Resume Next
Set objService = GetObject("Winmgmts:{impersonationlevel=impersonate}!\Root\Cimv2")
i = 0
 Do While i < 1
 Set objInstance1 = objService.Get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N1 = objInstance1.PercentProcessorTime
 D1 = objInstance1.TimeStamp_Sys100NS
 ' Sleep for 2 seconds = 2000 ms
 ' This sleep is to capture the two time stamps
 WScript.Sleep(2000)
 Set perf_instance2 = objService.get("Win32_PerfRawData_PerfOS_Processor.Name='_Total'")
 N2 = perf_instance2.PercentProcessorTime
 D2 = perf_instance2.TimeStamp_Sys100NS
 PercentProcessorTime = (1 - ((N2 - N1)/(D2-D1)))*100
 WScript.Echo "N1 = " & N1
 WScript.Echo "D1 = " & D1
 WScript.Echo "N2 = " & N2
 WScript.Echo "D2 = " & D2
 WScript.Echo "PercentProcessorTime = " & PercentProcessorTime 
 WScript.Echo " "
 WScript.Sleep(4000)
Loop

Once you have it saved as test.vbs, let's run it. Open a command prompt window and run the command 'cscript' . You have to call the VB script by path name unless you are already in the directory. See screen shot below.

You will see what is happening every 4 seconds. As I said the values of N1, D1, N2 and D2 are not really helpful. But after they are run through the formula, it is a nice value shown as PercentageProcessorTime.

OK, I understand, let's move on.

Sub RaiseError (msg, sev)
Set objShell = CreateObject("WScript.Shell")
query = "raiserror(" + "'" + msg + "'" + "," + sev + "," + "1) with log"
progStr = "sqlcmd -E -q " + """" + query + """"
objShell.exec(progStr)
End Sub

Here we have a SQLCMD that uses the raiserror command to send messages to the Windows Event Application Log. This is the step I made one of my assumptions. The -E is saying to use a Trusted Connection. If you need to pass different credentials, edit the SQLCMD command. By using the RAISERROR command this way, it gives us a way to use the SQL Server Agent Alert service. We will get to that step in a bit.

Step 2

So, we have our newly created CPUpct.vbs and we are ready to move on. Let's talk about how I use SQL Server to run this VB script. Since the code is to run a while loop forever, I simply use the SQL Agent to create a job to run the script every 2 minutes. The SQL Server Agent will try to start every 2 minutes, but if the job is already running, it is skipped. If it is not running, it will restart it. That way we ensure that the VB script is always running.

You have to copy the CPUpct.vbs script to the server you are going to monitor. I have an E: drive on all my servers, so I make a directory called E:\Monitor and put the script in there. You can put it where you want though. Please note that if you DO NOT have an E: drive, you will have to edit the script below with the new path you have chosen.

Let's create the SQL Agent job that will run the script with the code below. Open a new Query window in SSMS on the instance you are going to monitor and paste in the script seen below.

USE [msdb]
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Watch_CPU_Usage', 
 @enabled=1, 
 @notify_level_eventlog=0, 
 @notify_level_email=0, 
 @notify_level_netsend=0, 
 @notify_level_page=0, 
 @delete_level=0, 
 @description=N'No description available.', 
 @category_name=N'[Uncategorized (Local)]', 
 @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'run CPUpct', 
 @step_id=1, 
 @cmdexec_success_code=0, 
 @on_success_action=1, 
 @on_success_step_id=0, 
 @on_fail_action=2, 
 @on_fail_step_id=0, 
 @retry_attempts=0, 
 @retry_interval=0, 
 @os_run_priority=0, @subsystem=N'CmdExec', 
@command=N'cscript "E:\Monitor\CPUpct.vbs"', 
 @flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'always', 
 @enabled=1, 
 @freq_type=4, 
 @freq_interval=1, 
 @freq_subday_type=4, 
 @freq_subday_interval=2, 
 @freq_relative_interval=0, 
 @freq_recurrence_factor=0, 
 @active_start_date=20100915, 
 @active_end_date=99991231, 
 @active_start_time=0, 
 @active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
 IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

This will create a new SQL Server Agent job called "Watch_CPU_Usage". It will automatically start within the next two minutes.

Something to note here. Pay attention to the line that says @command=N'cscript "E:\Monitor\CPUpct.vbs"' That location E:\Monitor is where I placed the vbs script we made. You can put yours wherever you want, but change that line to where you put it. Also in case you do not run your SQL Server in mixed mode authentication, you will have to change the line that says @owner_login_name=N'sa' to a user that can run the job.

Is it safe to run a forever while loop on my SQL Server?

Like I said, it has been over ten years that I have been using this. It has yet to cause any issues for me. The amount of memory is takes to run our little cscript is always less than 6,000K of memory. From a systems memory usage perspective, that not too much to be worried about.

Step 3

Now we have to create a SQL Agent alert to call the job we created. Open SSMS and log into the instance we are capturing Deadlock information for and expand the SQL Server Agent. Right click on the word Alerts and choose "New Alert..." On the General page it should look like this;

The Response page should look like this;

The Options page should look like this;

And that is it. Next time your CPU goes over 90% for 1 minute, you will get an email telling you so. If your CPU hangs over 90% for 30 minutes, you would get 30 emails. Hopefully, that is never the case.

Rate

4.53 (79)

You rated this post out of 5. Change rating

Share

Share

Rate

4.53 (79)

You rated this post out of 5. Change rating