Blog Post

Best way to monitor high worker thread on SQL Server


It is common to see high worker count issue on the production server and it can have a range of negative impacts on SQL Server performance and stability, potentially leading to degraded user experience and even downtime. Some of the common reasons for high worker thread issues in SQL Server.

  • Blocking
  • Sudden workload spike
  • Resource contention

Problem Statement:

When it comes to monitoring worker thread count in SQL Server, most DBAs turn to the sys.dm_os_schedulers DMV, which is the most commonly used and reliable option. However, they often prefer to examine the sum of column current_workers_count from the DMV over a certain period, which may not accurately indicate how long the server was experiencing a high worker thread problem.

Monitor high worker thread count:

To simulate or monitor this, let’s capture worker thread count on SQL Server by executing below script. It will create the following objects inside the system database tempdb.

  • Table tbl_workerthreadcount_info is going to store the historical data.
  • SP sp_captureworkerthreadcount_info is going to capture required worker thread data.
  • Job Monitor_high_worker_thread is going to run every 10 seconds to capture the historical data.
USE tempdb
IF OBJECT_ID('tempdb.dbo.tbl_workerthreadcount_info')>0
DROP TABLE tempdb.dbo.tbl_workerthreadcount_info
IF OBJECT_ID('tempdb.dbo.sp_captureworkerthreadcount_info')>0
DROP PROCEDURE dbo.sp_captureworkerthreadcount_info
--Create table
CREATE TABLE [tempdb].[dbo].[tbl_workerthreadcount_info](
[capturedatetime] [datetime] NOT NULL,
[total_current_workers_count] [int] NULL,
[total_current_tasks_count] [int] NULL,
[total_active_workers_count] [int] NULL
--Create SP
CREATE PROCEDURE [dbo].[sp_captureworkerthreadcount_info]
INSERT INTO [tempdb].[dbo].[tbl_workerthreadcount_info]

GETDATE() capturedatetime,
sum(current_workers_count) total_current_workers_count ,
sum(current_tasks_count) total_current_tasks_count,
sum(active_workers_count) total_active_workers_count
FROM sys.dm_os_schedulers
--Create Job
USE [msdb]
IF  EXISTS (SELECT job_id FROM msdb.dbo.sysjobs_view WHERE name = N'Monitor_high_worker_thread')
EXEC msdb.dbo.sp_delete_job @job_name=N'Monitor_high_worker_thread' --, @delete_unused_schedule=1
/****** Object:  Job [Monitor_high_worker_thread]    Script Date: 31/3/2023 3:44:07 pm ******/BEGIN TRANSACTION
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 31/3/2023 3:44:07 pm ******/IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Monitor_high_worker_thread', 
@description=N'No description available.', 
@category_name=N'[Uncategorized (Local)]', 
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [01_CaptureWorkerthreadCount]    Script Date: 31/3/2023 3:44:07 pm ******/EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'01_CaptureWorkerthreadCount', 
@os_run_priority=0, @subsystem=N'TSQL', 
@command=N'EXEC [dbo].[sp_captureworkerthreadcount_info]', 
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [02_ArchivingHistoricalData]    Script Date: 31/3/2023 3:44:07 pm ******/EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'02_ArchivingHistoricalData', 
@os_run_priority=0, @subsystem=N'TSQL', 
FROM [tempdb].[dbo].[tbl_workerthreadcount_info]
WHERE CaptureDateTime<=DATEADD(WEEK,-2,GETDATE()); ', 
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'CaptureWorkerthreadCount_Schd', 
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
GOTO EndSave

Generate stress (workload)

Now, let’s download and restore the database WideWorldImporters backup and create the below stored procedure.

USE WideWorldImporters
CREATE OR ALTER PROC [dbo].[sel_ByCityName_Ostrees] 
FROM [WideWorldImporters].[Application].[Cities]
order by NEWID()

To generate the load on the SQL Server instance, let’s use OStress tool to generate 500 concurrent sessions and each session is going to run the SP 500 times.

ostress -S"DESKTOP-BADCLECINST2K19" -E -Q"[sel_ByCityName_Ostrees];" -n500 -r500 -q -dWideWorldImporters

Collected Data Analysis:

Let’s analyze the collected the high worker thread count data. 

  • At 16:56:36.665, the OStress tool started creating worker threads. We can see from the snapshot below that the total_current_workers_count and other values went high on the spot on the SQL Server which indicated high worker thread problem.
  • At 17:04:27.266, the workload was cancelled to avoid the high worker thread problem on the server.
  • After the workload spike, the total_current_workers_count went down from 1000+ to 200+ worker thread count almost 1 hour later, even though the workload was cancelled within 10 mins.

Take Away:

A workload spike or contention will increase the current_workers_count, but it doesn’t go down on the spot after the workload spike or contention gets settled down. So, it isn’t recommended to monitor total_current_workers_count instead, we should monitor total_active_workers_count to know how long the server was experiencing a high worker thread problem.

Workers are allocated to the scheduler when there is work to be done, and the number of workers increases when there is a sudden surge in workload. As the workload reduces, workers become idle and wait for new tasks or requests to be assigned to them. General observation – workers will wait for ~15 minutes in an idle state and start to ramp down later. This allows us to have hot workers for incoming requests.

  • Active worker = bound to a task and executing a request.
  • Current worker count = total workers associated with scheduler.
  • Current task count = total tasks bound to workers or in the scheduler queue waiting to be bound to a worker.

Tasks are not an indicator of workers.  Task can be queued to the scheduler as the scheduler has a limited worker level.

  • Worker pool for scheduler = Workers waiting for work + workers bound to tasks
  • Task pool for scheduler = Tasks bound to workers + tasks queued to the scheduler (waiting for worker)

In general, you take the system’s max worker setting and divide by the number of visible schedulers. This is the target for the worker pool associated with an individual scheduler.

A connection is separate from a worker. The worker is bound to a task when it is considered active. A connection that is active has a pending request.

  • New connect request – Binds to task which binds to worker and performs the login.
  • Login complete – Connection remains open but worker and task returned to scheduler pool and can be used by other requests.
  • Connection Pool (.net) – Has active connection but no active request, does not consume worker or task for SQL Server.
  • Connection pool sends a request – Bound to new worker and task for the lifetime of the request. When a query is completed, the worker and task return to the scheduler pool.

Happy learning!

The post Best way to monitor high worker thread on SQL Server appeared first on .

Original post (opens in new tab)
View comments in original post (opens in new tab)


You rated this post out of 5. Change rating




You rated this post out of 5. Change rating