How many workers can I use and how many are active?

kevaburg

SSCoach

Points: 18131
December 18, 2017 at 8:00 am

#409871

Hi folks,
we are trying to troubleshoot a problem that seems to be the result of too many threads being allocated to queries. We came to this conclusion after seeing the THREADPOOL (poison) wait becoming more and more prominent as a SQL Server wait stat.
After looking at the DMVs I figured this out:
SELECT max_workers_count FROM sys.dm_os_sys_info;
reported 2944 workers.
The calculation for determining how many threads can be supported:
SELECT 512 + ((80 - 4) * 16);
reported back 1728 threads.
Looking at
SELECT COUNT(*) FROM sys.dm_os_threads;
I saw that 985 threads are open. Does this number include the open and reserved threads?
The first question: Can anyone explain the discrepancy between max_workers_count and the calculation as I presented it here?
The second question: Why is there such a large amount of THREADPOOL waits when clearly we aren't exhausting the available workers?
My first thought was that our current MAXDOP and Cost Threshold for Parallelism settings, 16 and 50 respectively, are not optimal for our environment. Could a possible solution be to reduce MAXDOP to 8 to limit the threads available to a single query?
Most of the queries going into the database are ad hoc and use quite complex dynamic SQL and nested queries, an unfortunate necessity due to the way the software has been built.
Any helpful comments and advice would be appreciated! 🙂
Regards,
Kev

Uwe Ricken Hall of Fame Points: 3246 · Answer 1

Hi Kev,
"Can anyone explain the discrepancy between max_workers_count and the calculation as I presented it here?"
You can set the maximum number of worker threads manually (generally not recommended; it is an advanced option, so 'show advanced options' must be enabled first) with
EXEC sp_configure N'max worker threads', num_of_threads; RECONFIGURE WITH OVERRIDE;
You can check the configured value by querying the catalog view [sys].[configurations].
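A minimal sketch of that check, assuming the 80 in Kev's calculation is the number of logical CPUs and that the instance uses the documented 64-bit defaults. The multiplier of 16 is documented only for up to 64 logical CPUs. From SQL Server 2016 SP2 and SQL Server 2017 onwards, the documented multiplier above 64 logical CPUs is 32, and 512 + ((80 - 4) * 32) = 2944, which matches the reported max_workers_count. The alias documented_default is illustrative only.

```sql
-- Configured value: 0 means SQL Server works out the default at startup.
SELECT c.name, c.value, c.value_in_use
FROM sys.configurations AS c
WHERE c.name = N'max worker threads';

-- Effective limit next to the documented 64-bit default formula.
-- Assumption: the ELSE branch (multiplier 32) applies only on
-- SQL Server 2016 SP2 / SQL Server 2017 and later; older builds
-- keep the multiplier of 16 above 64 logical CPUs.
SELECT i.cpu_count,
       i.max_workers_count,
       CASE
           WHEN i.cpu_count <= 4  THEN 512
           WHEN i.cpu_count <= 64 THEN 512 + ((i.cpu_count - 4) * 16)
           ELSE 512 + ((i.cpu_count - 4) * 32)
       END AS documented_default
FROM sys.dm_os_sys_info AS i;
```

If max_workers_count and documented_default differ while the configured value is 0, the version assumption above is the first thing to check.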
"Why is there such a large amount of THREADPOOL waits when clearly we aren't exhausting the available workers?"
The reason is “MAXDOP is always specified per operator in the execution plan and not per execution plan”, so a single query can use far more than MAXDOP workers.
If [THREADPOOL] waits occur, the cause may be complex execution plans with lots of parallel operators!
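To see how many workers exist, how many are busy, and which requests hold the most, the scheduler and task DMVs are a reasonable starting point. This is a sketch rather than a complete diagnosis; the column aliases are illustrative, and the numbers are a point-in-time snapshot that can miss short bursts of worker starvation.

```sql
-- Cumulative THREADPOOL waits since the last restart (or stats clear).
SELECT ws.wait_type, ws.waiting_tasks_count, ws.wait_time_ms, ws.max_wait_time_ms
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type = N'THREADPOOL';

-- Workers created vs. workers currently running a task, plus tasks
-- queued waiting for a worker, across the schedulers used for queries.
SELECT SUM(s.current_workers_count) AS current_workers,
       SUM(s.active_workers_count)  AS active_workers,
       SUM(s.work_queue_count)      AS queued_tasks
FROM sys.dm_os_schedulers AS s
WHERE s.status = N'VISIBLE ONLINE';

-- Requests holding more than one task: parallel plans show up here,
-- often with more tasks than the MAXDOP setting.
SELECT t.session_id, t.request_id, COUNT(*) AS task_count
FROM sys.dm_os_tasks AS t
WHERE t.session_id IS NOT NULL
GROUP BY t.session_id, t.request_id
HAVING COUNT(*) > 1
ORDER BY task_count DESC;
```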

Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Jeff Moden SSC Guru Points: 1004683 · Answer 2

kevaburg - Monday, December 18, 2017 8:00 AM
We are trying to troubleshoot a problem that seems to be the result of too many threads being allocated to queries. We came to this conclusion after seeing the THREADPOOL (poison) wait becoming more and more prominent as a SQL Server wait stat.

What is the actual problem that you're trying to troubleshoot? Is it performance or something else? I ask because I've found that things like "the THREADPOOL (poison) wait becoming more and more prominent as a SQL Server wait stat" are usually symptoms of a larger problem, and trying to fix the symptom rather than the cause is usually a futile effort that sometimes does more harm than good.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Uwe Ricken Hall of Fame Points: 3246 · Answer 3

Hi Jeff,
THREADPOOL is definitely a problem of a heavily used system with lots of requests at the same time. To fight it I would of course check the parallelism settings and the most expensive parallel queries. Another issue could be blocking, but first of all I would fight the parallelism 🙂

kevaburg SSCoach Points: 18131 · Answer 4

In reply to Uwe Ricken - Tuesday, December 19, 2017 5:35 AM

Hi Uwe,

OK.....now you have touched on a subject I clearly didn't understand well enough. Some of our execution plans are very complex and deeply nested, so this would explain the requirement for what would appear to be an excessive number of threads for a given query.

We have left max worker threads alone at the default of 0. This won't be changed.

What I am going to suggest based on your explanation is a test of the code with a reduced MAXDOP to see how it reacts. I will post the results here.
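One low-risk way to run such a test is a query-level hint, which caps parallelism for a single statement without touching the server setting. A sketch, assuming a hypothetical table dbo.SalesHistory with CustomerID and Amount columns (neither appears in the thread):

```sql
SET STATISTICS TIME ON;  -- reports CPU and elapsed time per statement

-- dbo.SalesHistory is a hypothetical stand-in for one of the
-- expensive parallel queries; 8 is the MAXDOP value Kev proposed.
SELECT s.CustomerID, SUM(s.Amount) AS total_amount
FROM dbo.SalesHistory AS s
GROUP BY s.CustomerID
OPTION (MAXDOP 8);

SET STATISTICS TIME OFF;
```

Running the same statement with and without the hint gives a like-for-like comparison of elapsed time and CPU before anything is changed server-wide.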

Many thanks!

Regards,
Kev

kevaburg SSCoach Points: 18131 · Answer 5

In reply to Jeff Moden - Tuesday, December 19, 2017 6:13 AM

Hi Jeff,

The problem I am trying to troubleshoot concerns a dashboard that displays reliability statistics over a given period. Unfortunately it displays a red light as soon as a poison wait threshold is reached, and in this case the wait was THREADPOOL.

Although performance is OK, we are becoming concerned that as the environment grows, inefficient SQL will compound the problem. Ultimately I am trying to anticipate a potential problem and resolve it before I have to post an emergency HEEEELLLLPPPPP here!

From what Uwe has explained, reducing the size of the execution plan (at least the expensive components that need to be parallelised) might help. The next step from my side is to identify the most expensive plans and try to optimise them. Because MAXDOP only applies to the parts of a plan that run in parallel, I am going to suggest raising the cost threshold for parallelism and reducing MAXDOP.
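At server level, both settings are changed through sp_configure. A sketch only: 8 is the MAXDOP value discussed in this thread, while 75 for the cost threshold is a hypothetical figure chosen purely for illustration (the current setting is 50), not a recommendation.

```sql
-- Both options are advanced options.
EXEC sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure N'max degree of parallelism', 8;        -- value proposed in the thread
EXEC sp_configure N'cost threshold for parallelism', 75;  -- hypothetical example value
RECONFIGURE;

-- Confirm what is actually in use.
SELECT c.name, c.value_in_use
FROM sys.configurations AS c
WHERE c.name IN (N'max degree of parallelism', N'cost threshold for parallelism');
```

Both changes take effect without a restart, so it is worth making them one at a time and watching the THREADPOOL waits in between.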

It could be interesting....

Regards,
Kev

Jeff Moden SSC Guru Points: 1004683 · Answer 6

In reply to kevaburg - Tuesday, December 19, 2017 6:54 AM

Heh... that's one of the things I don't like about a lot of dashboards. It's like a car... the check-engine light comes on for something as trivial as the gas cap being a bit loose or worn.

Not that it matters to anyone but my poor man's version of "Resource Governor" is to set system-wide MAXDOP to no more than 1/4 of the total CPUs and never more than 8. If someone needs more than 8 CPUs for their junk to run, then they need to have a close up experience with the 3 banded pork chop launcher. 😉
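That rule of thumb is easy to express against the CPU count SQL Server reports. A sketch, assuming "total CPUs" means the logical CPUs visible to the instance; the alias suggested_maxdop is illustrative:

```sql
-- A quarter of the logical CPUs (integer division), capped at 8
-- and never below 1.
SELECT i.cpu_count,
       CASE
           WHEN i.cpu_count / 4 > 8 THEN 8
           WHEN i.cpu_count / 4 < 1 THEN 1
           ELSE i.cpu_count / 4
       END AS suggested_maxdop
FROM sys.dm_os_sys_info AS i;
```

For the 80-CPU server in this thread, a quarter is 20, so the cap of 8 applies.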

--Jeff Moden

Lynn Pettis SSC Guru Points: 442467 · Answer 7

The other thing, which Jeff hasn't mentioned yet, is to check the dynamic code that is being generated as well as how it is being executed. I solved some of our dynamic SQL issues by changing how it was generated and executed. Much of the dynamic SQL being generated was the same except for variable data that could be passed in as scalar variables or table-valued parameters using sp_executesql.

In the process I also found better ways to write the SQL as well.
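A minimal sketch of that pattern, assuming a hypothetical dbo.Orders table with OrderID, OrderDate and CustomerID columns (none of these names come from the thread). The statement text stays identical between calls, so one cached plan can be reused while only the parameter values change:

```sql
-- The variable data is passed as parameters instead of being
-- concatenated into the string.
DECLARE @sql nvarchar(max) = N'
    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o              -- hypothetical table
    WHERE o.CustomerID = @CustomerID
      AND o.OrderDate >= @FromDate;';

EXEC sys.sp_executesql
     @sql,
     N'@CustomerID int, @FromDate date',
     @CustomerID = 42,
     @FromDate   = '20170101';
```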

kevaburg SSCoach Points: 18131 · Answer 8

In reply to Lynn Pettis - Tuesday, December 19, 2017 3:19 PM

Hi Lynn,

I did this and guess what I found; hidden in the dynamic code were views referencing views, in one case, three deep.
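Nested views can also be located from the catalog rather than by reading the generated code. A sketch using sys.sql_expression_dependencies; it lists only direct view-to-view references within the current database, so a chain three deep shows up as two rows:

```sql
-- Every view that directly references another view.
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
       OBJECT_NAME(d.referencing_id)        AS referencing_view,
       OBJECT_SCHEMA_NAME(d.referenced_id)  AS referenced_schema,
       OBJECT_NAME(d.referenced_id)         AS referenced_view
FROM sys.sql_expression_dependencies AS d
JOIN sys.views AS v1 ON v1.object_id = d.referencing_id
JOIN sys.views AS v2 ON v2.object_id = d.referenced_id
ORDER BY referencing_schema, referencing_view;
```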

I reckon this is where I will be learning a lot about rewriting SQL as well... 🙂

Regards,
Kev

Jeff Moden SSC Guru Points: 1004683 · Answer 9

In reply to Lynn Pettis - Tuesday, December 19, 2017 3:19 PM

On that note, also check how it is being cached. While poorly written and non-parameterized dynamic SQL may execute very quickly, it may have to compile each and every time it runs. We had such a thing... it was executing in 100ms (which is still about 10-20 times longer than it needed to take), but it was taking between 2 and 22 SECONDS to compile, it had to compile on every call, and it was being called an insane number of times every hour.
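Two rough checks on the plan cache can show whether that is happening. This is a sketch, not a full analysis: the aliases are illustrative, and the results only cover plans that are still in cache.

```sql
-- Plans per cache object type, and how many were used exactly once.
-- A large share of single-use 'Adhoc' plans suggests non-parameterized SQL.
SELECT cp.objtype,
       COUNT(*)                                            AS plan_count,
       SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END)   AS single_use_plans,
       SUM(CAST(cp.size_in_bytes AS bigint)) / 1048576     AS size_mb
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype
ORDER BY plan_count DESC;

-- Statements with the same shape (query_hash) cached many times over,
-- typically because literals differ between calls.
SELECT TOP (20)
       qs.query_hash,
       COUNT(*)                AS cached_variants,
       SUM(qs.execution_count) AS executions
FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash
HAVING COUNT(*) > 1
ORDER BY cached_variants DESC;
```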

--Jeff Moden
