Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Toilet Analogy … or Why I Never Recommend Increasing Worker Threads

The Toilet Analogy … or Why I Never Recommend Increasing Worker Threads

Lately I’ve noticed an increasing number of people recommend increasing worker threads. The reasons for the recommendations include things as simple as a troubleshooting step for reports of general system slowness or as complex as a means to mirror very large number of databases on a single server.

 

In one instance, someone successfully convinced another person on a discussion group that they could recommend to their client that it is okay to mirror 400 to 600 databases on a single server and all they needed to do was increase the number of worker threads. Despite my advice to the contrary, the poster chose to take the ill-given advice of the other person because it was what he wanted to hear. Even the SQL Cat (Customer Advisory Team) has made this recommendation in their Technical Note Mirroring a Large Number of Databases in a Single SQL Server Instance (read my comment at the bottom).

 

Does the Argument Hold Water?

 

There is merit to the suggestion that increasing the number of worker threads can allow you to mirror more databases than you would be able to mirror otherwise. For example, if a majority of the databases are inactive most of the time, then worker threads not doing anything will not cause you much overhead or delays. But as more and more of the databases become active, the wait for CPU time will become longer and more noticeable. If most or all databases are busy, the longer waits for CPU time can make the entire server seem sluggish and slow.

 

My biggest problem with the recommendations is that they are not prefaced with a warning of the negative effects of increasing with the number of worker threads. The SQL Cat team does recommend testing the it yourself with your own workload more than once in their paper, and that’s a good thing. However, I also feel that their test is flawed and their stated results misleading.

 

All-in-all, I think the SQL Cat team does a thorough job in their testing and their findings definitely have merit. My intent here is not to rail against the SQL Cat team. I have a great deal of respect for the team and its members, and I recognize that they are real experts in the field. My only beef here is with recommending increasing the number of worker threads without the proper warning.

 

The Toilet Analogy

 

The Toilet Analogy is how I like to explain to people why I’m against increasing worker threads in almost all cases. Many people, even skilled, experienced DBA’s, do not quite grasp the difference between worker threads and CPU threads. One hint: they’re not the same thing. Increasing worker threads does not increase CPU threads. If you are using the default settings on a 64 bit server with 4 logical CPU’s, you have 512 worker threads and 4 CPU threads. There is only ever 1 CPU thread per logical CPU. SQL Server uses schedulers to manage which worker thread is active on a CPU. There is never one worker thread actively using a CPU thread. Out of the 512 worker threads, only 4 of them can actively be using a CPU (assuming nothing else is using any CPU).

 

I like to think of CPU’s as stalls in a public bathroom and worker threads as spots in line to the bathroom (counting the 4 stalls). For the example server mentioned in the previous paragraph, it would be a bathroom with 4 stalls and a maximum capacity of 512 spots in line. There can only be 4 people using a stall at a time, so there would always be at least 508 people that cannot fit into the bathroom.

 

Some of the people may be standing in line to talk to others (SPID sleeping), some may be in line because they know they’ll need to go soon (waiting on resources), some spots in line may be empty (low number of connections), and some may need to go into the bathroom ASAP (CPU/signal wait). Let’s assume in this scenario, the line is filled to capacity and other people in the facility are keeping an eye on the line hoping for a spot to open (errors due to no worker thread available)

 

Now let’s suppose that you’re the facilities manager and you want to be able to serve more people in the bathroom. If you increase the number of spots in line (worker threads), you’re still only going to be able to have at most 4 people in the bathroom at a time. The only thing you’ve increased is the line waiting resulting in a lot of people have to wait much longer to get into the bathroom. If business is slow, and the demand to get in right away is not heavy, then there may be perceived improvements in bathroom throughput. However, if business is heavy and there is very high demand for the bathroom, the waits can be extensive and customers will be unhappy. If you were to add another bathroom with 4 stalls (increase CPU to 8 logical processors), the number of spots in line would be greater and you’d be able to server twice as many people at the same time. It’s a win-win situation.

Comments

Posted by Anonymous on 8 May 2010

Robert Davis(blog/twitter) recently blogged The Toilet Analogy … or Why I Never Recommend Increasing

Posted by Anonymous on 8 May 2010

Pingback from  Twitter Trackbacks for                 Blogged] The Toilet Analogy ??? or Why I Never Recommend Increasing Worker Threads:  #sql #sqlserver #sqlgeek         [sqlservercentral.com]        on Topsy.com

Posted by Steve Jones on 10 May 2010

This is interesting. When you say that the number of CPU threads are limited to logical CPUs, I assume you mean cores, correct? If I have a 2 socket, quad core, then I have 8 CPU threads, correct? What about hyperthreading?

Posted by Jason Brimhall on 10 May 2010

I like the analogy.  Good information.  Thanks.

Posted by Anonymous on 11 May 2010

Pingback from  Help Me Come Up With A Name For My Company? | Consult services

Posted by chrisleonard on 11 May 2010

But ... we have a situation ... RIGHT NOW ... where we are seeing bottlenecks and broken connections due to insufficient worker thread allocation.  This is per Microsoft's own analysis.  Sometimes you need to let a lot of people stand in line, and if you can't accommodate them, you need to turn them away.  In our case, turning them away means broken apps THAT OUR USERS CAN SEE (we have about 5M or so active users, so we do everything we can to avoid errors they can see).

In such a situation like that, would you really not allocate more worker threads based on an analogy?  This is an interesting analogy, but analogies and conclusions are (or should be) two different things.  Further, I think that analogies are most interesting where they break down, and I think this one breaks down in assuming that it's just as easy to add the extra cores as it is to increase the length of the queues.  This is just not so.  Increasing the length of the queues takes a mere reconfiguration.  Increasing the number of cores takes a purchase that in our case would cost well into the 6 figures once all the gear was purchased.  What would you do?  I suspect that you would, in fact, increase the length of the worker thread queue.  Am I right?

Cheers,

Chris

Posted by Robert Davis on 12 May 2010

Hi Chris. I'm not saying that it's never a valid step to take. In your case, it could be a valid temporary solution until you can increase the number of actual CPU's. But it should be a temporary solution. Your ultimate goal should be to size your system appropriately for the load rather than simply tweak it so that end user's don't see errors.

The analogy is simply a tool to explain the difference between CPU threads and worker threads.

Posted by Robert Davis on 12 May 2010

Steve, CPU threads are limited to 1 per logical CPU not per core. Using your example, 2 quad core CPU's would be 8 logical CPU's and result in 8 CPU threads. If it was per core, there would be only 2 CPU threads, but there are 8 in this scenario.

Likewise, if you had 4 hyperthreaded cores resulting in 8 logical CPU's, you would have 8 CPU threads.

Posted by Chris Sherlock on 15 May 2010

Uh? That doesn't make any sense:

"Using your example, 2 quad core CPU's would be 8 logical CPU's and result in 8 CPU threads. If it was per core, there would be only 2 CPU threads, but there are 8 in this scenario."

In his example, he's got two CPUs with 4 cores on each. That would make 8 cores, not two ("If it was per core, there would be only 2 CPU threads")!

Posted by rhunt on 17 May 2010

You sparked my curiosity with regard to the ratio of worker threads to active databases.  In general, is there a good way to determine how many DB's a system should be able to handle?  With applications like SharePoint there is a decision to be made on the overall architecture: fewer very large DB's (hard for minor DR/recovery type tasks) or many, many, many smaller databases with much more focused amounts of data (GB's stored, queries/sec, trans/sec don't really change but the number of active DB's sure does).

Posted by Robert Davis on 17 May 2010

This is in reference to worker threads for mirrored databases and not all active databases. An active database does not have dedicated threads assigned to them. It really depends on how many active tasks you have going rather than how many databases. The threads assigned for the mirrored databases are for the mirroring tasks that are occurring and not for the databases themselves. So simply being an active database does not consume a worker thread.

Posted by sqldba on 9 March 2011

Hello Robert:

This is very good article on workers & CPU threads. can you please explain how to monitor and what willbe the CPU utilization based upon he analogy & scenerio you provided.

because if your CPU is 90% utilization by SQL Server process then increasing workers thread is bad approach and if SQL Server process is using 30% of CPU and system is idle for 65% and during that time you have 5k+ users actively accessing databased via Apps\web server so in that scenrio increasing worker thread is goos or bad.

Thanks

Jay

Posted by Robert L Davis on 25 April 2011

You're absolutely right. It depends on your CPUusage whether you would see any benefit at all. That's the problem I had with the original post. It makes it sound like you can mirror any number of databases simply by increasing worker threads. That's not the case at all.

Leave a Comment

Please register or log in to leave a comment.