Lately I’ve noticed an increasing number of people recommend increasing worker threads. The reasons for the recommendations include things as simple as a troubleshooting step for reports of general system slowness or as complex as a means to mirror very large number of databases on a single server.
In one instance, someone successfully convinced another person on a discussion group that they could recommend to their client that it is okay to mirror 400 to 600 databases on a single server and all they needed to do was increase the number of worker threads. Despite my advice to the contrary, the poster chose to take the ill-given advice of the other person because it was what he wanted to hear. Even the SQL Cat (Customer Advisory Team) has made this recommendation in their Technical Note Mirroring a Large Number of Databases in a Single SQL Server Instance (read my comment at the bottom).
Does the Argument Hold Water?
There is merit to the suggestion that increasing the number of worker threads can allow you to mirror more databases than you would be able to mirror otherwise. For example, if a majority of the databases are inactive most of the time, then worker threads not doing anything will not cause you much overhead or delays. But as more and more of the databases become active, the wait for CPU time will become longer and more noticeable. If most or all databases are busy, the longer waits for CPU time can make the entire server seem sluggish and slow.
My biggest problem with the recommendations is that they are not prefaced with a warning of the negative effects of increasing with the number of worker threads. The SQL Cat team does recommend testing the it yourself with your own workload more than once in their paper, and that’s a good thing. However, I also feel that their test is flawed and their stated results misleading.
All-in-all, I think the SQL Cat team does a thorough job in their testing and their findings definitely have merit. My intent here is not to rail against the SQL Cat team. I have a great deal of respect for the team and its members, and I recognize that they are real experts in the field. My only beef here is with recommending increasing the number of worker threads without the proper warning.
The Toilet Analogy
The Toilet Analogy is how I like to explain to people why I’m against increasing worker threads in almost all cases. Many people, even skilled, experienced DBA’s, do not quite grasp the difference between worker threads and CPU threads. One hint: they’re not the same thing. Increasing worker threads does not increase CPU threads. If you are using the default settings on a 64 bit server with 4 logical CPU’s, you have 512 worker threads and 4 CPU threads. There is only ever 1 CPU thread per logical CPU. SQL Server uses schedulers to manage which worker thread is active on a CPU. There is never one worker thread actively using a CPU thread. Out of the 512 worker threads, only 4 of them can actively be using a CPU (assuming nothing else is using any CPU).
I like to think of CPU’s as stalls in a public bathroom and worker threads as spots in line to the bathroom (counting the 4 stalls). For the example server mentioned in the previous paragraph, it would be a bathroom with 4 stalls and a maximum capacity of 512 spots in line. There can only be 4 people using a stall at a time, so there would always be at least 508 people that cannot fit into the bathroom.
Some of the people may be standing in line to talk to others (SPID sleeping), some may be in line because they know they’ll need to go soon (waiting on resources), some spots in line may be empty (low number of connections), and some may need to go into the bathroom ASAP (CPU/signal wait). Let’s assume in this scenario, the line is filled to capacity and other people in the facility are keeping an eye on the line hoping for a spot to open (errors due to no worker thread available)
Now let’s suppose that you’re the facilities manager and you want to be able to serve more people in the bathroom. If you increase the number of spots in line (worker threads), you’re still only going to be able to have at most 4 people in the bathroom at a time. The only thing you’ve increased is the line waiting resulting in a lot of people have to wait much longer to get into the bathroom. If business is slow, and the demand to get in right away is not heavy, then there may be perceived improvements in bathroom throughput. However, if business is heavy and there is very high demand for the bathroom, the waits can be extensive and customers will be unhappy. If you were to add another bathroom with 4 stalls (increase CPU to 8 logical processors), the number of spots in line would be greater and you’d be able to server twice as many people at the same time. It’s a win-win situation.