Running out of workers

  • dwilliscp (8/9/2013)


    Sorry should have added.. 64bit. Any change that would effect Disaster Recover, has to be done by the App Admin team. So I put in a request, and will see if they make the change. I know when I asked to get worker threads set to 1,000 from the 800 or so that it currently is.. they wanted me to prove that the problems we are having was worker threads. Thus this posting.. on trying to track their use and what happens when they run out.

    So you are the DBA, but you don't have permission to connect to the server machine directly? Well, then you have more than one problem!

    In the meantime, the persons who have access to the servers, will have to run the diagnostic queries for you.

    In this particular case, you should be able to diagnose the issue if the DAC is enabled for remote access. If SQL Server has run out of worker threads, there are of course no threads left for new connections. But there are other problems where you may find that you cannot connect, not over a remote DAC - but where any local connection works. (These problems are known as network problems, but the DBA will get the blame nevertheless.)

    /Erland

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (8/9/2013)


    dwilliscp (8/9/2013)


    Sorry should have added.. 64bit. Any change that would effect Disaster Recover, has to be done by the App Admin team. So I put in a request, and will see if they make the change. I know when I asked to get worker threads set to 1,000 from the 800 or so that it currently is.. they wanted me to prove that the problems we are having was worker threads. Thus this posting.. on trying to track their use and what happens when they run out.

    So you are the DBA, but you don't have permission to connect to the server machine directly? Well, then you have more than one problem!

    In the meantime, the persons who have access to the servers, will have to run the diagnostic queries for you.

    In this particular case, you should be able to diagnose the issue if the DAC is enabled for remote access. If SQL Server has run out of worker threads, there are of course no threads left for new connections. But there are other problems where you may find that you cannot connect, not over a remote DAC - but where any local connection works. (These problems are known as network problems, but the DBA will get the blame nevertheless.)

    /Erland

    Yea welcome to my world... the folks that have access are the App Admin team.. not a DBA in the group. I have tried to Remote Desktop to the server and connect..using Management Studio, but it times out. (using my login) It will not let me use Admin:<my login>, states that DAC is not supported. I was local admin ... on our 2005 servers, but when we got the new servers.. got no access. Can not even access the drives except for my user folder.

  • Gail,

    I see that you said that the default 0 for worker threads should be good. What is the impact of having a process spin up several thousand worker threads? Today I was looking at the Activity Monitor and saw that there were two different processes that each had about 3,000 worker threads. This is a new system for me and I was thinking that might be bad.

    We are using SQL Server 2008 R2 SP1, Enterprise 64-bit, 4 processors , 146 GB RAM

    All opinions are welcomed.

    Thank you.

  • ELLEN-610393 (10/3/2013)


    Gail,

    I see that you said that the default 0 for worker threads should be good. What is the impact of having a process spin up several thousand worker threads? Today I was looking at the Activity Monitor and saw that there were two different processes that each had about 3,000 worker threads. This is a new system for me and I was thinking that might be bad.

    We are using SQL Server 2008 R2 SP1, Enterprise 64-bit, 4 processors , 146 GB RAM

    All opinions are welcomed.

    Thank you.

    That, my friend, sounds like a "connection leak" somewhere in the code for those two processes or the code that controls those two processes and both will certainly need to be repaired.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/4/2013)


    ELLEN-610393 (10/3/2013)


    Gail,

    I see that you said that the default 0 for worker threads should be good. What is the impact of having a process spin up several thousand worker threads? Today I was looking at the Activity Monitor and saw that there were two different processes that each had about 3,000 worker threads. This is a new system for me and I was thinking that might be bad.

    We are using SQL Server 2008 R2 SP1, Enterprise 64-bit, 4 processors , 146 GB RAM

    All opinions are welcomed.

    Thank you.

    That, my friend, sounds like a "connection leak" somewhere in the code for those two processes or the code that controls those two processes and both will certainly need to be repaired.

    Jeff [or any other wise and knowledgable person 🙂 ]

    Can you point me in a direction that will help me troubleshoot what is going on? and maybe an article that will give me more in depth knowledge of worker threads? Since I am rather new here I would like to approach the people that have created these stored procedures for BO reports with some more knowledge and details.

    Thank you in advance for any assistance.

  • I don't personally have any good sources for how to troubleshoot connection leaks. They can happen in the front-end or in SQL code (usually because of not closing connections when using sp_OA* code). Connection leaks are very difficult to find.

    Hopefully, someone else will have some good articles on the subject they can point us to. I haven't looked at the articles themselves, but there seems to be a decent selection of articles on the subject if you Google for "troubleshoot connection leaks" once you filter out the plumbing problems. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ELLEN-610393 (10/3/2013)


    I see that you said that the default 0 for worker threads should be good. What is the impact of having a process spin up several thousand worker threads? Today I was looking at the Activity Monitor and saw that there were two different processes that each had about 3,000 worker threads. This is a new system for me and I was thinking that might be bad

    I'm not sure why Jeff thinks this is a connection leak. I assume that with "process" you mean a single spid/session_id. A connection leak is an application problems and manifests itself with lots of session_ids, all having the same client_process_id and hostname. If you have a session with 3000 workers, that sounds like a query which is doing something massively parallel, which is something completely different.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (10/15/2013)


    ELLEN-610393 (10/3/2013)


    I see that you said that the default 0 for worker threads should be good. What is the impact of having a process spin up several thousand worker threads? Today I was looking at the Activity Monitor and saw that there were two different processes that each had about 3,000 worker threads. This is a new system for me and I was thinking that might be bad

    I'm not sure why Jeff thinks this is a connection leak. I assume that with "process" you mean a single spid/session_id. A connection leak is an application problems and manifests itself with lots of session_ids, all having the same client_process_id and hostname. If you have a session with 3000 workers, that sounds like a query which is doing something massively parallel, which is something completely different.

    Perhaps I'm using the wrong words but I have to disagree. I don't have many examples but if you have a connection leak, for example, when using sp_OA*, you will have thousands of different spids and there will be one connection for each one. The same holds true for some of the connection leaks that you can get via an app. I'm not familiar with why the connections wouldn't automatically close when it comes from an app but we currently have a machine at work that has to be rebooted once a week (until AppDev takes the time to figure out where the leak is) because of this very problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Erland is correct. What I am seeing is thousands of threads for the same spid/session id [not an app that spun up thousands of spids]. Also what is showing is that while there may be several thousand threads for the same spid there will be multiple 'Execution Context IDs' . One example had about 52 different Execution Context IDs with each of those having 50 some threads.

    After further investigation and looking into the formula used by SQL to determine the allowable worker threads ... it is possible that what I am seeing is legit. The server running SQL actually has 4 processors with 6 cores each and is hyper threaded. It is new for me but may be common for this configuration. The server also has 192 GB RAM. I did read in one article that the max threads maybe should be limited to 2048 but I was not clear if that would be per spid or for the server. I am doing further research.

    Thank you all for your comments.

  • ELLEN-610393 (10/15/2013)


    What I am seeing is thousands of threads for the same spid/session id [not an app that spun up thousands of spids]. Also what is showing is that while there may be several thousand threads for the same spid there will be multiple 'Execution Context IDs' . One example had about 52 different Execution Context IDs with each of those having 50 some threads.

    You ceartinly have a parallel query - all those execution context is a sure sign of that. I am surprised that there are so many workers per execution context, but then again this is nothing I have paid attention to. As I am at PASS in Charlotte, I should be able to find someone who can explain this to me.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Jeff Moden (10/15/2013)Perhaps I'm using the wrong words but I have to disagree. I don't have many examples but if you have a connection leak, for example, when using sp_OA*, you will have thousands of different spids and there will be one connection for each one. The same holds true for some of the connection leaks that you can get via an app. I'm not familiar with why the connections wouldn't automatically close when it comes from an app but we currently have a machine at work that has to be rebooted once a week (until AppDev takes the time to figure out where the leak is) because of this very problem.

    I'm not saying that connection leaks cannot occur, because they certainly do. With proper coding where you say

    using (SqlConnection cn = new SqlConnection(connstring)) {

    // code here

    }

    They do not happen, but if you rely on garbage collection to taking care of your connection, it can happen.

    But what Ellen is seeing is something completely different, and not related to the client layer, but it happens in SQL Server alone.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • So if you have one SPID, but 700 rows, that means the server is trying to do parallel processing? They all show CXPacket.. So how do I track down whay this is going on? (about twice per week)?

  • dwilliscp (10/16/2013)


    So if you have one SPID, but 700 rows, that means the server is trying to do parallel processing? They all show CXPacket.. So how do I track down whay this is going on? (about twice per week)?

    You and Ellen are completely unrelated, aren'y you? I am just wondering if we are trying to to solve one or two problems here.

    Where do you see 700 rows? CXPacket waits token of a parallel query, where some threads have completed their job and are waiting for some other thread to complete. Typically this happens when the optimizer makes a misestimation on how to partition the data over the threads.

    To see what is going on, the first step is to find which querythe process is running. My own favourite tool is beta_lockinfo, but then again I wrote it. You find it on http://www.sommarskog.se/sqlutil/beta_lockinfo.html. This gives you a snapshot on what is going on in the system, including current query and plan.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (10/15/2013)


    You ceartinly have a parallel query - all those execution context is a sure sign of that. I am surprised that there are so many workers per execution context, but then again this is nothing I have paid attention to. As I am at PASS in Charlotte, I should be able to find someone who can explain this to me.

    I spoke with some people here, and they agreed on that 3000 workers for a single thread is excessive, and we can't understand how that occur.

    When you see this condition, can you query sys.dm_os_workers directly? Just run a SELECT COUNT(*). Maybe this is an issue in Activity Monitor. Also, can you use sp_configure and report the setting for "max worker threads"?

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (10/15/2013)


    Jeff Moden (10/15/2013)Perhaps I'm using the wrong words but I have to disagree. I don't have many examples but if you have a connection leak, for example, when using sp_OA*, you will have thousands of different spids and there will be one connection for each one. The same holds true for some of the connection leaks that you can get via an app. I'm not familiar with why the connections wouldn't automatically close when it comes from an app but we currently have a machine at work that has to be rebooted once a week (until AppDev takes the time to figure out where the leak is) because of this very problem.

    I'm not saying that connection leaks cannot occur, because they certainly do. With proper coding where you say

    using (SqlConnection cn = new SqlConnection(connstring)) {

    // code here

    }

    They do not happen, but if you rely on garbage collection to taking care of your connection, it can happen.

    But what Ellen is seeing is something completely different, and not related to the client layer, but it happens in SQL Server alone.

    Thanks, Erland. I appreciate the feedback. It'll be interesting to see what this actually turns out to be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply