Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»

Running out of workers Expand / Collapse
Author
Message
Posted Thursday, October 17, 2013 6:08 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, May 2, 2014 6:15 AM
Points: 36, Visits: 102
Erland
Our 'max worker threads' is set at the default "0"
Next time I see this condition I will query the sys.dm_os_workers tables directly to get the count.

Since the Activity Monitor in SQL Server 2008 is so bad I usually take a quick look at it then run a couple of different queries that I found on this forum to really see what is going on.
I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.
Post #1505646
Posted Thursday, October 17, 2013 2:46 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
ELLEN-610393 (10/17/2013)I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.


Yes, it sounds like there is a join problem. Please post your queries, although it may take some days before I get the time to look at them. But maybe someone who is not at PASS can step in.


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1505913
Posted Thursday, October 17, 2013 2:55 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, August 14, 2014 7:06 AM
Points: 321, Visits: 569
Erland Sommarskog (10/16/2013)
dwilliscp (10/16/2013)
So if you have one SPID, but 700 rows, that means the server is trying to do parallel processing? They all show CXPacket.. So how do I track down whay this is going on? (about twice per week)?


You and Ellen are completely unrelated, aren'y you? I am just wondering if we are trying to to solve one or two problems here.

Where do you see 700 rows? CXPacket waits token of a parallel query, where some threads have completed their job and are waiting for some other thread to complete. Typically this happens when the optimizer makes a misestimation on how to partition the data over the threads.

To see what is going on, the first step is to find which querythe process is running. My own favourite tool is beta_lockinfo, but then again I wrote it. You find it on http://www.sommarskog.se/sqlutil/beta_lockinfo.html. This gives you a snapshot on what is going on in the system, including current query and plan.


Unrelated ... The following is one of the queries that I use to show what is going on... when the server is not responding, and unlike the worker query, this is the one that showed me the 700 rows with the same SPID (all generated by the same query), and I could not see one that was not CXPACKET..but maybe I should have filtered out the CXPACKET wait types after I noticed the SPID creating all the issues.

USE master
GO
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT
er.session_Id AS [Spid]
, sp.ecid
, er.start_time
, DATEDIFF(SS,er.start_time,GETDATE()) as [Age Seconds]
, sp.nt_username
, er.status
, er.wait_type
, SUBSTRING (qt.text, (er.statement_start_offset/2) + 1,
((CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS [Individual Query]
, qt.text AS [Parent Query]
, sp.program_name
, sp.Hostname
, sp.nt_domain

FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50
AND session_Id NOT IN (@@SPID)
ORDER BY session_Id, ecid
Post #1505917
Posted Thursday, October 17, 2013 8:59 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Saturday, August 16, 2014 8:16 AM
Points: 107, Visits: 513
When in doubt, tinker. Tinker with what you are allowed to change (which doesn't sound like much). Can you set MAXDOP to 4? MAXDOP 0 can be a killer as it'll bulk up on threads for a query on the assumption nothing else is going to run. A few milliseconds later when 100 similar queries have been executed, VOILA! Thousands of extra threads. For my dollar, even with hyperthreading MAXDOP should be set no higher than the number of physical cores - half that in AMD environments or busy Intel environments.
Usually things are disk bound anyway. Sure would be nice to install a few SSDs and use them for TEMPDB - that can make a huge difference everywhere.

Oh - and bad statistics. That could indirectly cause a lot of parallelism and extra threads.
Post #1505963
Posted Friday, October 18, 2013 3:19 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 3:40 AM
Points: 25, Visits: 1,272
Hello,

You can use the query below, execute it every 30 seconds or so .. and whatch the number of connections as well !


SELECT
SUM(current_workers_count) current_visible_workers,
SUM(active_workers_count) active_visible_workers,
(SELECT COUNT(*) FROM sys.dm_exec_connections) connections
FROM sys.dm_os_schedulers
-- Those schedulers that have IDs greater than or equal to 255 are used internally by SQL Server
-- check BOL !
WHERE scheduler_id < 255

Cheers,
R
Post #1506042
Posted Friday, October 18, 2013 3:20 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Yesterday @ 3:40 AM
Points: 25, Visits: 1,272
..
Post #1506043
Posted Friday, October 18, 2013 7:20 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
dwilliscp (10/17/2013)

FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid


Here is a potential problem. This join is good in most cases, but not if you use MARS (Multiple Active Result Sets) or SOAP requests, in which case there may be multiple requests for the same session id.

If we ignore that, and assume that a single request is generating 700 worker threads, that is is excessive. You need to look into the query plan for that query, and investigate if you can add indexes or improve the query.

There has been some other suggestions for diagnostic queries. I can offer beta_lockinfo, whicb you find at http://www.sommarskog.se/sqlutil/beta_lockinfo.html. This gives you both the query, the plan as well as a bunch of other information. And itjoins the various DMVs correctly. :--)


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1506148
Posted Friday, October 18, 2013 7:23 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, May 2, 2014 6:15 AM
Points: 36, Visits: 102
ELLEN-610393 (10/17/2013)
Erland
Our 'max worker threads' is set at the default "0"
Next time I see this condition I will query the sys.dm_os_workers tables directly to get the count.

Since the Activity Monitor in SQL Server 2008 is so bad I usually take a quick look at it then run a couple of different queries that I found on this forum to really see what is going on.
I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.


Thank you [anyone] that can look at this query and if there is a flaw I welcome correction.
The query that I use is:
SELECT
[Session ID] = s.session_id,
[User Process] = CONVERT(CHAR(1), s.is_user_process),
[Login] = s.login_name,
[Database] = ISNULL(db_name(p.dbid), N''),
[Task State] = ISNULL(t.task_state, N''),
[Command] = ISNULL(r.command, N''),
[Application] = ISNULL(s.program_name, N''),
[Wait Time (ms)] = ISNULL(w.wait_duration_ms, 0),
[Wait Type] = ISNULL(w.wait_type, N''),
[Wait Resource] = ISNULL(w.resource_description, N''),
[Blocked By] = ISNULL(CONVERT (varchar, w.blocking_session_id), ''),
[Head Blocker] =
CASE
-- session has an active request, is blocked, but is blocking others or session is idle but has an open tran and is blocking others
WHEN r2.session_id IS NOT NULL AND (r.blocking_session_id = 0 OR r.session_id IS NULL) THEN '1'
-- session is either not blocking someone, or is blocking someone but is blocked by another party
ELSE ''
END,
[Total CPU (ms)] = s.cpu_time,
[Total Physical I/O (MB)] = (s.reads + s.writes) * 8 / 1024,
[Memory Use (KB)] = s.memory_usage * 8192 / 1024,
[Open Transactions] = ISNULL(r.open_transaction_count,0),
[Login Time] = s.login_time,
[Last Request Start Time] = s.last_request_start_time,
[Host Name] = ISNULL(s.host_name, N''),
[Net Address] = ISNULL(c.client_net_address, N''),
[Execution Context ID] = ISNULL(t.exec_context_id, 0),
[Request ID] = ISNULL(r.request_id, 0),
[Workload Group] = ISNULL(g.name, N'')
FROM sys.dm_exec_sessions s WITH (NOLOCK)
LEFT OUTER JOIN sys.dm_exec_connections c WITH (NOLOCK) ON (s.session_id = c.session_id)
LEFT OUTER JOIN sys.dm_exec_requests r WITH (NOLOCK) ON (s.session_id = r.session_id)
LEFT OUTER JOIN sys.dm_os_tasks t WITH (NOLOCK) ON (r.session_id = t.session_id AND r.request_id = t.request_id)
LEFT OUTER JOIN
( -- In some cases (e.g. parallel queries, also waiting for a worker), one thread can be flagged as
-- waiting for several different threads. This will cause that thread to show up in multiple rows
-- in our grid, which we don't want. Use ROW_NUMBER to select the longest wait for each thread,
-- and use it as representative of the other wait relationships this thread is involved in.
SELECT *, ROW_NUMBER() OVER (PARTITION BY waiting_task_address ORDER BY wait_duration_ms DESC) AS row_num
FROM sys.dm_os_waiting_tasks WITH (NOLOCK)
) w ON (t.task_address = w.waiting_task_address) AND w.row_num = 1
LEFT OUTER JOIN sys.dm_exec_requests r2 WITH (NOLOCK) ON (s.session_id = r2.blocking_session_id)
LEFT OUTER JOIN sys.dm_resource_governor_workload_groups g WITH (NOLOCK) ON (g.group_id = s.group_id)
LEFT OUTER JOIN sys.sysprocesses p WITH (NOLOCK) ON (s.session_id = p.spid)
WHERE s.session_id > 50
ORDER BY s.session_id;
Post #1506154
Posted Friday, October 18, 2013 9:18 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Saturday, August 16, 2014 8:16 AM
Points: 107, Visits: 513
If one query is generating several hundred worker threads that really sounds like you have MAXDOP set to 0 (the default). Since you know the problematic query apparently, just take that query and append OPTION (MAXDOP 4) to the end of it and fire it off in a few query windows. If it doesn't cause a couple thousand worker threads you've found your problem.
Post #1506219
Posted Friday, October 18, 2013 9:10 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, August 21, 2014 3:04 PM
Points: 801, Visits: 715
ELLEN-610393 (10/18/2013)

LEFT OUTER JOIN sys.sysprocesses p WITH (NOLOCK) ON (s.session_id = p.spid)


Here is the problem. Sysprocess does not have one per process, it has one per process and execution context id. And sometimes there can even be multiple rows for ecid = 0. Since you already have one per task, you get a complete explosion here.

Does this query come from the Activity Monitor?

In my beta_lockinfo, I do this to avoid this problem:

   JOIN   (SELECT spid, dbid = MIN(dbid), open_tran = MIN(open_tran)
FROM sys.sysprocesses
WHERE ecid = 0
GROUP BY spid) AS sp ON sp.spid = es.session_id



Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1506359
« Prev Topic | Next Topic »

Add to briefcase «««12345»»

Permissions Expand / Collapse