September 2, 2011 at 10:48 am
Latch_Ex on what kind of latch?
Latches (as opposed to PageLatch and PageIOLatch) are memory-related waits. Hence need to know what kind of latchwaits you're seeing.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 2, 2011 at 12:41 pm
Unfortunately I am not tracking that data historically, but looking at sys.dm_os_latch_Stats
ACCESS_METHODS_DATASET_PARENT Max of 8331 average of 9
BUFFER Max of 9922 average of 0
All others of note are "Internal use only" DBCC* and NESTING*
I'll start gathering values over time. (at least for those two)
PAGELATCH and PAGEIOLATCH have total wait time of 1/1000th of LATCH
September 2, 2011 at 12:48 pm
Reset counters
DBCC SQLPERF ('sys.dm_os_latch_stats', CLEAR);
GO
Setup monitor to track value ever 10 minutes... And Now off to do some reading. Just the way to start off a holiday weekend.
September 2, 2011 at 12:49 pm
hmm.... @@version?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 2, 2011 at 1:04 pm
Microsoft SQL Server 2008 (SP2) - 10.0.4266.0 (X64) Nov 5 2010 16:11:39 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )
September 2, 2011 at 1:17 pm
I was just reading about monitoring latch waits over on Paul Randal's blog. He suggested this white paper to understand what's happening with latch waits..
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 2, 2011 at 1:19 pm
Grant Fritchey (9/2/2011)
I was just reading about monitoring latch waits over on Paul Randal's blog. He suggested this white paper to understand what's happening with latch waits..
There seems to be an echo in here... 😉
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 2, 2011 at 5:04 pm
GilaMonster (9/2/2011)
Grant Fritchey (9/2/2011)
I was just reading about monitoring latch waits over on Paul Randal's blog. He suggested this white paper to understand what's happening with latch waits..There seems to be an echo in here... 😉
HA! I didn't even click through on that one. Probably should have. If I'd noticed the head of the URL I would have known.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 12, 2011 at 9:03 am
Update.
SELECT * FROM sys.dm_os_wait_stats ORDER BY wait_time_ms desc
showed that CXPACKET was in the top 10 (actually 4th highest) waits.
So we concentrated on parallel queries.
First thing I found out is that apparently in practice breaking up a query more than 8 degree's will reduce performance, so the current rule of thumb for MAXDOP is to set the value to the number of cores in a single socket. So in our case it was 6. In addition, some people have seen a benefit in disabling Hyper threading. We are not currently considering that, but might in the future.
I found a great article on parallel queries http://sqlserverpedia.com/blog/sql-server-bloggers/find-query-plans-that-may-utilize-parallelism/
This query is great
--
-- Find query plans that may run in parallel
--
SELECT DB_NAME(p.dbid) AS DBName ,
OBJECT_NAME(p.objectid, p.dbid) AS OBJECT_NAME ,
cp.usecounts ,
p.query_plan ,
q.text ,
p.dbid ,
p.objectid ,
p.number ,
p.encrypted ,
cp.plan_handle
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) p
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS q
WHERE cp.cacheobjtype = 'Compiled Plan'
AND p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan"; max(//p:RelOp/@Parallel)', 'float') > 0
ORDER BY p.dbid, p.objectID
September 12, 2011 at 9:06 am
Cxpackt at 4th place??? I wouldn't bother, AT ALL, with it atm.
Cost threshold for parallelism (CXPACKET) http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/19/tuning-cost-threshold-of-parallelism-from-the-plan-cache.aspx
Paul White: Understanding parallelism http://www.simple-talk.com/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server/
Microsoft White Paper on waits http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/Performance_Tuning_Waits_Queues.doc
September 12, 2011 at 9:08 am
How long's the peak at 50%?
Sounds to me like you just have a olap query taking a little more ressources the your other "normal" queries.
September 12, 2011 at 10:09 am
The event's usually last between 5-10 minutes.
However, I should have clarified the 4th number. The other three were ignorable waits. In fact, all in the top 10 except CXPACKET where ignorable.
LAZYWRITER_SLEEP 716375247
DISPATCHER_QUEUE_SEMAPHORE 188715613
XE_DISPATCHER_WAIT 182711826
CXPACKET 179763111
BROKER_TASK_STOP 179128691
We found a table that was in need of indexing. Our best guess at this point, is we went into parallel paralysis, when a certain class of reports were run (which depended heavily on this table).
We suspect the table was the primary issue. The MAXDOP change is in place to hopefully protect us from future queries crippling the system. At this point, I am not comfortable going as far as setting it to 1.
At
September 12, 2011 at 10:14 am
Bob Fazio (9/12/2011)
The event's usually last between 5-10 minutes.However, I should have clarified the 4th number. The other three were ignorable waits. In fact, all in the top 10 except CXPACKET where ignorable.
LAZYWRITER_SLEEP 716375247
DISPATCHER_QUEUE_SEMAPHORE 188715613
XE_DISPATCHER_WAIT 182711826
CXPACKET 179763111
BROKER_TASK_STOP 179128691
We found a table that was in need of indexing. Our best guess at this point, is we went into parallel paralysis, when a certain class of reports were run (which depended heavily on this table).
We suspect the table was the primary issue. The MAXDOP change is in place to hopefully protect us from future queries crippling the system. At this point, I am not comfortable going as far as setting it to 1.
At
I'm really no expert with that big of a server, but good sense tells me that there probably should be a maxdop setting at server level (maybe 8, 12 or 16).
That way a rogue query shouldn't kill the whole server.
That being said, what does this return?
/*
Cost threshold for parallelism (CXPACKET) http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/19/tuning-cost-threshold-of-parallelism-from-the-plan-cache.aspx
Paul White: Understanding parallelism http://www.simple-talk.com/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server/
Microsoft White Paper on waits http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/Performance_Tuning_Waits_Queues.doc
Next query by Paul Randal http://www.sqlskills.com/BLOGS/PAUL/post/Wait-statistics-or-please-tell-me-where-it-hurts.aspx
*/
WITH Waits AS
(SELECT
wait_type,
wait_time_ms / 1000.0 AS WaitS,
(wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
signal_wait_time_ms / 1000.0 AS SignalS,
waiting_tasks_count AS WaitCount,
100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'BROKER_RECEIVE_WAITFOR')
)
SELECT
W1.wait_type AS WaitType,
CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
CAST (W1.WaitS * 1000 / W1.WaitCount AS DECIMAL(14, 2)) AS AvgWait_MS,
CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
W1.WaitCount AS WaitCount,
CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.RowNum <= W1.RowNum
GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage
HAVING SUM (W2.Percentage) - W1.Percentage < 97 -- percentage threshold
AND W1.WaitCount > 0;
GO
September 12, 2011 at 10:23 am
WaitTypeWait_SAvgWait_MSResource_SSignal_SWaitCountPercentage
CXPACKET25955.363.8222734.463220.90678771938.08
OLEDB17137.440.1417137.440.0012570371925.14
PAGEIOLATCH_SH8344.7426.258298.3446.4031791312.24
SOS_SCHEDULER_YIELD4127.010.1851.294075.72223269846.05
BACKUPIO2285.140.822273.7311.4127762383.35
WRITELOG1565.000.741444.10120.9021012872.30
LATCH_EX1354.820.161092.97261.8583067901.99
BACKUPTHREAD1235.28135.541235.010.2791141.81
ASYNC_IO_COMPLETION1017.534086.451017.520.012491.49
BACKUPBUFFER980.150.16900.9279.2263068881.44
ASYNC_NETWORK_IO936.730.30899.3837.3631182731.37
SQLTRACE_LOCK487.045.49465.6721.37887360.71
PREEMPTIVE_OS_AUTHENTICATIONOPS389.930.17389.930.0022847360.57
SLEEP_BPOOL_FLUSH328.392.93324.913.481119840.48
September 12, 2011 at 10:26 am
Tx, and now these 2?
SELECT* FROM sys.dm_exec_query_optimizer_info
Don't do any of the recommendations, it's just very rough guidelines for us.
SELECT
optz.counter
, optz.occurrence
, CONVERT(DECIMAL(18,2), optz.value) AS AvgValue
, conf.name
, conf.value
, conf.value_in_use
, Uptime.DaysUptime AS [Days Server UPTIME & Last Stats Reset]
, CASE WHEN Uptime.DaysUptime < 45 THEN 'You may not have very meaningful stats because of a recent restart' ELSE NULL END AS [Stats Warning]
, CASE WHEN optz.value < conf.value THEN 'Cost threshold for parallelism might be too low' ELSE NULL END AS [Cost Threshold Warning]
, CASE WHEN conf.value_in_use <> conf.value THEN 'Server needs to be restarted for the setting to take effect' ELSE NULL END [Restart Warning]
FROM
sys.dm_exec_query_optimizer_info optz
CROSS JOIN sys.configurations conf
OUTER APPLY(
SELECT
CONVERT(DECIMAL(18 , 2) , DATEDIFF(hh , create_date , GETDATE()) / 24.0) AS DaysUptime
FROM
sys.databases
WHERE
name = 'tempdb'
) Uptime
WHERE
optz.counter = 'final cost'
AND conf.name = 'cost threshold for parallelism'
GO
Viewing 15 posts - 1 through 15 (of 34 total)
You must be logged in to reply to this topic. Login to reply