Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

SQL Trace causes application to hang Expand / Collapse
Author
Message
Posted Thursday, November 22, 2012 3:53 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
Hi,

We've recently upgraded to Windows 2008 R2, SQL Server 2008 R2 SP + CU4, multi-instance cluster. 32 cores (64 with h/t), 384GB per node. Maxdop 4 server wide, 320GB max server memory. It used to be Win2k3 R2, SQL Server 2005 SP3, 2 x single instance clusters - 8 cores, 96GB RAM, Maxdop 0 server wide.

We have 6 databases on this server, the main one being an Enterprise Data Management Platform that munges data from lots of different sources. It's a 3rd party product that has something like 18000 encrypted stored procedures, 12000 tables, 3000 views. Good, eh . There are triggers on a lot of these tables that send insert/update/deletes to a local staging database, then on to a repository database across a linked server.

For the past few years we have been using a SQL monitoring product (name witheld to protect the innocent) against out production cluster (multi-instance) without any problems. However, on the day we upgraded, when we turned on a particular feature that monitors the types of queries being executed against the server, some data load went from completing in 15 mins, to taking 14 hrs to run. When we turned off this feature, it returned to normal. No other databases appear to be affected.

We contacted the vendor of the monitoring software and found out in essense that this feature runs a SQL trace capturing the following events: SQL:BatchCompleted, SQL:StmtCompleted, SP:Completed for events with a run duration of greater than 5000ms. Sure enough, just running the trace against the server whilst processing data had exactly the same effect as using the query monitoring feature of the software. So, I think we say that although the monitoring software causes the problem, it does so only by leveraging a feature native to SQL Server.

The EDM vendor was aware of the conflict between their product and the monitoring software but have been unable to explain the cause, or reproduce it in their environment. They've also tried against a version of the DB that doesn't have any encrypted procedures (I don't have that copy).

During the slow down, I query sys.dm_exec_requests and all of the sessions associated with the hung jobs have a RUNNABLE status with a 'last_wait_type' of SOS_SCHEDULER_YIELD. There's nothing in sys.dm_os_waiting_tasks. No evident blocking. CPU's are completely idle. I tried changing MAXDOP settings, affinity, hyperthreading, etc. as well (to make it more like the old system), but no effect. As soon as I stop the trace, everything resumes at normal speeds and runs through to completion.

These are the top 10 waitstats:

wait_type	wait_time_ms	signal_wait_time_ms	resource_wait_time_ms	percent_total_waits	percent_total_signal_waits	percent_total_resource_waits
PREEMPTIVE_OS_WRITEFILEGATHER 4729935 0 8553549 0.308682089780738 0.000000000000000 0.308682089780738
BACKUPTHREAD 4721076 2466 8620446 0.311185274808784 0.001188989695061 0.311096281335619
ASYNC_IO_COMPLETION 1543967 532 18797472 0.678385914247604 0.000256505481659 0.678366715331252
OLEDB 1019783 0 93509304 3.374581401551842 0.000000000000000 3.374581401551842
LCK_M_IX 192111 707 458001 0.016553940830775 0.000340882284837 0.016528426481413
MSQL_DQ 167309 0 2099902 0.075781659483654 0.000000000000000 0.075781659483654
LCK_M_IS 94128 69 1338334 0.048300539928959 0.000033268568109 0.048298049843943
LCK_M_U 86708 316 270195 0.009762251994894 0.000152360398880 0.009750848127286
LCK_M_S 77223 1699 245245 0.008911761653415 0.000819178220563 0.008850447820931
LCK_M_X 61582 1207 241601 0.008762500905236 0.000581958865344 0.008718942461557


Any ideas what else I might check? Like I said, this used to work fine under SQL 2005/Win2k3, with the same version of the monitoring software.

Thanks,
Andy
Post #1387817
Posted Friday, November 23, 2012 7:56 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
looks specifically to be sp:Completed that's causing the problems. Any ideas?
Post #1388172
Posted Friday, November 23, 2012 8:02 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 1:26 PM
Points: 6,300, Visits: 13,580
adb2303 (11/22/2012)
Windows 2008 R2, SQL Server 2008 R2 SP + CU4, multi-instance cluster.



adb2303 (11/22/2012)
32 cores (64 with h/t),



adb2303 (11/22/2012)
384GB per node.




adb2303 (11/22/2012)
Maxdop 4 server wide

??
Why dont you just pull all the CPUs back out except for the one socket in that case.

Why have you installed 32 cores and then put a choker around SQL Servers neck?




-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1388173
Posted Friday, November 23, 2012 8:04 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
weeks of testing demonstrated that MAXDOP 4 was most performant for our workload
Post #1388174
Posted Friday, November 23, 2012 8:09 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 1:26 PM
Points: 6,300, Visits: 13,580
you never considered raising the Cost threshold for parallelism instead?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs"
Post #1388176
Posted Friday, November 23, 2012 8:14 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
yes. We still achieved more consistent results with setting maxdop.

Post #1388181
Posted Tuesday, November 27, 2012 3:09 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
anyone seen this before? It looks specifically to be the SP:Completed event that causes the delays. I suspect it's to do with the encrypted procedures, but 2005 worked okay.
Post #1389044
Posted Tuesday, November 27, 2012 5:13 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 4:34 AM
Points: 231, Visits: 740
if ı were you Firtly ı make maxdop 1 and ı set automaticly set processor affinity
and automaticly set i/o affinity
Post #1389111
Posted Tuesday, November 27, 2012 6:35 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:11 AM
Points: 834, Visits: 2,442
Hi,

I've already tried a variety of maxdop and affinitiy options, to no avail. I can reproduce this problem in our test environments, which only have 2 vCPUs...
Post #1389152
Posted Tuesday, November 27, 2012 6:37 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, August 20, 2014 4:34 AM
Points: 231, Visits: 740
I think it is related your encreptyted procedures
Post #1389153
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse