|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
Hi Gurus,
I'm trying to find a starting point from which to troubleshoot an issue we are facing where the same operations in one environment cause a deadlock, whereas in a different environment the operations just block indefinitely. One process is a query involving many tables, one of them being "TableA" we'll say. The second process is just one that truncates "TableA". Isolation Level of the databases are identical. The 2 environments are configured identically, same hardware, OS, settings etc. In one environment, we kick off the 2 processes which will contend with eachother and they start blocking, but SQL Server will shortly declare a deadlock and kill one of them In the second environment, we kick of the same 2 processes and they just stay blocking forever.
Does anyone have a good technical overview of how SQL Server 2005 decides that 2 processes are in a deadlock? I know the "general" scenario of a deadlock: ProcessA has a lock on X and wants Y ProcessB has a lock on Y and wants X
But in this case, It doesn't fit nicely into this canned scenario. One of the processes is just trying to truncating a table that the other process is doing a select on. In my simple mind I don't see this as a scenario that should cause a deadlock. The truncate should just keep waiting until the select is done.
Thanks
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:55 AM
Points: 37,735,
Visits: 30,006
|
|
If it is just a single truncate and a single select (neither in a transaction), that shouldn't deadlock. For a deadlock, the SQL deadlock detector needs to be able to walk the lock request chains and find a loop.
Can you post the deadlock graph (from traceflag 1222)?
Are the two environments the same w.r.t. database settings?
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
Thanks!!! Here's the trace. Check out the last deadlock between SPID 113 and 20
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:55 AM
Points: 37,735,
Visits: 30,006
|
|
This one?
<deadlock victim="process59ee718"> <process-list> <process id="process59ee718" taskpriority="0" logused="0" waitresource="OBJECT: 11:1529108538:3 " waittime="4109" ownerId="1805859773" transactionname="TRUNCATE TABLE" lasttranstarted="2012-07-30T07:00:56.693" XDES="0x18752eef0" lockMode="Sch-M" schedulerid="12" kpid="9428" status="suspended" spid="113" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2012-07-30T07:00:56.693" lastbatchcompleted="2012-07-30T07:00:27.413" clientapp="Microsoft JDBC Driver for SQL Server" hostname="sworksetlap1" hostpid="0" loginname="SIEBEL" isolationlevel="read committed (2)" xactid="1805859773" currentdb="11" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056"> <executionStack> <frame procname="adhoc" line="1" sqlhandle="0x01000b001cd16b34d07bedd00c0000000000000000000000"> TRUNCATE TABLE WC_EPCR_SALESPERMISSION_D </frame> </executionStack> <inputbuf> TRUNCATE TABLE WC_EPCR_SALESPERMISSION_D </inputbuf> </process> <process id="processb0276d8" taskpriority="0" logused="0" waitresource="OBJECT: 11:1529108538:2 " waittime="1156" ownerId="1805864888" transactionname="Lookup" lasttranstarted="2012-07-30T07:00:59.803" XDES="0xb4db192f8" lockMode="Sch-S" schedulerid="4" kpid="11692" status="suspended" spid="80" sbid="0" ecid="0" priority="0" transcount="0" lastbatchstarted="2012-07-30T07:00:54.417" lastbatchcompleted="2012-07-30T07:00:37.770" clientapp="Oracle Business Intelligence" hostname="SWORKSOBIAP2" hostpid="1848" loginname="SIEBEL" isolationlevel="read uncommitted (1)" xactid="1805854139" currentdb="11" lockTimeout="4294967295" clientoption1="687865888" clientoption2="128056"> <executionStack> <frame procname="adhoc" line="1" sqlhandle="0x020000006a1a2a045235b1e2b649205d19221596a59e7a9f"> select max(D4.c2) as c4, max(D4.c1) as c5 from (select distinct T307241."PRODUCTROLLUPFS" as c2 from "WC_EPCR_ARREGION_TERR_D" T307909, "W_DAY_D" T70600 /* Dim_W_DAY_D_Order_Date */ , "WC_EPCR_SALESPERMISSION_D" T306520, "WC_EPCR_ARSALESP_D" T306527, "WC_EPCR_FSPRODCOMP_D" T307241, "WC_EPCR_BOOKING_DETAIL_F" T307839, "WC_EPCR_MKTROLLUP_D" T307886, "WC_EPCR_FSPRODCOMP_HIST_D" T311257 where ( T70600."ROW_WID" = T307839."ORDER_DATE_WID" and T306520."SALESPERSON_CODE" = T306527."SALESPERSON_CODE" and T306527."ROW_WID" = T307839."SALES_PERSON_WID" and T307839."TERRITORY_WID" = T307909."ROW_WID" and T307241."ROW_WID" = T307839."FSPRODCOMP_WID" and T307241."ROW_WID" = T311257."FSPRODCOMP_WID" and T70600."PER_NAME_MONTH" = N'2012 / 06' a </frame> <frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000"> unknown </frame> </executionStack> <inputbuf> select max(D4.c2) as c4, max(D4.c1) as c5 from (select distinct T307241."PRODUCTROLLUPFS" as c2 from "WC_EPCR_ARREGION_TERR_D" T307909, "W_DAY_D" T70600 /* Dim_W_DAY_D_Order_Date */ , "WC_EPCR_SALESPERMISSION_D" T306520, "WC_EPCR_ARSALESP_D" T306527, "WC_EPCR_FSPRODCOMP_D" T307241, "WC_EPCR_BOOKING_DETAIL_F" T307839, "WC_EPCR_MKTROLLUP_D" T307886, "WC_EPCR_FSPRODCOMP_HIST_D" T311257 where ( T70600."ROW_WID" = T307839."ORDER_DATE_WID" and T306520."SALESPERSON_CODE" = T306527."SALESPERSON_CODE" and T306527."ROW_WID" = T307839."SALES_PERSON_WID" and T307839."TERRITORY_WID" = T307909."ROW_WID" and T307241."ROW_WID" = T307839."FSPRODCOMP_WID" and T307241."ROW_WID" = T311257."FSPRODCOMP_WID" and T70600."PER_NAME_MONTH" = N'2012 / 06' </inputbuf> </process> </process-list> <resource-list> <objectlock lockPartition="2" objid="1529108538" subresource="FULL" dbid="11" objectname="OLAP.dbo.WC_EPCR_SALESPERMISSION_D" id="lock5771500" mode="Sch-M" associatedObjectId="1529108538"> <owner-list> <owner id="process59ee718" mode="Sch-M"/> </owner-list> <waiter-list> <waiter id="processb0276d8" mode="Sch-S" requestType="wait"/> </waiter-list> </objectlock> <objectlock lockPartition="3" objid="1529108538" subresource="FULL" dbid="11" objectname="OLAP.dbo.WC_EPCR_SALESPERMISSION_D" id="lock5d704ba80" mode="Sch-S" associatedObjectId="1529108538"> <owner-list> <owner id="processb0276d8" mode="Sch-S"/> </owner-list> <waiter-list> <waiter id="process59ee718" mode="Sch-M" requestType="wait"/> </waiter-list> </objectlock> </resource-list> </deadlock> </deadlock-list> It looks like the select from the oracle app is part of a user transaction. Can you find out if that's the case? If so, what else is in the transaction? Do you have lock escalation configured for the partition level?
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
Yes, that's the one.
The SQL is application-generated, but it's from a reporting tool that does not do any DML, only reads from star-schema data models. Still, I'll try and dig deeper and see if it's part of a bigger transaction.
About lock escalation: It's SQL Server 2005, so I don't think partition level lock escalation is possible. I will run dbcc tracestatus (-1) on all environments to see if any lock escalation flags are set.
Thanks so much for diving into this.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:55 AM
Points: 37,735,
Visits: 30,006
|
|
Yeah, you're right, partition locks were 2008. Lock escalation trace flags aren't relevant here.
I suspect it's just got to do with the order that locks are acquired. If the select takes a schema stability lock on partition 2, then the truncate requests a schema mod on partition 2, then the truncate takes a schema mod on partition 3 and then the select requests a schema stability on partition 3, you've got a deadlock with just two statements. From the deadlock graph, that looks like what's happening.
One thing that I would suggest - the Oracle's running read uncommitted, that's not generally a good thing. See if you can get it to default (read committed). Might help as the truncate couldn't start taking schema mod locks with shared locks in place.
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
Thanks Gail, Appreciate all the insights provided so far. I'm still trying to understand what the variable could be to explain the difference in behavior in the 2 environments we have. In one, we can consistently reproduce the deadlock, in the other the statements just block, but never deadlock. Any other variables at all besides order of lock aquisition? Are there any instance or db settings that affect sensitivity of when SQL Server detects a deadlock?
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 5:55 AM
Points: 37,735,
Visits: 30,006
|
|
Isolation levels, data distribution, execution plans for the select. No direct settings.
If it's a deadlock, it will be detected as such, so the question is why is it just blocking in one environment and deadlocking in the other. Exact same schema and database settings?
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
schema is exactly the same, database settings are the same definitely for IL (default is READ_COMMITTED), and we're trying to weed through all other possible settings to see anything that looks different.
Data distribution could be the key, because this is certainly different.
I like your earlier hint to check if we can control how our application connects to the DB - we found it does change isolation level to dirty reads when it establishes a session and that this is configurable, but again it's another thing that is the same in both environments.
As a side question (sorry if it's a newbie one) : if the session coming from the Oracle app is set to read_uncommitted, why are we blocking the truncate table operation? In other words, why is it taking a Schema lock at all?
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:15 AM
Points: 25,
Visits: 85
|
|
nevermind... got my answer:
http://aboutsqlserver.com/2012/04/05/locking-in-microsoft-sql-server-part-13-schema-locks/
|
|
|
|