Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Any info on Deadlock detection algorithms? Expand / Collapse
Author
Message
Posted Wednesday, August 15, 2012 9:19 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
Hi Gurus,

I'm trying to find a starting point from which to troubleshoot an issue we are facing where the same operations in one environment cause a deadlock, whereas in a different environment the operations just block indefinitely. One process is a query involving many tables, one of them being "TableA" we'll say. The second process is just one that truncates "TableA".
Isolation Level of the databases are identical. The 2 environments are configured identically, same hardware, OS, settings etc.
In one environment, we kick off the 2 processes which will contend with eachother and they start blocking, but SQL Server will shortly declare a deadlock and kill one of them
In the second environment, we kick of the same 2 processes and they just stay blocking forever.

Does anyone have a good technical overview of how SQL Server 2005 decides that 2 processes are in a deadlock? I know the "general" scenario of a deadlock:
ProcessA has a lock on X and wants Y
ProcessB has a lock on Y and wants X

But in this case, It doesn't fit nicely into this canned scenario. One of the processes is just trying to truncating a table that the other process is doing a select on. In my simple mind I don't see this as a scenario that should cause a deadlock. The truncate should just keep waiting until the select is done.

Thanks
Post #1345336
Posted Wednesday, August 15, 2012 9:24 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:50 AM
Points: 39,977, Visits: 36,339
If it is just a single truncate and a single select (neither in a transaction), that shouldn't deadlock. For a deadlock, the SQL deadlock detector needs to be able to walk the lock request chains and find a loop.

Can you post the deadlock graph (from traceflag 1222)?

Are the two environments the same w.r.t. database settings?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1345341
Posted Wednesday, August 15, 2012 9:31 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
Thanks!!!
Here's the trace. Check out the last deadlock between SPID 113 and 20


  Post Attachments 
sworksobidbc_Deadlock07JULY2012.trc.txt (17 views, 90.50 KB)
Post #1345351
Posted Wednesday, August 15, 2012 9:53 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:50 AM
Points: 39,977, Visits: 36,339
This one?

<deadlock victim="process59ee718">
<process-list>
<process id="process59ee718" taskpriority="0" logused="0" waitresource="OBJECT: 11:1529108538:3 " waittime="4109" ownerId="1805859773" transactionname="TRUNCATE TABLE" lasttranstarted="2012-07-30T07:00:56.693" XDES="0x18752eef0" lockMode="Sch-M" schedulerid="12" kpid="9428" status="suspended" spid="113" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2012-07-30T07:00:56.693" lastbatchcompleted="2012-07-30T07:00:27.413" clientapp="Microsoft JDBC Driver for SQL Server" hostname="sworksetlap1" hostpid="0" loginname="SIEBEL" isolationlevel="read committed (2)" xactid="1805859773" currentdb="11" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" sqlhandle="0x01000b001cd16b34d07bedd00c0000000000000000000000">
TRUNCATE TABLE WC_EPCR_SALESPERMISSION_D </frame>
</executionStack>
<inputbuf>
TRUNCATE TABLE WC_EPCR_SALESPERMISSION_D </inputbuf>
</process>
<process id="processb0276d8" taskpriority="0" logused="0" waitresource="OBJECT: 11:1529108538:2 " waittime="1156" ownerId="1805864888" transactionname="Lookup" lasttranstarted="2012-07-30T07:00:59.803" XDES="0xb4db192f8" lockMode="Sch-S" schedulerid="4" kpid="11692" status="suspended" spid="80" sbid="0" ecid="0" priority="0" transcount="0" lastbatchstarted="2012-07-30T07:00:54.417" lastbatchcompleted="2012-07-30T07:00:37.770" clientapp="Oracle Business Intelligence" hostname="SWORKSOBIAP2" hostpid="1848" loginname="SIEBEL" isolationlevel="read uncommitted (1)" xactid="1805854139" currentdb="11" lockTimeout="4294967295" clientoption1="687865888" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="1" sqlhandle="0x020000006a1a2a045235b1e2b649205d19221596a59e7a9f">
select max(D4.c2) as c4,
max(D4.c1) as c5
from
(select distinct T307241."PRODUCTROLLUPFS" as c2
from
"WC_EPCR_ARREGION_TERR_D" T307909,

"W_DAY_D" T70600 /* Dim_W_DAY_D_Order_Date */ ,

"WC_EPCR_SALESPERMISSION_D" T306520,

"WC_EPCR_ARSALESP_D" T306527,

"WC_EPCR_FSPRODCOMP_D" T307241,

"WC_EPCR_BOOKING_DETAIL_F" T307839,

"WC_EPCR_MKTROLLUP_D" T307886,

"WC_EPCR_FSPRODCOMP_HIST_D" T311257
where ( T70600."ROW_WID" = T307839."ORDER_DATE_WID" and T306520."SALESPERSON_CODE" = T306527."SALESPERSON_CODE" and T306527."ROW_WID" = T307839."SALES_PERSON_WID" and T307839."TERRITORY_WID" = T307909."ROW_WID" and T307241."ROW_WID" = T307839."FSPRODCOMP_WID" and T307241."ROW_WID" = T311257."FSPRODCOMP_WID" and T70600."PER_NAME_MONTH" = N'2012 / 06' a </frame>
<frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
select max(D4.c2) as c4,
max(D4.c1) as c5
from
(select distinct T307241."PRODUCTROLLUPFS" as c2
from
"WC_EPCR_ARREGION_TERR_D" T307909,

"W_DAY_D" T70600 /* Dim_W_DAY_D_Order_Date */ ,

"WC_EPCR_SALESPERMISSION_D" T306520,

"WC_EPCR_ARSALESP_D" T306527,

"WC_EPCR_FSPRODCOMP_D" T307241,

"WC_EPCR_BOOKING_DETAIL_F" T307839,

"WC_EPCR_MKTROLLUP_D" T307886,

"WC_EPCR_FSPRODCOMP_HIST_D" T311257
where ( T70600."ROW_WID" = T307839."ORDER_DATE_WID" and T306520."SALESPERSON_CODE" = T306527."SALESPERSON_CODE" and T306527."ROW_WID" = T307839."SALES_PERSON_WID" and T307839."TERRITORY_WID" = T307909."ROW_WID" and T307241."ROW_WID" = T307839."FSPRODCOMP_WID" and T307241."ROW_WID" = T311257."FSPRODCOMP_WID" and T70600."PER_NAME_MONTH" = N'2012 / 06' </inputbuf>
</process>
</process-list>
<resource-list>
<objectlock lockPartition="2" objid="1529108538" subresource="FULL" dbid="11" objectname="OLAP.dbo.WC_EPCR_SALESPERMISSION_D" id="lock5771500" mode="Sch-M" associatedObjectId="1529108538">
<owner-list>
<owner id="process59ee718" mode="Sch-M"/>
</owner-list>
<waiter-list>
<waiter id="processb0276d8" mode="Sch-S" requestType="wait"/>
</waiter-list>
</objectlock>
<objectlock lockPartition="3" objid="1529108538" subresource="FULL" dbid="11" objectname="OLAP.dbo.WC_EPCR_SALESPERMISSION_D" id="lock5d704ba80" mode="Sch-S" associatedObjectId="1529108538">
<owner-list>
<owner id="processb0276d8" mode="Sch-S"/>
</owner-list>
<waiter-list>
<waiter id="process59ee718" mode="Sch-M" requestType="wait"/>
</waiter-list>
</objectlock>
</resource-list>
</deadlock>
</deadlock-list>

It looks like the select from the oracle app is part of a user transaction. Can you find out if that's the case? If so, what else is in the transaction?
Do you have lock escalation configured for the partition level?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1345365
Posted Wednesday, August 15, 2012 10:19 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
Yes, that's the one.

The SQL is application-generated, but it's from a reporting tool that does not do any DML, only reads from star-schema data models.
Still, I'll try and dig deeper and see if it's part of a bigger transaction.

About lock escalation: It's SQL Server 2005, so I don't think partition level lock escalation is possible.
I will run dbcc tracestatus (-1) on all environments to see if any lock escalation flags are set.

Thanks so much for diving into this.
Post #1345380
Posted Wednesday, August 15, 2012 10:50 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:50 AM
Points: 39,977, Visits: 36,339
Yeah, you're right, partition locks were 2008. Lock escalation trace flags aren't relevant here.

I suspect it's just got to do with the order that locks are acquired. If the select takes a schema stability lock on partition 2, then the truncate requests a schema mod on partition 2, then the truncate takes a schema mod on partition 3 and then the select requests a schema stability on partition 3, you've got a deadlock with just two statements. From the deadlock graph, that looks like what's happening.

One thing that I would suggest - the Oracle's running read uncommitted, that's not generally a good thing. See if you can get it to default (read committed). Might help as the truncate couldn't start taking schema mod locks with shared locks in place.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1345398
Posted Thursday, August 16, 2012 6:54 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
Thanks Gail,
Appreciate all the insights provided so far.
I'm still trying to understand what the variable could be to explain the difference in behavior in the 2 environments we have.
In one, we can consistently reproduce the deadlock, in the other the statements just block, but never deadlock.
Any other variables at all besides order of lock aquisition? Are there any instance or db settings that affect sensitivity of when SQL Server detects a deadlock?



Post #1345910
Posted Thursday, August 16, 2012 7:07 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:50 AM
Points: 39,977, Visits: 36,339
Isolation levels, data distribution, execution plans for the select. No direct settings.

If it's a deadlock, it will be detected as such, so the question is why is it just blocking in one environment and deadlocking in the other. Exact same schema and database settings?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1345920
Posted Thursday, August 16, 2012 2:57 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
schema is exactly the same, database settings are the same definitely for IL (default is READ_COMMITTED), and we're trying to weed through all other possible settings to see anything that looks different.

Data distribution could be the key, because this is certainly different.

I like your earlier hint to check if we can control how our application connects to the DB - we found it does change isolation level to dirty reads when it establishes a session and that this is configurable, but again it's another thing that is the same in both environments.

As a side question (sorry if it's a newbie one) : if the session coming from the Oracle app is set to read_uncommitted, why are we blocking the truncate table operation? In other words, why is it taking a Schema lock at all?
Post #1346256
Posted Thursday, August 16, 2012 3:03 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, June 11, 2014 7:26 AM
Points: 25, Visits: 89
nevermind... got my answer:

http://aboutsqlserver.com/2012/04/05/locking-in-microsoft-sql-server-part-13-schema-locks/

Post #1346259
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse