nolock experiment scripts

  • Does anyone have any scripts which show how NOLOCK can produce duplicate records in a SELECT statement? I remember Itzik Ben-Gan demonstrating this with a pair of scripts at SQL PASS a long time ago. All I remember is that one script involved an infinite WHILE loop, and the other involved SELECTs you had to run manually.

  • Yes, I do:

    Set up some test data first:

    use tempdb;
    go

    -- create some test data, only 10 rows is fine
    if object_id('test') is not null
        drop table test;
    go

    create table [dbo].[test](
        [id] [char](10) not null,
        [dated] [date] null,
        [value] [money] null,
        constraint [pk_test_46137] primary key clustered
        (
            [id] asc
        ) with (fillfactor = 100)
    );
    go

    insert test(id, dated, value)
    select top 10
        cast(row_number() over(order by @@spid) as char(10)),
        dateadd(day, row_number() over(order by @@spid), '1 jan 2010'),
        cast(abs(checksum(newid())) as money)
    from (values(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(a);
    go

    Then open two new sessions:

    Session 1

    use tempdb;
    go

    -- use a cte to slow down our select so we have time to change some data
    with delay as
    (
        select cast(1 as bigint) as c
        union all
        select cast(c+1 as bigint)
        from delay
        where c < 300000 -- increase this number to give yourself more time to run the second query
    )
    select *
    into test2
    from test with(nolock)
    cross apply (
        select count(*)
        from delay
        where c <> value
    ) x(x)
    order by id
    option(maxrecursion 0, maxdop 1);

    -- display the results - you should have two entries for 2nd Jan 2010
    select *
    from test2
    order by dated;

    -- clean up this session
    drop table test2;

    Session 2

    use tempdb;
    go

    update test
    set id = '9999999999'
    where id = '1';

    Run the code in Session 1 (the read with NOLOCK).

    Wait a second or two, then run the code in Session 2 (the update).

    Switch back to Session 1 and you should see a duplicate for 2nd Jan 2010.

    It is slightly contrived, but demonstrates the problem.
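    If you want to re-run the demo, the moved row needs to be put back first. A minimal reset (my addition, not part of the original scripts):

    -- restore the key that Session 2 moved, and drop the output table if it survived
    update test set id = '1' where id = '9999999999';
    if object_id('test2') is not null drop table test2;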

    EDIT: Wait, what? I just tried this without NOLOCK and it still created duplicates! More research needed!!! :w00t:

    MM




  • Hmm...unless I'm missing something, that's not terribly surprising, is it?

    READ COMMITTED only guarantees that you don't read data changed by uncommitted transactions. It doesn't prevent reading a row multiple times (or missing it altogether) if an UPDATE moves it around an index that is being scanned.

    Since READ COMMITTED releases shared locks as soon as the data is read, nothing prevents the SELECT from reading the row with ID=1 and then releasing the shared lock, allowing the other session to update the ID and move it to the end of the clustered index, where the SELECT will happily read it again.

    To prevent that sort of phenomenon, you need SERIALIZABLE and its range locks (at least, you need that if you want to prevent both duplicates and phantoms; if you just want to prevent duplicates, REPEATABLE READ should also work).

    It's late, so I may be missing the point entirely, but I think that's what is at issue here.

    Going back to Gabriel's original question, you don't need NOLOCK to produce duplicate records. As mister.magoo has already shown, you can get duplicate records in READ COMMITTED.

    The unique problem with NOLOCK is reading changes from uncommitted transactions. Reading rows twice or missing them completely when they move around an index being scanned applies to READ COMMITTED just as well.

    Of those two (counting a row twice when it's updated after being read, and missing a row when it's updated before being read), the latter still applies to REPEATABLE READ.

    Only SERIALIZABLE (well, of the pessimistic models) prevents both of those.
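    For instance (a sketch of the idea, not from the original posts), re-running mister.magoo's read under SERIALIZABLE holds range locks until the transaction ends, so the concurrent key update blocks instead of producing a duplicate:

    set transaction isolation level serializable;
    begin transaction;
    select * from test order by id; -- range locks now held until commit
    commit;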

    Cheers!

  • Well, of course it's obvious if you explain it clearly 🙂

    In my defence, I never said I knew anything about anything!

    Thanks Jacob

    MM




  • Jacob Wilkins (6/2/2015)


    Going back to Gabriel's original question, you don't need NOLOCK to produce duplicate records. As mister.magoo has already shown, you can get duplicate records in READ COMMITTED.

    You can, but far less commonly.

    To get a duplicate in read committed, you must be scanning an index, and a concurrent change must be made to a row, changing the key value of the index you're scanning from one you've already read to one you haven't yet read. The number of rows you see duplicated (or missed) is the number of rows updated. This also assumes that locks are row or page locks and that SQL hasn't decided to take a table lock.

    To get a duplicate in read uncommitted, all you need is for a change (insert or update) to split a page and have half a page of rows moved either ahead of or behind the current scan point; a single change can result in you seeing up to 4 KB of data twice or not at all.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The demo code I used in a recent article.

    Setup:

    CREATE TABLE Transactions (
        TransactionID INT IDENTITY PRIMARY KEY,
        TransactionDate DATETIME,
        Amount NUMERIC(12,4),
        Filler VARCHAR(200)
    );

    CREATE TABLE Results (
        Rows INT,
        SumAmount NUMERIC(28,4) -- should now be large enough
    );

    WITH Rows AS (
        SELECT TOP (5000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS Number
        FROM msdb.sys.columns AS c1 CROSS JOIN msdb.sys.columns AS c2
    )
    INSERT INTO Transactions (TransactionDate, Amount, Filler)
    SELECT DATEADD(minute, RAND(CHECKSUM(NEWID()))*1000000, '2014-01-01') AS SomeDate,
        RAND(CHECKSUM(NEWID()))*25000 AS Amount,
        ''
    FROM Rows;

    In Window 1, run this:

    WHILE (1=1)
        UPDATE dbo.Transactions
        SET Filler = Filler + ' '
        WHERE TransactionID = ROUND(RAND(CHECKSUM(NEWID()))*5000000, 0);

    It updates one row at a time, increasing the length of the Filler column by one character each time.
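    Those growing rows are what eventually split pages. To watch the splits happen while the loop runs (my addition, not part of the article's demo), you can poll the clustered index's physical stats:

    -- page_count climbing and avg_page_space_used_in_percent dropping
    -- are the page splits at work
    SELECT page_count, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.Transactions'), 1, NULL, 'SAMPLED');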

    In Window 2, run this:

    WHILE (1=1)
        INSERT INTO Results
        SELECT COUNT(*) AS Rows, SUM(Amount) AS Total
        FROM dbo.Transactions WITH (NOLOCK);

    Run them for a while, then stop both and look at the contents of the results table. In my test (SQL 2012), I was getting row counts that were incorrect by up to 2500 and just over 60% of the queries returned incorrect results.
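    As a quick way to quantify that drift (my addition, assuming the five-million-row load above), something like this summarises the Results table:

    -- the true count is 5,000,000; anything else is a mis-read under NOLOCK
    SELECT MIN(Rows) AS MinCount, MAX(Rows) AS MaxCount,
        SUM(CASE WHEN Rows <> 5000000 THEN 1 ELSE 0 END) AS WrongCounts,
        COUNT(*) AS TotalRuns
    FROM Results;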

    Under read committed, because the key of the clustered index isn't getting changed and hence rows aren't moving within the index, this scenario can't result in duplicate or missing rows.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • mister.magoo (6/2/2015)


    Well, of course it's obvious if you explain it clearly 🙂

    In my defence, I never said I knew anything about anything!

    Thanks Jacob

    No problem! Believe me, I've been embarrassingly perplexed by far more obvious things in my time 🙂

    GilaMonster (6/3/2015)


    Jacob Wilkins (6/2/2015)


    Going back to Gabriel's original question, you don't need NOLOCK to produce duplicate records. As mister.magoo has already shown, you can get duplicate records in READ COMMITTED.

    You can, but far less commonly.

    To get a duplicate in read committed, you must be scanning an index, and a concurrent change must be made to a row, changing the key value of the index you're scanning from one you've already read to one you haven't yet read. The number of rows you see duplicated (or missed) is the number of rows updated. This also assumes that locks are row or page locks and that SQL hasn't decided to take a table lock.

    To get a duplicate in read uncommitted, all you need is for a change (insert or update) to split a page and have half a page of rows moved either ahead of or behind the current scan point; a single change can result in you seeing up to 4 KB of data twice or not at all.

    An excellent point! In my head I was certain I had at least mentioned this, but in silly fashion I obviously completely omitted it (probably the victim of my excessive re-writing). Thanks for clarifying that, Gail!

    That's also a nice demo. If you don't mind I'll probably stash that one away. I have one similar to mister.magoo's (I use a function to slow the select down to one second per row, just to make the timing predictable and to make sure populating the tables won't be beyond the machines I might run it on) for demonstrating duplicate reads under READ COMMITTED, but I'm curiously lacking a NOLOCK-centric demo.
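    In case it's useful, a delay function along those lines might look like this (a sketch; dbo.SlowDown is my own hypothetical name, and since WAITFOR isn't allowed inside a scalar UDF, it spins on the clock instead):

    create function dbo.SlowDown(@value int) returns int
    as
    begin
        -- burn roughly one second per call by spinning on the clock
        declare @start datetime2 = sysdatetime();
        while datediff(millisecond, @start, sysdatetime()) < 1000
            set @value = @value; -- no-op to keep the loop busy
        return @value;
    end;

    Selecting dbo.SlowDown(checksum(id)) alongside the other columns then forces the scan to crawl at roughly one row per second.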

    On my first run of the demo (2014 Developer Edition) it looks like the sum of the amounts managed to overflow NUMERIC(14,4), so I had to change that, but otherwise it works like a charm.

    Thanks again!
