Risks of NOLOCK, part 1

  • TomThomson (11/2/2015)


    I got this one wrong because I reckoned that when there is no concurrent activity the saving from not requesting shared locks is so small as to be absolutely negligible, so I answered that NOLOCK in the absence of concurrent activity had no effect on performance :blush:. Although if sufficient things are accessed to make requesting and releasing shared locks a significant overhead SQL Server will upgrade to use locks with a higher grain, to push the cost back down, it's still true that the cost is non-zero, so Hugo's answer is correct and I should have said that it may reduce cost.

    That's what I thought when answering. And also got it wrong. Yes, the answer is technically right, but might mislead people. It's a good question, though.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Using NOLOCK is something I was taught to do by an 'expert' when I first started using SQL. Fortunately, I learned better later on. Unfortunately, by that time, I had a lot of stored procedures with this in it and lots of people had copied that code into their own stored procedures.

  • TomThomson (11/2/2015)


    I got this one wrong because I reckoned that when there is no concurrent activity the saving from not requesting shared locks is so small as to be absolutely negligible, so I answered that NOLOCK in the absence of concurrent activity had no effect on performance :blush:.

    As mentioned before, I expect this to be around 1-2 percennt of total query time (for simple queries). Won't make you the hero of the department, but may still help.

    Don't forget that it's relatively complex to take a lock, due to the locking hierarchy. A shared lock on a row also implies an IS lock on the page and on the object; all three have to be checked against competing locks and then taken, and released again after the read completes. Plus, for the row-level lock the index values need to be hashed, and hashing tends to be a relatively expensive action for the CPU.

    Maybe in an environment where the DBAs are really disciplined it wouldn't happen, no-one would run any updates or inserts when they shouldn't

    I like to call that type of environment a read-only filegroup. 🙂

    And in that case you do not need NOLOCK, when a filegroup is marked read-only SQL Server knows that it doesn't have to take locks.

    That being said, I will accept NOLOCK in some rare cases - long an complex reports that for whatever reason have to run on the OLTP server, and that are aggregated down enough that the errors introduced by the issues mentioned in this and tomorrow's question become statistically insignificant. The reason for allowing it would not be the performance gain, but to avoid OLTP processes being blocked. And I will make sure that the developers work around all the potential issues!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • on top of that, if sql knows there is nolock hint used, it might decide to do allocation order scan in same situation which might be faster.

  • Luis Cazares (11/2/2015)


    That's what I thought when answering. And also got it wrong. Yes, the answer is technically right, but might mislead people. It's a good question, though.

    Yes, it's a very good question. It seems that a lot of people have a lot to learn from it. Only 7% correct answers so far, 93% got it wrong.

    About 2% (between 6 and 8 people) so far got none of the four correct options at all.

    Tom

  • I got the three primary issues right, but missed the performance one. I would be surprised if the performance difference was as high as 1-2% and I suspect it is a tiny fraction of a percent. Yes - it doesn't have to check for certain types of locks, so that particular code doesn't execute, but if there is no concurrency, there aren't any lock records to check anyway.

    BTW - I am a believer in "nolock" hints - as long as they are applied the the appropriate places. If everyone coded their SQL correctly and took into account concurrency, you shouldn't need them, but unfortunately that isn't the case.

    In one application that I support, the operational model is such that records, once written, are rarely if ever changed. Instead, a new "supplement" record is added. So - when a user is retrieving data about a particular person, incident, etc., that data is virtually always "static", as is all the data in the code table/pick list tables referenced in the query (all of which are static values). So - using "with (nolock)" on all these queries avoids having the queries block on other transactions against the tables, since we know "a priori" that those transactions will not interfere. Further, should the user actually manage to "see" an uncommitted transaction that is subsequently rolled back, that would cause no significant harm (and the likelihood of this occurring is effectively nil).

    The problem is that the update/add of a new "record" into the system involves inserting/updating 1..n records in as many as 20 different tables in a single transaction. Further, the application code is running on one of N application servers and using VB and .Net code (begin transaction, open recordsets on multiple tables, get empty or existing records, update recordset fields locally, then close recordsets to write to the database, and finally commit transaction). Thus, transactions can create locks on multiple tables for quite a long time - sometimes several seconds - and locks frequently escalate to table-level.

    Without getting the developers to completely re-design and re-write all the application code, using "nolock" hints on most queries is the only way to keep those queries from unnecessary blocking on inserts/updates on unrelated records.

    Yes - it is a "kludge", but it works and is effective, and when used properly is safe.

    BTW - I agree with the comment about using "with (ReadUncommitted)" instead of "with (nolock)". The actual behavior is much clearer that way.

  • Good questions, thanks Hugo. Seeing as only 8% got all four correct, I'd have liked to see each set of options as a separate QotD on four consecutive days.


    [font="Tahoma"]Personal blog relating fishing to database administration:[/font]

    [font="Comic Sans MS"]https://davegugg.wordpress.com[/url]/[/font]

  • My devs think this is the magical 'sql gofast' hint.

    *facepalm*

  • Kim Crosser (11/2/2015)


    Yes - it is a "kludge", but it works and is effective, and when used properly is safe.

    I, too, think that NOLOCK certainly has its uses. You just have to be aware of all the potential side effects.

    In your case, the question that is scheduled for tomorrow might be more applicable.

    david.gugg (11/2/2015)


    Good questions, thanks Hugo. Seeing as only 8% got all four correct, I'd have liked to see each set of options as a separate QotD on four consecutive days.

    Heh! Since there is another set of four scheduled for tomorrow, your plan would expand this into eight yes/no questions on 8 consecutive days. I think that a lot of people would grow bored.

    Also, the percentage correct/incorrect is not what matters; the goal of the QotD is to educate. I am happy to hear from the feedback given by you and others that I achieved that goal.

    Manic Star (11/2/2015)


    My devs think this is the magical 'sql gofast' hint.

    *facepalm*

    So point them here. Or show them how dangerous it is. It is quite easy to set up a demo using two SSMS windows - one to simulate modifications, the second to simulate reads. Run both in a tight loop and program the second to only show the "wrong" results, then see how many of those wrong results show up in a few minutes.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Kim Crosser (11/2/2015)


    BTW - I am a believer in "nolock" hints - as long as they are applied the the appropriate places. If everyone coded their SQL correctly and took into account concurrency, you shouldn't need them, but unfortunately that isn't the case.

    .....

    Without getting the developers to completely re-design and re-write all the application code, using "nolock" hints on most queries is the only way to keep those queries from unnecessary blocking on inserts/updates on unrelated records.

    Yes - it is a "kludge", but it works and is effective, and when used properly is safe.

    BTW - I agree with the comment about using "with (ReadUncommitted)" instead of "with (nolock)". The actual behavior is much clearer that way.

    Actually I would tend to disagree with you on most of the statements above, and I guess the quiz tomorrow will shed some light on that.;)

    I would just say that queries with nolock have a tendency to return "funny" results on already committed rows, so I would never call nolock as being a "safe" hint.

    Btw, take a look at snapshot isolation setting as an option for your legacy apps.

  • serge 65885 (11/2/2015)


    I would just say that queries with nolock have a tendency to return "funny" results on already committed rows, so I would never call nolock as being a "safe" hint.

    If you are referring to queries resulting in table or index scans failing to pick up index page splits, the probability of that occurring in this particular application is virtually nil. The clustered indexes are on monotonically increasing integer values, so index page splits aren't going to occur very often. But more to the point, the queries on which we use "nolock" tend to be for a set of chained records across tables using primary key linkages (should rarely generate scans, even index scans), and lots of joins with many tables that are effectively static (updated maybe once a year or even less often).

    Btw, take a look at snapshot isolation setting as an option for your legacy apps.

    We have looked at it, but have concerns with how it uses an optimistic concurrency model and if a conflicting transaction occurs, it will error out and roll back the transaction.

    A lot of the code in this application dates back a long way, and we know that it doesn't handle errors all that well already. Changing the default database mode would create a new set of possible failure conditions. Right now, the application's database transactions rarely fail, and since the update statements for a particular portion of the application all reference the tables in the same order within any given transaction, the ReadCommitted locking ensures that transactions may block, but deadlocks are very, very, rare.

    Since insert/update/delete transactions are initiated from multiple threads on up to 6 different application servers currently, it is very possible that with Snapshot Isolation, race conditions could occur where random deadlocks occur. At this point, "if it ain't broke, don't fix it" applies.

    If someone out there could reassure me that this concern is invalid (with supporting detail), I will take another look at Snapshot Isolation. Otherwise, changing the database from ReadCommitted to Snapshot Isolation just to avoid using "nolock" in some queries seems like an unnecessary risk.

    Of course, we NEVER use "nolock" hints in anything that will itself result in updates to any of the databases.

  • Gr8 question, only the Constraints Violation part I was not sure about, thanx.

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • I looked some more at the amount of performance difference caused by NOLOCK (when it is safe) and because 1 or 2 percent seemed rather tight I devised some tests.

    I have some problems with elapsed time tests using current_timestamp and datediff because I don't want to mess about with what system tasks are running on the machine that I've just managed to get back into a usable state, and that means I'm letting non-sql things kick in; I did some tests that way but discovered that the variation in elapsed time for each method was rather large and I could see no statistically significant difference between running with and without locks. That was on extremely trivial queries (looking for about 2^17 rows in about 2^20 with simple search criteria) and locking was faster than not locking as often as it was slower.

    I didn't believe that was useful, or even credible. So I decided to try using sys.dm_exec_procedure_stats instead, and look at worker time (CPU usage) as well as elapsed time. Of course that also made things a lot simpler, far less wrapping code around the code being measured - having seen the reduction in wrapping garbage I believe I should have gone to the system view first and not considered the other approach.

    The results, however, are quite startling. Two sets of code, one of which uses NOLOCK on its read-only tables and the other of which doesn't. According to that system view, the code using NOLOCK requires more worker time (about 6% more) and more elapsed time (about 4% more) than the code using shared locks. The min and max worker times and elapsed times are greater for the NOLOCK case than for the case with locks, as well as the max times. I've run through this a few thousand times, and the results are pretty consistent.

    I don't for a moment believe that this is typical behaviour (I can remember designing a locking system with intention locks and similar complications for an Oracle port way back then, and I know that the sort of locking system that is needed if one doesn't put special hardware in to assist can be expensive (I tend to keep on forgetting it because I worked with clever hardware at other times, but a negative cost is beyond belief).

    So it seems that it is possible for use of NOLOCK to make a query run slower, instead of faster.

    Or maybe I'm missing something, so I've put the code below to see if someone can spot an error in what I'm doing. It uses 3 built in tables to construct a temp table (a heap) with just one unindexed integer column containing 1048551 rows, with many values occurring more than once - the construction is pretty crude and straightforwards, the temp table contains both positive and negative integers between -275535 and +275535, not all numbers in that range are present, and some values are present twice. The 3 built in tables are a simple tally table(single integer column which is the primary key clustered) containing all values from 0 to 65535 inclusive; a prime table (single integer column which is primary key clustered) containing the prime numbers from 2 to 65521 inclusive, and a composite table (which is the primary key on which the table is clustered) containing all the composite numbers from 4 to 65535 inclusive. The query being measured is to replace all values in the temp table which are in the composite table and not in the prime table by null and compare the case where that query has NOLOCK hints on the prime and composite tables with the case where it doesn't; the two queries to be measured are encapsulated in two single line SPs so that sys_dm_exec_procedure_stats will catch them. The calls are wrapped in begin transaction and rollback, so that the next call gets the same data. The number of runs is determined by inserting extra lines of code - six lines is enough to run the test once with and once without nolock hints. It could have been a while loop instead, of course.

    Here are the main results I get when running the test 27 times:-

    SP.......cached_time....................execution_count..AvWorkerTime...total_worker_time...AvElapsedTime...total_elapsed_time

    dxxN…2015-11-03.19:38:55.603….....27......................445184...........12019989.................198794...............5367461

    dxxL...2015-11-03.19:38:54.140….....27......................420030...........11340833.................190366...............5139892

    Here's the code:

    use playpen

    go

    set nocount on;

    create table #temp(t int);

    go

    if exists (select 1 from sys.procedures where name = 'dxxL' and schema_id = 1) drop proc dxxL

    go

    create proc dbo.dxxL as

    update #temp set t = null where t not in (select p from prime) and t in (select c from composite)

    go

    if exists (select 1 from sys.procedures where name = 'dxxN' and schema_id = 1) drop proc dxxN

    go

    create proc [dbo].[dxxN] as

    update #temp set t = null where t not in (select p from prime(nolock)) and t in (select c from composite(nolock))

    go

    insert #temp(t) select c from composite;

    insert #temp(t) select p from prime;

    insert #temp(t) select i from tally;

    insert #temp(t) select -i from tally;

    insert #temp(t) select -c from composite;

    insert #temp(t) select -p from prime;

    insert #temp(t) select 70000*sign(t)+t from #temp;

    insert #temp(t) select 140000*sign(t)+t from #temp;

    delete #temp where t in (0,1);

    select count(*) from #temp;

    select * from #temp where t = 1; -- try to get it all in cache

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    begin tran ;

    exec dxxL;

    rollback tran ;

    begin tran ;

    exec dxxN;

    rollback tran ;

    select * from sys.dm_exec_procedure_stats ;

    select top(5) p.name [SP], qs.cached_time, qs.execution_count,

    qs.total_worker_time/qs.execution_count [AvWorkerTime], qs.total_worker_time,

    qs.total_elapsed_time/qs.execution_count as [AvElapsedTime], qs.total_elapsed_time

    from sys.procedures p with (nolock) inner join sys.dm_exec_procedure_stats qs

    ON p.object_id = qs.object_id

    WHERE qs.database_id = DB_ID()

    ORDER BY qs.total_worker_time desc ;

    go

    drop table #temp ;

    drop proc dbo.dxxL,dbo.dxxN ;

    Tom

  • TomThomson (11/3/2015)


    So it seems that it is possible for use of NOLOCK to make a query run slower, instead of faster.

    Thank you for all the work here, but I think the rollbacks and "update" transactions in your test scenario may be creating some of the difference.

    I created a 1 Million row table of random integers between 1 and 1,000,000

    create table tRand (ix int not null, rand int not null, constraint pk_tRand primary key (ix));

    and a similar prime table (values from 2..65521 like yours). I also created a "result" table of integers to store the result of the "select" statement

    I created two stored procedures, which did:

    create procedure testwnl as

    declare @i int = 100;

    set nocount on;

    while @i > 0

    begin

    insert into tResult (rval)

    select tr.ix from rRand with (nolock)

    join tPrime tp with (nolock)

    on tp.prime = tr.rand);

    set @i = @i - 1;

    end;

    create procedure testwlock as

    declare @i int = 100;

    set nocount on;

    while @i > 0

    begin

    insert into tResult (rval)

    select tr.ix from rRand -- with (nolock)

    join tPrime tp -- with (nolock)

    on tp.prime = tr.rand);

    set @i = @i - 1;

    end;

    I then ran the two using a simple shell:

    truncate table tResult;

    exec dbo.testnl;

    truncate table tResult;

    exec dbo.testwlock;

    I got exactly the reverse results:

    After multiple runs, the average worker time and average elapsed time was consistently lower for the "with (nolock)".

    The "with (nolock)" procedure only used 94.22% of the average worker time and 92.48% of the average elapsed time compared to the procedure without the "nolock" hints.

    I do believe there are *interesting* scenarios where the "with (nolock)" could potentially slow down the SQL engine compared to the default locking, but I suspect that even there the difference will be far less than then 1-2% as originally stated.

    (Edit)

    BTW - for fun, I also rand these joining the Prime values to the "ix" column (the primary key).

    This was even more in favor of the (nolock) - the (nolock) queries ran at 81.78% of worker time, and 81.86% of elapsed time.

  • Tom, KIm: Thanks for all the effort you put in. I plan to run some tests myself now, but that will have to wait a few days until I can free up the time.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply