Phantoms and isolation

  • david.gugg (8/14/2014)


    It would have been helpful to know the number of correct answers to choose.

    Agreed, though you can get it right just by reading the sequence of answers and deducing the number of correct ones: there are 10 choices, and each consecutive pair is opposite to the other, so of course you can't choose both of them 😉

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • Not an easy one, thanks Tom.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Oops, got it wrong, but nice question...

  • Hany Helmy (8/17/2014)


    Raghavendra Mudugal (8/15/2014)


    Thank you Tom, very interesting one.

    (I have used only READ UNCOMMITTED; the others I have just read about and never had any practical experience with, so I really had no idea what to choose based on reality, but I learnt a lot. Thank you.)

    For me, I use "READ COMMITTED", as my production databases can't tolerate dirty reads, but I had read a lot about isolation levels so I could get it right. Anyway, it's always good to share information & practical experiences.

    That's the default, so do you specify "read committed" again in the script block? Or are you saying you literally don't set any isolation level?

    (Actually, I was having this discussion on Twitter, and it was pointed out to me that "read uncommitted" or "nolock" is a good sign of poor coding and/or poor DB design; they can be used at the DWH level but not in general. I have never found a good example of how to avoid them. Say we use nolock and it reads dirty data and executes the select statement faster; now how do we achieve the same speed without using nolock? What changes do we have to make... stuff like that... that would be a really extraordinary thing to read for me.) 🙂

    ww; Raghu
    --
    The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.

  • Going to be very honest about my opinions of this QOTD here.

    Some things are right and that is easy to notice on the surface.

    Some things are noticeably wrong because they try to find truth in a flaw that may not exist.

    1. I got this question "wrong" while selecting the "right" answers that were sent out in email the next day.

    How interesting. That must explain why 78% of the people answering got it "wrong".

    2. The explanation uses the natural behavior of @@TranCount to "prove" a misprint in the BOL?!

    It then quotes no source other than his own opinion that BOL is not correct.

    http://technet.microsoft.com/en-us/library/aa933178(v=SQL.80).aspx

    Note: rolling back to a named savepoint within a transaction does not affect @@TranCount.

    Test how @@TranCount works in any nested transaction (see the sketch at the end of this post).

    3. There has to be a less obtuse way to make a point if you are trying to teach knowledge.

    4. I did find the subject matter informative and do believe several people learned something from your QOTD.

    It definitely helped reinforce a few things I had already learned.

    Thanks so much for all of the above.
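
    For reference, a minimal T-SQL sketch of that test (it touches no table data, so it runs anywhere; the savepoint name is arbitrary):

    -- @@TRANCOUNT across nesting and savepoints
    BEGIN TRANSACTION;                             -- @@TRANCOUNT: 0 -> 1
    BEGIN TRANSACTION;                             -- nested BEGIN: 1 -> 2
    SAVE TRANSACTION SavePoint1;                   -- savepoint: @@TRANCOUNT stays 2
    ROLLBACK TRANSACTION SavePoint1;               -- roll back TO the savepoint: still 2
    SELECT @@TRANCOUNT AS AfterSavepointRollback;  -- returns 2
    ROLLBACK TRANSACTION;                          -- full rollback: drops straight to 0
    SELECT @@TRANCOUNT AS AfterFullRollback;       -- returns 0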

  • Raghavendra Mudugal (8/18/2014)


    Say we use nolock and it reads dirty data and executes the select statement faster; now how do we achieve the same speed without using nolock?

    That's the wrong question to ask.

    The short answer is that NOLOCK and READUNCOMMITTED jeopardize the correctness of the data. What is more important? Do you want incorrect answers fast, or do you prefer correct answers even if that takes a bit more time? If you value speed over correctness, then I suggest that you simply replace the entire query with SELECT 42; - this will probably also return incorrect data, but you won't get it any faster than this!

    The slightly longer answer is that if you have performance problems, NOLOCK is not the answer. First investigate the cause. If your queries are slow because they have to wait excessively long for locks to be released, find the long-running transactions that hold those locks and speed them up. If they are just slow and you hope to get a bit of extra performance by avoiding the overhead of taking and releasing locks, then you should tune the query - rewrite it, check indexes, etc. If you have done all the tuning possible and it's still not fast enough, then either spend $$$ on better hardware, spend $$$ on better consultants, or accept the slower performance and work on expectation management.

    Compromising the correctness of the results is never the answer - unless you are in a DW where you report rounded to millions and a few thousand off can be considered insignificant rounding error (and even then, you'll still have to include retry logic, because a scan with NOLOCK can abort due to data movement (error 601)).
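
    (As a concrete alternative for the lock-waiting scenario above - a hedged sketch, not something named in this post, and MyDb is a hypothetical database name - row versioning lets readers see the last committed row version instead of blocking on writers' locks:)

    -- Enable row versioning under the default READ COMMITTED level
    ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON
        WITH ROLLBACK IMMEDIATE;  -- terminates open transactions so the switch can proceed

    -- Existing queries keep READ COMMITTED semantics, but now read the last
    -- committed version of each row instead of waiting on writers' locks.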


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (8/18/2014)


    Raghavendra Mudugal (8/18/2014)


    Say we use nolock and it reads dirty data and executes the select statement faster; now how do we achieve the same speed without using nolock?

    That's the wrong question to ask.

    The short answer is that NOLOCK and READUNCOMMITTED jeopardize the correctness of the data. What is more important? Do you want incorrect answers fast, or do you prefer correct answers even if that takes a bit more time? If you value speed over correctness, then I suggest that you simply replace the entire query with SELECT 42; - this will probably also return incorrect data, but you won't get it any faster than this!

    The slightly longer answer is that if you have performance problems, NOLOCK is not the answer. First investigate the cause. If your queries are slow because they have to wait excessively long for locks to be released, find the long-running transactions that hold those locks and speed them up. If they are just slow and you hope to get a bit of extra performance by avoiding the overhead of taking and releasing locks, then you should tune the query - rewrite it, check indexes, etc. If you have done all the tuning possible and it's still not fast enough, then either spend $$$ on better hardware, spend $$$ on better consultants, or accept the slower performance and work on expectation management.

    Compromising the correctness of the results is never the answer - unless you are in a DW where you report rounded to millions and a few thousand off can be considered insignificant rounding error (and even then, you'll still have to include retry logic, because a scan with NOLOCK can abort due to data movement (error 601)).

    Or you could just do lots of testing to make sure using NOLOCK is not affecting your queries' results adversely, and move on.

    Why waste time, money, and effort on things that can be cheaply tested and reliably implemented?

  • PHYData DBA (8/18/2014)


    Or you could just do lots of testing to make sure using NOLOCK is not affecting your queries' results adversely, and move on.

    Why waste time, money, and effort on things that can be cheaply tested and reliably implemented?

    So how would you plan to cover all possible race conditions?

    Remember that things like this (both the effect of dirty reads and the chance of the error due to data movement) depend on timing. One millisecond of difference between the timing of two transactions can make a huge difference.

    There's simply no way to test this exhaustively. NOLOCK in production code is a ticking time bomb. Not going to happen on my servers!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (8/18/2014)


    Raghavendra Mudugal (8/18/2014)


    Say we use nolock and it reads dirty data and executes the select statement faster; now how do we achieve the same speed without using nolock?

    That's the wrong question to ask.

    The short answer is that NOLOCK and READUNCOMMITTED jeopardize the correctness of the data. What is more important? Do you want incorrect answers fast, or do you prefer correct answers even if that takes a bit more time? If you value speed over correctness, then I suggest that you simply replace the entire query with SELECT 42; - this will probably also return incorrect data, but you won't get it any faster than this!

    The slightly longer answer is that if you have performance problems, NOLOCK is not the answer. First investigate the cause. If your queries are slow because they have to wait excessively long for locks to be released, find the long-running transactions that hold those locks and speed them up. If they are just slow and you hope to get a bit of extra performance by avoiding the overhead of taking and releasing locks, then you should tune the query - rewrite it, check indexes, etc. If you have done all the tuning possible and it's still not fast enough, then either spend $$$ on better hardware, spend $$$ on better consultants, or accept the slower performance and work on expectation management.

    Compromising the correctness of the results is never the answer - unless you are in a DW where you report rounded to millions and a few thousand off can be considered insignificant rounding error (and even then, you'll still have to include retry logic, because a scan with NOLOCK can abort due to data movement (error 601)).

    Thank you, Hugo, for the details.

    But I still don't understand how my question is wrong. The first part of my question describes dirty reads and excessive locks on a table where we use nolock; the second part asks how not to use nolock and still get better performance with correct data. I was trying to learn what steps to take to avoid nolock - changes to the DB design, the script, or the indexes... or hints?

    Of course, "do you prefer correct answers even if that takes a bit more time?" is our choice. See, in our web app, 70,000 to 90,000 users are logged in from all parts of the world (even more at some peak hours), and even though we keep tuning all the SPs from time to time, there are still many SPs where nolock is used. I am not sure that "a bit more time" is acceptable in all scenarios; even where we could avoid it we still use it, and as a low-level developer I can't raise such questions/concerns with them. All I can do is improve our skill at knowing how to avoid nolock, fetch the data, and send it to the client with speed, and make sure that holds for all the users.

    As you said, "nolock - not on my servers" - I am very interested in design tips and in how to structure the SP logic to deliver the best performance with consistent data.

    thank you.

    ww; Raghu
    --
    The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.

  • Raghavendra Mudugal (8/18/2014)


    But I still don't understand how my question is wrong.

    You asked how to achieve the same speed without nolock. That question suggests that if it would not be possible, nolock would be a valid alternative. It is not.

    The question you can ask is: how to improve performance. That is a very valid question, but there is no simple answer. It depends on what causes the slowness. Maybe you need to tune a long-running update to reduce blocking. Maybe you need to tweak indexes to get a faster plan. Maybe you need to rewrite a query. Maybe you have to revisit the entire data model of the database. Or maybe you have no choice but to upgrade hardware, or to accept that you are at the best achievable performance.

    If you use nolock to solve performance problems, you are fooling yourself.

    Have you ever bought a house where the previous owner had painted the walls purple, but you want them light yellow? In that case, you know that you need to apply at least two or three layers of white before applying the yellow to prevent the purple from shining through. That takes time (sand the wall, apply paint, wait for paint to dry, sand the wall again, etc etc). You can choose to apply just a single layer of white before applying the yellow, and that will make the process a lot faster - but every housepainter can tell you that you'll end up with a pinkish glow on the yellow, and no professional housepainter will accept an assignment to paint the walls yellow without applying sufficient layers of white primer.

    Using nolock is like skipping a layer of primer. The only excuse I will ever accept for it is ignorance, and I invest the time to write these replies to combat that ignorance. Those who know the potential risks of NOLOCK and still use it are knowingly exposing their employer to the full liability of potentially serving incorrect data. (And just to make sure you understand the full potential: NOLOCK can not only return uncommitted data, it can also skip existing rows, or return rows twice. Or, as mentioned before, cause a runtime error that you should make sure to handle - see the sketch below.)
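
    A minimal sketch of that retry handling, assuming a hypothetical table dbo.SomeTable (THROW needs SQL Server 2012 or later):

    DECLARE @retries int = 3;
    WHILE @retries > 0
    BEGIN
        BEGIN TRY
            SELECT COUNT(*) FROM dbo.SomeTable WITH (NOLOCK);
            BREAK;  -- success: stop retrying
        END TRY
        BEGIN CATCH
            IF ERROR_NUMBER() = 601 AND @retries > 1
                SET @retries -= 1;  -- data moved under NOLOCK: try again
            ELSE
                THROW;              -- out of retries, or a different error: re-raise
        END CATCH
    END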

    As you said, "nolock - not on my servers" - I am very interested in design tips and in how to structure the SP logic to deliver the best performance with consistent data.

    I normalize my data model as far as I can (currently working on an inherited database with little time to change the data model, but still making improvements where I can)

    I write all code that modifies data to make transactions as short-lived as possible

    I write all code to be as efficient as possible

    I think very carefully about indexes - not just for a single query, but for the entire database

    And if I had over 70,000 simultaneous users, as you have, I would give the in-memory OLTP option (Hekaton) some very serious consideration.

    I'm sorry that I cannot give easier answers - but as you probably know, there is no simple recipe for performance tuning.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • PHYData DBA (8/18/2014)


    1. I got this question "wrong" while selecting the "right" answers that were sent out in email the next day.

    How interesting. That must explain why 78% of the people answering got it "wrong".

    When you said that, I looked to see what the daily email on the 15th said. It listed the choices which, if you had checked the corresponding boxes, would have ensured that the system told you that you had got it right. Since I think it unlikely that SqlServerCentral sent you a different version of the daily email, I imagine that you must inadvertently have checked a box that you didn't intend to check.

    2. The explanation uses the natural behavior of @@TranCount to "prove" a misprint in the BOL?!

    The only behaviour of @@Trancount that is interesting in this connection is that it doesn't access any data in the database; I could equally well have chosen select 'x' as a statement that doesn't manipulate any database data. The point is that such a statement at that position executes to completion, so the transaction has not been aborted before that point; the following select statement, which does attempt to manipulate (read) database data, does not execute to completion, so clearly the transaction is aborted when that statement begins, before it delivers any results.

    Since the BOL page says that the transaction is aborted before it gets to the point where the extra select is inserted, it's clearly wrong. An extra SET TRANSACTION ISOLATION LEVEL statement after the one that BOL says aborts the transaction can make the transaction run successfully to completion - and since there is no magic mechanism to un-abort an aborted transaction, that too makes it clear that the statement in BOL is wrong.
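
    For anyone who wants to reproduce that behaviour, a hedged reconstruction of the test - my assumption, not spelled out above, is that the scenario switches to SNAPSHOT isolation in the middle of a transaction in a database with ALLOW_SNAPSHOT_ISOLATION enabled; dbo.SomeTable is a hypothetical table:

    BEGIN TRANSACTION;
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;  -- the statement BOL claims aborts the transaction
    SELECT @@TRANCOUNT AS StillAlive;          -- touches no table data: completes and returns 1
    SELECT TOP (1) * FROM dbo.SomeTable;       -- first data access: fails with error 3951, so
                                               -- the abort happens here, not at the SET
    ROLLBACK TRANSACTION;                      -- clean up the doomed transaction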

    Tom

  • Raghavendra Mudugal (8/15/2014)


    Thank you Tom, very interesting one.

    (I have used only READ UNCOMMITTED; the others I have just read about and never had any practical experience with, so I really had no idea what to choose based on reality, but I learnt a lot. Thank you.)

    There are cases where the READ UNCOMMITTED isolation level is acceptable: the obvious one is where you know for certain, perhaps because a ban on certain actions at certain times is enforced, that nothing can insert, delete, or update rows in any table that the code running at that isolation level will touch. Most other cases are going to be unacceptable, because there's no point at all in running code which may produce an incorrect result.

    In fact, for applications with non-trivial transactions it's quite probable that READ COMMITTED is unacceptable and REPEATABLE READ is needed (again, because there's no point in code that is likely to deliver incorrect results), and for applications with genuinely complex transactions, SERIALIZABLE or SNAPSHOT will be needed.

    Note though that for business intelligence applications using a data warehouse which is updated only at specific times, the "obvious case" mentioned above is rather common. And for business intelligence applications working on archived historical data (e.g. records of activity undertaken last year or earlier) it applies almost all the time.
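
    For illustration, a minimal sketch of opting in to the stricter levels mentioned above (database and table names are hypothetical):

    ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- one-time, per database

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;  -- or REPEATABLE READ / SERIALIZABLE
    BEGIN TRANSACTION;
        -- Every read in this transaction sees a single committed point-in-time
        -- view of the database: no dirty reads and no phantoms.
        SELECT COUNT(*) FROM dbo.Orders;
    COMMIT;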

    Tom

  • Thank you, Hugo, for all the in-depth details and for your time. This is very helpful.

    Hugo Kornelis (8/18/2014)


    Those who know the potential risks of NOLOCK and still use it are knowingly exposing their employer to the full liability of potentially serving incorrect data....

    Actually our app has been running for more than a decade now, and not a single user has complained about incorrect data, but we received thousands of complaints at Level-1 about performance (this was up until around 2006). We then built the same system in parallel on SQL Server 2005, but for some reason it never went live. We stayed on SQL 2000 until 2008 and R2 came out; the parallel work on 2008 R2 was finally accepted, and after full testing in every aspect, from every angle, the switch was made in 2011. Now the performance is fine, on the new hardware and the new SQL Server, and still not a single user has complained about bad data. Actually this is where all my concerns point: "how?". I need to gain a few more years of experience so I can question them, or get permission to study the code.

    Hugo Kornelis (8/18/2014)


    And if I had over 70,000 simultaneous users, as you have, I would give the in-memory OLTP option (Hekaton) some very serious consideration.

    Yup, talks are going on about upgrading to 2014; I hope it happens soon. Being even a 2% part of this transformation, I am very excited... :-D

    I normalize my data model as far as I can (currently working on an inherited database with little time to change the data model, but still making improvements where I can)

    I write all code that modifies data to make transactions as short-lived as possible

    I write all code to be as efficient as possible

    I think very carefully about indexes - not just for a single query, but for the entire database

    I'm sorry that I cannot give easier answers - but as you probably know, there is no simple recipe for performance tuning.

    This is more than a tip, and thank you so much for your guidance.

    ww; Raghu
    --
    The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.

  • TomThomson (8/18/2014)


    There are cases where the READ UNCOMMITTED isolation level is acceptable: the obvious one is where you know for certain, perhaps because a ban on certain actions at certain times is enforced, that nothing can insert, delete, or update rows in any table that the code running at that isolation level will touch.

    But then there would be no locks held that would be incompatible with SELECT statements and so READ UNCOMMITTED is unnecessary anyway. That's why I ask developers not to use it, full stop.

    Raghavendra Mudugal (8/19/2014)


    Actually our app has been running for more than a decade now, and not a single user has complained about incorrect data

    And that is just as worrying as if dozens of users had complained. The only thing worse than incorrect data is incorrect data that you don't know about.

    John

  • TomThomson (8/18/2014)


    There are cases where the READ UNCOMMITTED isolation level is acceptable: the obvious one is where you know for certain, perhaps because a ban on certain actions at certain times is enforced, that nothing can insert, delete, or update rows in any table that the code running at that isolation level will touch. Most other cases are going to be unacceptable, because there's no point at all in running code which may produce an incorrect result.

    In fact, for applications with non-trivial transactions it's quite probable that READ COMMITTED is unacceptable and REPEATABLE READ is needed (again, because there's no point in code that is likely to deliver incorrect results), and for applications with genuinely complex transactions, SERIALIZABLE or SNAPSHOT will be needed.

    Note though that for business intelligence applications using a data warehouse which is updated only at specific times, the "obvious case" mentioned above is rather common. And for business intelligence applications working on archived historical data (e.g. records of activity undertaken last year or earlier) it applies almost all the time.

    Thanks, Tom, for the information; very useful.

    ww; Raghu
    --
    The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.
