UPDATE when the values are the same

  • ChrisM@Work (8/16/2012)


    but for a CI, there is so much activity associated with a value change that I'm surprised there's no check.

    I suspect it's because of the level and point at which things are done.

    This 'don't dirty the page, don't log the update' is going to very likely be a storage engine optimisation. When updating a clustered index key, the query optimiser generates a plan that splits the update into delete/insert (and caches that plan for future reuse), so when the query processor runs that plan, there's no update operator and all the storage engine sees is a delete and an insert.

    To get this optimisation for clustered index keys, the optimiser would have to check whether the value is going to be changed or not and generate a different plan if it is vs if it's not. That plan would then not be reusable, so we have multiple plans per query which can't currently happen

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/16/2012)


    Paul showed that for a heap (the only case in discussion here), the page is not dirtied and the update log record not written

    I just changed the table definition to:

    CREATE TABLE [dbo].[Test](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [Name] [varchar](100) NOT NULL,

    PRIMARY KEY (ID)

    ) ON [PRIMARY]

    Now it's not a heap anymore.

    You may try it for yourself - still no dirty pages.

    Sorry Paul.

    It's different only when PK is declared on (Name).

    Then it goes like Paul described - delete row-insert row.

    But it's kind of different thing.

    _____________
    Code for TallyGenerator

  • GilaMonster (8/16/2012)


    They however are different binary values when stored in the data page.

    Yep.

    Exactly my point.

    To find out the values are different SQL Server first needs to write a new value to the page.

    Actually write.

    _____________
    Code for TallyGenerator

  • GilaMonster (8/16/2012)

    The page was not marked as dirty, the update log record was not written when the update does not change the value of the row (as in the exact binary representation of the row on the page).

    You agreed again.

    "the exact binary representation of the row on the page" can be find out only AFTER the page is updated with a new value.

    Otherwise - it just does not exist.

    So, the new value is written to the page.

    _____________
    Code for TallyGenerator

  • GilaMonster (8/16/2012)


    ChrisM@Work (8/16/2012)


    but for a CI, there is so much activity associated with a value change that I'm surprised there's no check.

    I suspect it's because of the level and point at which things are done.

    This 'don't dirty the page, don't log the update' is going to very likely be a storage engine optimisation. When updating a clustered index key, the query optimiser generates a plan that splits the update into delete/insert (and caches that plan for future reuse), so when the query processor runs that plan, there's no update operator and all the storage engine sees is a delete and an insert.

    To get this optimisation for clustered index keys, the optimiser would have to check whether the value is going to be changed or not and generate a different plan if it is vs if it's not. That plan would then not be reusable, so we have multiple plans per query which can't currently happen

    Apart from parallel & serial. Makes sense to me, thanks Gail.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sergiy (8/16/2012)


    Now it's not a heap anymore.

    You may try it for yourself - still no dirty pages.

    Sorry Paul.

    It's different only when PK is declared on (Name).

    Then it goes like Paul described - delete row-insert row.

    But it's kind of different thing.

    Why are you apologising to me? That does not conflict with anything I wrote in http://bit.ly/NonUpdatingUpdates. Quote:

    Non-updating updates to a clustered table generally avoid extra logging and page flushing, unless a column that forms (part of) the cluster key is affected by the update operation.

  • ChrisM@Work (8/16/2012)


    It's interesting that in only the cheapest case has update to self been optimized out. Perhaps checking for change in a LOB column is more expensive than optimizing out - but for a CI, there is so much activity associated with a value change that I'm surprised there's no check.

    Hi Chris,

    The main optimization avoids updating non-clustered indexes unnecessarily; my blog post was intended to expand on that well-known behaviour to explore some of the quirky details concerning the base table. The non-clustered index optimization is much easier to see in wide (per index) update plans, because there are lots of extra Compute Scalar and Filter operators that make it easy to see what the query processor is doing. The same optimization occurs in narrow (per-table) plans for non-clustered indexes, but the details are not so visible. As far as clustered index updates are concerned, these can be optimized, but only if no part of the key is affected.

  • Sergiy,

    I've been trying to work out why you're getting so excited about all this. Perhaps you are wondering whether an update to the same value is actually written to the heap or clustered base table page or not. I'm not sure why you care, to be honest. If the value doesn't change on the page, the page is exactly the same (byte for byte) before and after the operation, so who cares whether the value was updated in memory or not? Without checking SQL Server source code, how could you ever know for sure? No data change log records are written, the page isn't marked dirty (and so will not be flushed to disk), so where's the real-world issue here?

    On a second point, I want to emphasise something Gail tried to make clear to you earlier. Collation only determines the semantics of a comparison for the query processor (so 'tom' = 'TOM' for a case-insensitive collation). The storage engine only cares about binary representation; the strings 'tom' and 'TOM' are represented by the byte sequences 0x546F6D and 0x544F4D on the page regardless of collation. It's easy to make sense of some of the behaviours seen one you realise this difference.

  • SQL Kiwi (8/22/2012)


    ChrisM@Work (8/16/2012)


    It's interesting that in only the cheapest case has update to self been optimized out. Perhaps checking for change in a LOB column is more expensive than optimizing out - but for a CI, there is so much activity associated with a value change that I'm surprised there's no check.

    Hi Chris,

    The main optimization avoids updating non-clustered indexes unnecessarily; my blog post was intended to expand on that well-known behaviour to explore some of the quirky details concerning the base table. The non-clustered index optimization is much easier to see in wide (per index) update plans, because there are lots of extra Compute Scalar and Filter operators that make it easy to see what the query processor is doing. The same optimization occurs in narrow (per-table) plans for non-clustered indexes, but the details are not so visible. As far as clustered index updates are concerned, these can be optimized, but only if no part of the key is affected.

    Thanks for the confirmation Paul, that's pretty much what I digested from your article - but I'll stick to my comment above. Updating a clustered index key causes a delete/insert to the table and updates any non-clustered indexes to reflect the new key. And whatever else. That's a lot of (unnecessary) work if the new value of the key column is the same as the old one. Optimising out a change to a non-key column, whether or not associated with a non-clustered index, is far less dramatic.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/22/2012)


    Updating a clustered index key causes a delete/insert to the table and updates any non-clustered indexes to reflect the new key. And whatever else. That's a lot of (unnecessary) work if the new value of the key column is the same as the old one. Optimising out a change to a non-key column, whether or not associated with a non-clustered index, is far less dramatic.

    I take your point; it is a good reason not to update clustering keys ever, even to the same value. If you feel strongly about this, and/or want an explanation of the reasons for not optimizing in this case, consider filing a suggestion on the Connect website.

  • SQL Kiwi (8/22/2012)


    ChrisM@Work (8/22/2012)


    Updating a clustered index key causes a delete/insert to the table and updates any non-clustered indexes to reflect the new key. And whatever else. That's a lot of (unnecessary) work if the new value of the key column is the same as the old one. Optimising out a change to a non-(cluster)key column, whether or not associated with a non-clustered index, is far less dramatic.

    I take your point; it is a good reason not to update clustering keys ever, even to the same value. If you feel strongly about this, and/or want an explanation of the reasons for not optimizing in this case, consider filing a suggestion on the Connect website.

    Heh - your point, my italics. This is effectively discrimination against natural-key advocates. When do surrogate keys change other than during data migration?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • SQL Kiwi (8/22/2012)


    Sergiy,

    I've been trying to work out why you're getting so excited about all this. Perhaps you are wondering whether an update to the same value is actually written to the heap or clustered base table page or not. I'm not sure why you care, to be honest. If the value doesn't change on the page, the page is exactly the same (byte for byte) before and after the operation, so who cares whether the value was updated in memory or not? Without checking SQL Server source code, how could you ever know for sure? No data change log records are written, the page isn't marked dirty (and so will not be flushed to disk), so where's the real-world issue here?

    On a second point, I want to emphasise something Gail tried to make clear to you earlier. Collation only determines the semantics of a comparison for the query processor (so 'tom' = 'TOM' for a case-insensitive collation). The storage engine only cares about binary representation; the strings 'tom' and 'TOM' are represented by the byte sequences 0x546F6D and 0x544F4D on the page regardless of collation. It's easy to make sense of some of the behaviours seen one you realise this difference.

    SQL Kiwi (8/22/2012)


    Sergiy,

    I've been trying to work out why you're getting so excited about all this. Perhaps you are wondering whether an update to the same value is actually written to the heap or clustered base table page or not. I'm not sure why you care, to be honest. If the value doesn't change on the page, the page is exactly the same (byte for byte) before and after the operation, so who cares whether the value was updated in memory or not? Without checking SQL Server source code, how could you ever know for sure? No data change log records are written, the page isn't marked dirty (and so will not be flushed to disk), so where's the real-world issue here?

    On a second point, I want to emphasise something Gail tried to make clear to you earlier. Collation only determines the semantics of a comparison for the query processor (so 'tom' = 'TOM' for a case-insensitive collation). The storage engine only cares about binary representation; the strings 'tom' and 'TOM' are represented by the byte sequences 0x546F6D and 0x544F4D on the page regardless of collation. It's easy to make sense of some of the behaviours seen one you realise this difference.

    Paul,

    I don't quite understand why are you so vigorously defending the wrong answer?

    The question was :

    Does an UPDATE query overwrite existing values if the already existing value is the same as the one your updating with?

    Correct answer is "Yes, it does".

    You may look up in the SQL Server source code and see it's true.

    I don't need to do this as I can find what's in the code by the outcome of its actions. It's a normal practice for scientists.

    I bet there is nobody who survived being inside of a nuclear bomb during its explosion. But there are pretty detailed descriptions of the processes happening in there. How could they ever know for sure?

    As a result of updating to the same value SQL Server starts transaction, locks the resources, updates the page, closes transaction.

    After that Storage engine starts comparing data in the updated page with its previous contents and may find it's not changed, so no saving to disk operation is required.

    But it does not change the fact that the data from "no-update UPDATE" has been written to the page.

    You may check - when we do "where A <> @a" there is no transaction logged, no lock applied, SQL Server does not even start to validate the contents of the pages.

    And it has to run all those activities when we update values with the same ones.

    I'm not sure what you where trying to explain with that collation remark.

    I know some facts about collations, some of them are even exceeding the knowledge you shared with us.

    When we talk if the values are the same value in a cell before and after we're talking about comparison according to the data type rules.

    When we're asking the server if a page is the same before and after an update we're talking about 8K of binary block. Regardless of what kind of data we put in this block.

    _____________
    Code for TallyGenerator

  • Sergiy,

    A transaction is indeed started and an exclusive lock is taken in preparation for an update, but how do you know for sure that the page is actually written to? (And more to the point, why do you care?)

    Say we are 'updating' from 'tom' to 'tom'; how do you know that the storage engine doesn't simply compare what is already there on the page (bytes 0x546F6D) with what the query processor has asked it to update the value to (also bytes 0x546F6D) and simply not bother doing anything? That explains the observed behaviour just as well, doesn't it? By the way, I'm not advocating either way on this point. I simply don't know for sure (and I don't think you do either). On the other hand, I stand by everything I wrote in my blog, and the remarks to Chris Morris about non-updating updates to non-clustered indexes. The present discussion is completely confined to base table pages.

    Your suggestion that the storage engine compares before and after versions of the page also raises questions. (1) Where does the storage engine keep the old version of the 8KB page to compare with? (2) Why not compare the single row instead? (3) Why not compare just any 'updated' column values?

    You may check - when we do "where A <> @a" there is no transaction logged, no lock applied, SQL Server does not even start to validate the contents of the pages.

    I can't find any code like that in this thread so it's not clear what you are referring to. Perhaps you are thinking of your 'Tom' test. In that case, a predicate (@I % 2 = 0) is applied in the Table Scan operator. If a row doesn't pass that test, it never gets as far as the Table Update operator, so of course no activity is seen.

    I'm not sure what you where trying to explain with that collation remark.

    I was trying to help you think about the 'Tom' versus 'TOM' test. You seemed confused between the comparisons the query processor makes (taking account of collation) and the comparisons made by the storage engine (which deals in raw bytes).

  • Sergiy (8/23/2012)


    The question was: "Does an UPDATE query overwrite existing values if the already existing value is the same as the one your updating with?

    I think the questioner was more concerned with real-world performance implications (which we have covered) rather than answering literally, don't you agree? Without access to the SQL Server source code, none of us can say for sure whether a page write occurs. Who knows, the behaviour might also have changed between SQL Server editions and builds. Without any real-world implications, my question stands as to why you care so much about this.

  • SQL Kiwi (8/23/2012)


    Sergiy (8/23/2012)


    The question was: "Does an UPDATE query overwrite existing values if the already existing value is the same as the one your updating with?

    I think the questioner was more concerned with real-world performance implications (which we have covered) rather than answering literally, don't you agree?

    Talking about real world.

    I mentioned at the beginning of this thread - I managed to remove 8+GB from daily LOG file drowth on a database simply by adding WHERE Status <> 0 to the UPDATE Status = 0 statement called within a regular Agent job.

    8+GB of useless data. Every day. On a database of 1.5 GB. Clogging data and backup repositories, causing performance issues, etc. Because of a single non-update UPDATE.

    Is it "real world" enough for you?

    _____________
    Code for TallyGenerator

Viewing 15 posts - 31 through 45 (of 159 total)

You must be logged in to reply to this topic. Login to reply