Update all values in a row or just what changed? Several considerations...

  • We have an application that currently updates every value in a row no matter how many values actually changed. I understand that from a plain read/write perspective nothing should change: a page of data still has to be read into the buffer (if it isn't already there), updated, and then written back to disk as a page.

    However, what if the table is very wide and 1 row spans multiple pages?

    What is the effect on the log?

    What if the table is replicated?

    What about latching (I believe still at a page level)?

    I presume more CPU?

    I thought I would post this question here first as I research each point, so that my findings can be updated by myself or others and maybe I can get some answers faster. Or find even more to consider! 🙂
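
    For concreteness, the pattern in question looks something like this (table and column names are made up for illustration):

    ```sql
    -- The app always issues the first form, even when only one column changed.
    UPDATE dbo.WideTable
    SET    Col1 = @Col1,
           Col2 = @Col2,
           -- ... and so on for every column in the table ...
           Col100 = @Col100
    WHERE  Id = @Id;

    -- vs. updating only what actually changed:
    UPDATE dbo.WideTable
    SET    Col7 = @Col7
    WHERE  Id = @Id;
    ```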

    Jared
    CE - Microsoft

  • Do you mean a case where you have overflow or LOB data?

  • Overflow, it's close to 100 columns.

    Jared
    CE - Microsoft

  • SQLKnowItAll (12/2/2015)


    We have an application that currently updates every value in a row no matter how many values actually changed. I understand that from a plain read/write perspective nothing should change: a page of data still has to be read into the buffer (if it isn't already there), updated, and then written back to disk as a page.

    However, what if the table is very wide and 1 row spans multiple pages?

    What is the effect on the log?

    What if the table is replicated?

    What about latching (I believe still at a page level)?

    I presume more CPU?

    I thought I would post this question here first as I research each point, so that my findings can be updated by myself or others and maybe I can get some answers faster. Or find even more to consider! 🙂

    So... Some research:

    1. A wider update affects CPU and the log. CPU time/utilization goes up as more columns are added to the UPDATE. I cannot determine what proportion is due to building the log records vs. all of the other things that get updated unnecessarily, i.e. statistics, indexes, system tables, etc.

    2. I found that more records were added to the log, but that is because, besides the row itself being updated, so were many other things in the database as mentioned above, i.e. statistics and indexes (many of which did not need to be touched). So, more unnecessary writes to the log.

    3. If the table is replicated, the log reader now has more log entries to scan needlessly. In addition, with the default SCALL syntax for stored procedure delivery, the procedure would normally receive only the updated values plus a bitmask indicating which columns changed; it now receives all values needlessly. Not sure yet whether this has an impact on the bitmask comparison (not 100% sure how that is executed yet).

    4. Latching... Again, because there are unnecessary index updates along with everything else, more latches have to be acquired.

    5. Updating the entire row showed an increase in CPU vs. updating a single value.

    Moral... When someone tells you there is no performance difference between updating every value in the row and updating only the changed ones, they are wrong unless they specify that they are talking strictly about data reads and writes. Even then, does an index need to be updated or not? Probably more often than you would like. At least, that's my takeaway. Of course, you need to balance the cost of determining what to update against the performance impact.
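
    If you want to see the log difference yourself, here is a rough sketch. The table name is made up, and sys.fn_dblog is undocumented, so treat this as a throwaway-database experiment only:

    ```sql
    -- Run in a scratch database in SIMPLE recovery.
    CHECKPOINT;  -- in SIMPLE recovery this truncates the log, so the count starts low

    UPDATE dbo.WideTable SET Col7 = Col7 + 1 WHERE Id = 1;

    SELECT COUNT(*) AS LogRecordsSingleCol
    FROM   sys.fn_dblog(NULL, NULL);

    CHECKPOINT;

    -- Same logical change, but every column in the SET list.
    UPDATE dbo.WideTable
    SET    Col1 = Col1, Col2 = Col2, /* ...all remaining columns... */ Col7 = Col7 + 1
    WHERE  Id = 1;

    SELECT COUNT(*) AS LogRecordsAllCols
    FROM   sys.fn_dblog(NULL, NULL);
    ```

    Comparing the two counts (and the log record sizes in the same view) makes the extra logging visible directly.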

    Jared
    CE - Microsoft

  • It's also a bitch for "field level auditing".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Useful findings. If I understand you correctly, those observations seem to be related to the fact that more columns were included, not specifically to additional pages being involved. That is to say, even a row which fits on one page would see these consequences if an update were changed to increase the number of updated columns.

    I do know that CDC is relatively smart about this, and updating a column value to itself will not produce rows in the change table. I would expect transactional (and merge?) replication to behave the same way since it uses the same underlying mechanism.

    I guess it makes sense that the same is not true of indexes, since these have to be updated right away without the lazy-reading luxury of figuring out whether or not anything really changed in the same way that CDC can.
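
    For what it's worth, a trigger (or the app itself) can work out whether anything really changed. A minimal sketch, usable inside an AFTER UPDATE trigger:

    ```sql
    -- Rows where at least one column value actually differs between the
    -- before (deleted) and after (inserted) images. EXCEPT conveniently
    -- treats NULLs as not distinct from each other here.
    SELECT * FROM inserted
    EXCEPT
    SELECT * FROM deleted;
    ```

    Anything that comes back from this query represents a genuine change; an update that set every column to its existing value returns nothing.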

  • A dummy update (updating a column to its current value) is a common 'trick' used in transactional and (IIRC) merge replication.
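
    The dummy-update trick is literally just this (names are made up):

    ```sql
    -- Forces the row through the log, and therefore through the log reader
    -- and out to subscribers, without changing any data.
    UPDATE dbo.Orders
    SET    OrderDate = OrderDate
    WHERE  OrderId = @OrderId;
    ```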

    Any foreign keys in the update's plan?

  • Don Halloran (12/6/2015)


    Useful findings. If I understand you correctly, those observations seem to be related to the fact that more columns were included, not specifically to additional pages being involved. That is to say, even a row which fits on one page would see these consequences if an update were changed to increase the number of updated columns.

    I do know that CDC is relatively smart about this, and updating a column value to itself will not produce rows in the change table. I would expect transactional (and merge?) replication to behave the same way since it uses the same underlying mechanism.

    I guess it makes sense that the same is not true of indexes, since these have to be updated right away without the lazy-reading luxury of figuring out whether or not anything really changed in the same way that CDC can.

    Correct. The fact that the data spans multiple pages is irrelevant here, at least as far as I could see. Logically that makes sense, because you can't modify half of a row without losing atomicity.

    Not sure about CDC, but that makes sense given its purpose. Transactional replication, however, does not care whether the value stayed the same, as that is not its purpose. It simply replays the commands it is supposed to, without regard to what those commands are. You can see this by updating a value to itself: the command will in fact replicate.

    Jared
    CE - Microsoft

  • Updating every column is also going to make update triggers more complex, as you can no longer rely on IF UPDATE(column_name) or similar: every column will report as updated.
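
    To illustrate: UPDATE(col) is true whenever the column appears in the SET list, regardless of whether the value changed. A minimal sketch (table and audit-table names are hypothetical):

    ```sql
    CREATE TRIGGER dbo.trg_WideTable_Audit
    ON dbo.WideTable
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Fires whenever Col2 is in the SET list -- which, if the app
        -- always updates every column, is every single update, even
        -- when Col2's value didn't actually change.
        IF UPDATE(Col2)
            INSERT dbo.AuditLog (WideTableId, AuditedAt)
            SELECT i.Id, SYSDATETIME()
            FROM   inserted AS i;
    END;
    ```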

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • SQLKnowItAll (12/7/2015)


    Not sure about CDC, but that makes sense given its purpose. Transactional replication, however, does not care whether the value stayed the same, as that is not its purpose. It simply replays the commands it is supposed to, without regard to what those commands are. You can see this by updating a value to itself: the command will in fact replicate.

    Yeah, that makes sense; you've got to actually push the transaction in case triggers fire, etc. I wonder what happens with merge replication. I've actually never used either!
