UPDATE when the values are the same

  • It is interesting that explicitly filtering out updates isn't always faster than performing a non-updating update either:

    USE tempdb;

    SET NOCOUNT ON;

    GO

    CREATE TABLE dbo.Test

    (

    Name char(200) COLLATE Latin1_General_CI_AI NULL

    );

    GO

    INSERT dbo.Test

    SELECT TOP (123456)

    'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

    FROM sys.columns AS c

    CROSS JOIN sys.columns AS c2;

    GO

    DBCC FREEPROCCACHE;

    CHECKPOINT;

    GO

    DECLARE @start datetime2 = SYSDATETIME();

    SET STATISTICS IO ON;

    UPDATE dbo.Test WITH (TABLOCKX)

    SET Name = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

    SET STATISTICS IO OFF;

    SELECT test = 'Non-updating', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

    GO

    DECLARE @start datetime2 = SYSDATETIME();

    SET STATISTICS IO ON;

    UPDATE dbo.Test WITH (TABLOCKX)

    SET Name = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

    WHERE Name <> 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

    SET STATISTICS IO OFF;

    SELECT test = 'Filtered', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME())

    GO

    DECLARE @start datetime2 = SYSDATETIME();

    SET STATISTICS IO ON

    UPDATE dbo.Test WITH (TABLOCKX)

    SET Name = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY';

    SET STATISTICS IO OFF

    SELECT test = 'Updating', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

    GO

    DROP TABLE dbo.Test;

    Table 'Test'. Scan count 1, logical reads 3249, physical reads 0, read-ahead reads 0.

    Table 'Test'. Scan count 1, logical reads 3249, physical reads 0, read-ahead reads 0.

    Table 'Test'. Scan count 1, logical reads 3249, physical reads 0, read-ahead reads 0.

    Of course, it is usually faster to filter redundant updates (and definitely recommended) but I thought I'd share this interesting exception.

  • My five cents to the topic.

    I made a simple experiment, created a table like 'create table t1(a int)', inserted one row and updated it with the same value, and with different value with attached debugger.

    What I observed was function "sqlservr!Record::FindDiff", after call to this one, the subsequent calls for updating and nonupdating differs.

    For non-updating: nothing happens.

    For updating: some extra checks before update performed, log record prepared and issued, row modified, page marks dirty.

    As I conclude from what I've seen in debugger, writing the same value to the page in memory is not performed in common case.

    Here is two screenshots. One for totalcmd compare window, comparing call trace. Another for shorten form of it, where trace output calls presented hierarchicaly.

    For myself I consider it quite a good proof that no writes are actually performed in common case.


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • SomewhereSomehow (9/11/2012)


    What I observed was function "sqlservr!Record::FindDiff", after call to this one, the subsequent calls for updating and nonupdating differs. For non-updating: nothing happens. For updating: some extra checks before update performed, log record prepared and issued, row modified, page marks dirty.

    Thanks for going to all the trouble, Dima. I did a similar test (though your debugger looks prettier than WinDbg output, for sure). The method calls do confirm what we already knew about log records, dirtypages and so on.

    As I conclude from what I've seen in debugger, writing the same value to the page in memory is not performed in common case.

    I tend to agree, but I am not certain the debugger output proves it beyond all doubt. There is certainly no suggestion that a 'before copy' of the page is made, but from what I saw, it is *possible* that the engine constructs a new FixedVarRecord incorporating the 'update' and passes that to Record::FindDiff. On the other hand, it could be that FindDiff simply compares the column value on the page with the value the query processor asked it to be updated to. Without stepping through the (obfuscated) machine code line by line looking for memcpy versus memcmp calls in the right place, I don't think we could ever be 100% sure.

  • Hi, Paul! Actually, I agree with GilaMonster and you, that there is no so big trouble, if it actually writes to the page in memory or not, but due to persistence of Sergiy, I decided to look.

    SQL Kiwi (9/11/2012)


    Without stepping through the (obfuscated) machine code line by line looking for memcpy versus memcmp calls in the right place, I don't think we could ever be 100% sure.

    Absolutely agree. At first I was going to examine memory, stack and registers to look for values update 43981 (ABCDh) -> 48879 (BEEFh).

    But I refused of it, to save my time for the reason:

    SQL Kiwi (9/11/2012)


    The method calls do confirm what we already knew about log records, dirtypages and so on.

    As the method calls confirmed what is well known, the probability, that these calls stand for updating row is very high. And because we don't see any kind of that calls for nonupdating case, I conclude there is no update. This conclusion is based on two assumptions: first - "update stuff" is made by the presented calls (in this particular case), second - it is made only by these method calls, i.e. no other method calls perform writing to the page (in this particular case).

    These assumptions may be a point of doubt, of course. That is why I said, that for myself, I consider it to be convincing. I also said in common case, because this calls is just a reflection of the behavior for this particular case, that means that there might (and sure will) be many other cases, where there will be different execution path, and another call trace.

    So to say about it in common, for all possible scenarios, we should either look at sql server source code, or test all possible combinations, which is a huge room for investigation. But, as I said, for now I see no practical sense to do it.

    That is all IMHO, all the other opinions are welcome.

    p.s.

    SQL Kiwi (9/11/2012)


    though your debugger looks prettier than WinDbg output, for sure

    Well, in fact it is WinDbg. First screen is raw trace from WinDbg compared for both cases using Total Commander->Compare by Content, second is just more convenient for human representation in xml of the first one, viewed by XML Notepad.


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • ...

    I tend to agree, but I am not certain the debugger output proves it beyond all doubt.

    ...

    It's hard to see what will "prove it beyond all doubt" for Sergiy. May be forensic examination of the disk before and after? Adding the hardware to track the laser charge to the disk? What else? :w00t:

    I think, the evidences we have so far here are more than satisfactory.

    What I would think will be more helpful in this thread - is providing some advices for the best practices regarding that topic.

    There are different cases. Check for value change can improve or hurt performance depending on the real situation. In my experience I've seen both.

    I've seen significant performance improvements of some ETL processes for data warehouse after introduction of value-change checks. At the same time, having these checks in database layer of OLTP systems, often does lead to performance degradation. In my humble opinion, in OLTP, it's better to be done, if it is really required, in some pre-db layer, usually in DAL (Data Access Layer).

    Please note, that it has nothing to do with row-versioning, as in "value not changed" case, we don't really care if the data was changed by someone else.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (9/12/2012)


    ...

    I tend to agree, but I am not certain the debugger output proves it beyond all doubt.

    ...

    It's hard to see what will "prove it beyond all doubt" for Sergiy. May be forensic examination of the disk before and after? Adding the hardware to track the laser charge to the disk? What else? :w00t:

    + 9,223,372,036,854,775,806 (a bit more than nine quintillion)!!! :laugh:

    Tom

  • Eugene Elutin (9/12/2012)[hr

    It's hard to see what will "prove it beyond all doubt" for Sergiy.

    No, it's actually quite easy to see.

    Some proper experiment.

    Not the one-sided script specifically baked to satisfy desired conclusion.

    This was one of the proofs "beyond all doubt":

    SQL Kiwi (9/7/2012)


    In SQL Server 2008 or later:

    USE tempdb;

    GO

    CREATE TABLE Test

    (

    ID integer IDENTITY(1,1) NOT NULL,

    Name varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL

    );

    GO

    INSERT dbo.Test

    (Name)

    VALUES

    ('Tom');

    GO

    CREATE PROCEDURE dbo.UpdateSame

    AS

    UPDATE dbo.Test

    SET Name = Name;

    GO

    CREATE PROCEDURE dbo.UpdateMixedCase

    AS

    UPDATE dbo.Test

    SET Name = 'Tom';

    GO

    CREATE PROCEDURE dbo.UpdateCaps

    AS

    UPDATE dbo.Test

    SET Name = 'TOM';

    GO

    CHECKPOINT;

    DBCC FREEPROCCACHE;

    GO

    EXECUTE dbo.UpdateSame;

    GO

    EXECUTE dbo.UpdateMixedCase;

    GO

    EXECUTE dbo.UpdateCaps;

    GO

    SELECT

    proc_name = OBJECT_NAME(deps.[object_id]),

    deps.last_logical_writes,

    deps.last_logical_reads

    FROM sys.dm_exec_procedure_stats AS deps

    WHERE

    deps.[object_id] IN

    (

    OBJECT_ID(N'dbo.UpdateSame', N'P'),

    OBJECT_ID(N'dbo.UpdateMixedCase', N'P'),

    OBJECT_ID(N'dbo.UpdateCaps', N'P')

    )

    GO

    DROP PROCEDURE

    dbo.UpdateSame,

    dbo.UpdateMixedCase,

    dbo.UpdateCaps;

    GO

    DROP TABLE

    dbo.Test;

    Output:

    It took less than 5 minutes for me to modify the script and prove beyond any doubt that the the output of the experiment contained false data and cannot be used for any conclusion.

    Unfortunately I don't have a possibility to verify the debugger output.

    But the experiment as it's performed is far from being able to provide any prove at all.

    To provide some certain outcome the experiment must include following cases:

    - updating string values with exactly same and "same by collation rules" values;

    - updating a table with a trigger using values inserted and deleted tables (since the tables contain separate sets of data they occupy different pages in memory, even if they contain the same values. According to Paul's logic the set in "inserted" table won't be created. It's important to see if UPDATE performs different set of operations in order to provide both deleted and inserted values to the trigger);

    - updating with WHERE clause filtering out rows with identical values.

    Then by comparing the traces generated by these queries we may get to some reasonable conclusion.

    _____________
    Code for TallyGenerator

  • Sergiy (9/16/2012)


    No, it's actually quite easy to see. Some proper experiment. Not the one-sided script specifically baked to satisfy desired conclusion. This was one of the proofs "beyond all doubt": [script removed] It took less than 5 minutes for me to modify the script and prove beyond any doubt that the the output of the experiment contained false data and cannot be used for any conclusion. Unfortunately I don't have a possibility to verify the debugger output. But the experiment as it's performed is far from being able to provide any prove at all.

    You have misunderstood. That post showed an exception to the rule that filtering updates is always faster. I found it interesting, and shared it because I thought other people would find it intriguing too. It was absolutely NOT to prove anything at all. It was a curiosity, for fun. (our debugger sessions were not based on that script).

    According to Paul's logic the set in "inserted" table won't be created.

    No! I have never said anything like this. The inserted and deleted pseudo-tables in a trigger will always contain all modified records (including non-updating updates). Having a trigger that scans the inserted or deleted tables will obviously mean non-updating updates have a much higher overhead. Filtering non-updating updates in the WHERE is the right thing to do, as I have said before.

  • Sergiy, I should add that the fun in the example was in working out *why* that particular case gave the performance results it did. With a long string of a suitable collation, it is more expensive to do the collation-aware comparison than it is to decide not to perform the update based on a binary comparison in the storage engine. This was a point you refused to accept earlier, so I thought you might find it educational.

  • SQL Kiwi (9/16/2012)


    According to Paul's logic the set in "inserted" table won't be created.

    No! I have never said anything like this. The inserted and deleted pseudo-tables in a trigger will always contain all modified records (including non-updating updates).

    Actually, you did.

    Trigger is executed after UPDATE. If SQL Server figures out that the values are the same and does not proceed with writing to the pages then trigger has no source for "inserted" table.

    Having a trigger that scans the inserted or deleted tables will obviously mean non-updating updates have a much higher overhead.

    I was talking not about overheads caused by a trigger.

    I was asking if SQL Server will do something different for a table with trigger in order to have "inserted" and "deleted" tables populated.

    Filtering non-updating updates in the WHERE is the right thing to do, as I have said before.

    Still not clear why do you think so.

    If SQL Server performs the same actions as in case with WHERE clause, and binary comparison of pages is faster than collation based comparison of strings - why do you advocate for the worst of these 2 options?

    _____________
    Code for TallyGenerator

  • Sergiy (9/16/2012)


    Trigger is executed after UPDATE. If SQL Server figures out that the values are the same and does not proceed with writing to the pages then trigger has no source for "inserted" table.

    If the query processor figures out that it is not going to update the row (this is decided on the basis of where clause and on clause) that row will not be in the inserted or deleted table. If the query passes the row update to the storage engine and the storage engine decides not to update the row in its buffered page (determined by the update describing the same binary value of each field as is already in the buffer) the row will be in the inserted and deleted tables. These are two completely different occurrences, involving different queries.

    If SQL Server performs the same actions as in case with WHERE clause, and binary comparison of pages is faster than collation based comparison of strings - why do you advocate for the worst of these 2 options?

    It doesn't perform the same actions in the two cases; in one case the storage engine is invoked (but doesn't modify the page) and the non-updating update can cause a trigger to be fired; in the other case the query processor doesn't do update because the where clause excludes it, the storage engine is not invoked, and there is no non-updating update to cause a trigger to be invoked. This seems to me to be a big and clear difference.

    But probably I'm wasting my time. 🙁 Either your English comprehension is not up to understanding what I have just written or I'm wasting my time feeding the troll.

    Tom

  • L' Eomot Inversé (9/16/2012)


    ... the storage engine decides not to update the row

    Can you please provide a reference to a document saying that storage engine operates rows?

    So far every one I saw was describing page operations.

    _____________
    Code for TallyGenerator

  • Sergiy (9/16/2012)


    Filtering non-updating updates in the WHERE is the right thing to do, as I have said before.

    Still not clear why do you think so.

    If SQL Server performs the same actions as in case with WHERE clause, and binary comparison of pages is faster than collation based comparison of strings - why do you advocate for the worst of these 2 options?

    As was already explained, non-updating updates still require an exclusive lock. So are you really wondering why he would prefer the process that does not require exclusive locks for those rows than one that does require exclusive locks? Seems like an easy decision to me.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Sergiy (9/16/2012)


    L' Eomot Inversé (9/16/2012)


    ... the storage engine decides not to update the row

    Can you please provide a reference to a document saying that storage engine operates rows?

    So far every one I saw was describing page operations.

    Do you really imagine that the query processor's interface to the storage engine operates on pages as opposed to on rows or attributes?

    Anyway, you've already made in clear when responding to Gail that you aren't interested in looking into any references that are provided for you, so why do you expect I should waste my time providing them for you?

    Tom

  • L' Eomot Inversé (9/16/2012)


    Anyway, you've already made in clear when responding to Gail that you aren't interested in looking into any references that are provided for you, so why do you expect I should waste my time providing them for you?

    Hahahahahaha! Brilliant! :hehe: :hehe: :hehe:


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

Viewing 15 posts - 136 through 150 (of 159 total)

You must be logged in to reply to this topic. Login to reply