UPDATE when the values are the same

  • L' Eomot Inversé (9/16/2012)


    Do you really imagine that the query processor's interface to the storage engine operates on pages as opposed to on rows or attributes?

How does it matter what I imagine?

Anyway, you've already made it clear when responding to Gail that you aren't interested in looking into any references provided for you, so why do you expect me to waste my time providing them?

    Did you open that link yourself?

    I read the book Gail referred to from start to finish.

    And did not find a single reference in there relevant to this topic.

    Waste of time.

You clearly don't consider posting personal attacks a waste of time, unlike posting professional comments.

    You made it clear that you lack professionalism and don't have anything to say on the matter.

Would you mind sparing us your rubbish input?

    _____________
    Code for TallyGenerator

  • SQL Kiwi (9/16/2012)


    Sergiy, I should add that the fun in the example was in working out *why* that particular case gave the performance results it did. With a long string of a suitable collation, it is more expensive to do the collation-aware comparison than it is to decide not to perform the update based on a binary comparison in the storage engine. This was a point you refused to accept earlier, so I thought you might find it educational.

If you thought about it, you'd most probably find it educational for yourself.

    🙂

    I figured out why you built the test this way and I have no argument about the point:

comparing the binary content of pages is, and always will be, more effective than comparing strings row by row (usually several different comparisons over the data in the same page).

    I quickly added another sample test to prove the point:

UPDATE dbo.Test WITH (TABLOCKX)
SET Name = CASE WHEN Name = @Value THEN @Value ELSE Name END;

@Value is the string from your example.

On my machine this changed version of the "Non-updating" UPDATE usually takes about twice as long as the "Filtered" one.

But I must admit the results vary quite significantly from run to run. Once I even got "CASE Non-updating" to finish faster than the simple "Non-updating".
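One way to smooth that variance - a rough harness sketch, assuming the same dbo.Test table and @Value as above (the loop and @timings table variable are illustrative, not part of the original test) - is to repeat the run several times and keep every timing:

DECLARE @i int = 1,
        @start datetime2,
        @Value char(200) = REPLICATE('X', 200);
DECLARE @timings TABLE (run int, time_ms int);

WHILE @i <= 5
BEGIN
    -- Level the playing field before each run, as in the main script
    DBCC FREEPROCCACHE;
    CHECKPOINT;

    SET @start = SYSDATETIME();

    UPDATE dbo.Test WITH (TABLOCKX)
    SET Name = CASE WHEN Name = @Value THEN @Value ELSE Name END;

    INSERT @timings (run, time_ms)
    VALUES (@i, DATEDIFF(MILLISECOND, @start, SYSDATETIME()));

    SET @i += 1;
END;

-- Inspect all five timings rather than trusting a single run
SELECT run, time_ms FROM @timings;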

But at the end of the day, as follows from your example:

    - the query submits affected rows and creates updated versions of the pages;

- the storage engine compares old and new versions and decides if further action is required.

Some may still argue that it's rows, not newer versions of pages, that are compared by the storage engine.

But so far I'm aware only of page-level activity performed by the storage engine.

    _____________
    Code for TallyGenerator

  • Sergiy (9/16/2012)


Actually, you did. The trigger is executed after the UPDATE. If SQL Server figures out that the values are the same and does not proceed with writing to the pages, then the trigger has no source for the "inserted" table.

You misunderstood. If the reading side of an update plan produces no rows (i.e. they are filtered out by e.g. a WHERE clause predicate) then no rows arrive at the update operator in the plan, so there is nothing in the inserted or deleted tables.

I was not talking about the overheads caused by a trigger. I was asking whether SQL Server will do something different for a table with a trigger in order to have the "inserted" and "deleted" tables populated.

    That is asking about the overheads caused by a trigger. Yes, SQL Server does something different for a table with a trigger: it versions the changed rows (non-updating or otherwise). I demonstrated this in my recent blog post http://bit.ly/ForwardGhost

Still not clear why you think so. If SQL Server performs the same actions as in the case with a WHERE clause, and binary comparison of pages is faster than collation-based comparison of strings - why do you advocate the worse of these two options?

    This is just the same point you made in one of your first replies to this thread (the Event table update). In general, avoiding a non-updating update is the better option. I provided my counter-example to show an interesting (and fun) exception.

  • SQL Kiwi (9/16/2012)


    Sergiy (9/16/2012)


Actually, you did. The trigger is executed after the UPDATE. If SQL Server figures out that the values are the same and does not proceed with writing to the pages, then the trigger has no source for the "inserted" table.

You misunderstood. If the reading side of an update plan produces no rows (i.e. they are filtered out by e.g. a WHERE clause predicate) then no rows arrive at the update operator in the plan, so there is nothing in the inserted or deleted tables.

    I don't think so.

Even if no rows are updated, the tables are still created.

    SQL Server still allocates pages for the tables, names all the columns, etc.

    They have nothing in them - well, that means the source of the records for the tables contained no records.

    But it does not mean the source of the "nothing" was not there.

    _____________
    Code for TallyGenerator

I did a little experiment using string values of different lengths.

    Here is the base script:

USE tempdb;
SET NOCOUNT ON;
GO
CREATE TABLE dbo.Test
(
    Name char(200)
        COLLATE Latin1_General_CI_AI NULL
        --COLLATE Latin1_General_BIN NULL
);
GO
-- Load 123,456 identical fixed-length strings
INSERT dbo.Test
SELECT TOP (123456) REPLICATE('X', 200)
FROM sys.columns AS c
CROSS JOIN sys.columns AS c2;
GO
DBCC SHOWCONTIG ('dbo.Test') WITH all_indexes, TABLERESULTS;
DBCC FREEPROCCACHE;
CHECKPOINT;
GO
-- Test 1: filter the non-updating rows out in the WHERE clause
DECLARE @start datetime2 = SYSDATETIME(), @Value char(200);
SET @Value = REPLICATE('X', 200);
SET STATISTICS IO ON;
UPDATE dbo.Test WITH (TABLOCKX)
SET Name = @Value
WHERE NOT Name = @Value;
SET STATISTICS IO OFF;
SELECT test = 'Filtered', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());
GO
DBCC FREEPROCCACHE;
CHECKPOINT;
GO
-- Test 2: non-updating update of every row
DECLARE @start datetime2 = SYSDATETIME(), @Value char(200);
SET @Value = REPLICATE('X', 200);
SET STATISTICS IO ON;
UPDATE dbo.Test WITH (TABLOCKX)
SET Name = Name;
SET STATISTICS IO OFF;
SELECT test = 'Non-updating', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());
GO
DBCC FREEPROCCACHE;
CHECKPOINT;
GO
-- Test 3: non-updating update via a CASE expression
DECLARE @start datetime2 = SYSDATETIME(), @Value char(200);
SET @Value = REPLICATE('X', 200);
SET STATISTICS IO ON;
UPDATE dbo.Test WITH (TABLOCKX)
SET Name = CASE WHEN Name = @Value THEN @Value ELSE Name END;
SET STATISTICS IO OFF;
SELECT test = 'Non-updating CASE', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());
GO
DBCC FREEPROCCACHE;
CHECKPOINT;
GO
-- Test 4: genuine update of every row
DECLARE @start datetime2 = SYSDATETIME(), @Value char(200);
SET @Value = REPLICATE('Y', 200);
SET STATISTICS IO ON;
UPDATE dbo.Test WITH (TABLOCKX)
SET Name = @Value;
SET STATISTICS IO OFF;
SELECT test = 'Updating', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());
GO
DROP TABLE dbo.Test;

Basically no difference from yours, Paul, except that I used REPLICATE so I could easily and consistently switch between char(200), char(50), or char(8000).

I recorded the times and calculated each method's effectiveness relative to "Filtered".

Here is the outcome:

char(50) COLLATE Latin1_General_CI_AI
Filtered: 87 ms
Non-updating CASE: 248 ms (285%)
Non-updating: 114 ms (131%)
Updating: 206 ms (237%)

char(200) COLLATE Latin1_General_CI_AI
Filtered: 295 ms
Non-updating CASE: 472 ms (160%)
Non-updating: 137 ms (46%)
Updating: 275 ms (93%)

char(500) COLLATE Latin1_General_CI_AI
Filtered: 716 ms
Non-updating CASE: 945 ms (132%)
Non-updating: 194 ms (27%)
Updating: 527 ms (74%)

char(1000) COLLATE Latin1_General_CI_AI
Filtered: 1421 ms
Non-updating CASE: 1702 ms (120%)
Non-updating: 267 ms (19%)
Updating: 1010 ms (71%)

char(8000) COLLATE Latin1_General_CI_AI
Filtered: 10997 ms
Non-updating CASE: 12796 ms (116%)
Non-updating: 1321 ms (12%)
Updating: 9091 ms (83%)

    No surprise so far.

Longer strings mean longer comparison operations, so the winning margin for "Non-updating" grows.

Then I decided to minimise the impact of collation rules by choosing the least expensive collation.

    Here is the outcome:

char(50) COLLATE Latin1_General_BIN
Filtered: 13 ms
Non-updating CASE: 151 ms (1162%)
Non-updating: 113 ms (869%)
Updating: 203 ms (1562%)

char(200) COLLATE Latin1_General_BIN
Filtered: 17 ms
Non-updating CASE: 178 ms (1047%)
Non-updating: 138 ms (812%)
Updating: 252 ms (1482%)

char(500) COLLATE Latin1_General_BIN
Filtered: 31 ms
Non-updating CASE: 240 ms (774%)
Non-updating: 191 ms (616%)
Updating: 548 ms (1768%)

char(1000) COLLATE Latin1_General_BIN
Filtered: 44 ms
Non-updating CASE: 322 ms (732%)
Non-updating: 269 ms (611%)
Updating: 1217 ms (2766%)

char(8000) COLLATE Latin1_General_BIN
Filtered: 264 ms
Non-updating CASE: 1708 ms (647%)
Non-updating: 1328 ms (503%)
Updating: 10785 ms (4085%)

    "Filtered" solution wins clearly.

    With the winning rate which does not leave any doubt: overhead of opening/closing transaction, applying different kind of locks (actually we have the same table lock forced in all cases), etc. cannot bring such a difference.

    So, what do you think SQL Server is doing during those 1.1 seconds which is the difference between "filtered" and "Non-updating"?

    Interpretations are welcome.

    _____________
    Code for TallyGenerator

  • Sergiy (9/16/2012)


    I don't think so.

Even if no rows are updated, the tables are still created.

    SQL Server still allocates pages for the tables, names all the columns, etc.

    They have nothing in them - well, that means the source of the records for the tables contained no records.

    But it does not mean the source of the "nothing" was not there.

The inserted and deleted tables are not separate real tables; SQL Server does not "allocate pages, name all the columns, etc." The inserted and deleted virtual tables are a 'view' of version store records in SQL Server 2005 and later. Before SQL Server 2005, a backward scan of the transaction log records was used.

    Without a trigger, no versions are generated. With a trigger, versions may be generated. The script below demonstrates this:

CREATE TABLE dbo.Test (col1 varchar(max) NULL);
GO
INSERT dbo.Test (col1) VALUES ('Hello');
GO
-- No trigger, nothing in the version store
BEGIN TRANSACTION;
UPDATE dbo.Test SET col1 = 'Hello';
SELECT * FROM sys.dm_tran_version_store AS dtvs;
COMMIT TRANSACTION;
GO
-- A trigger that does nothing
CREATE TRIGGER trg
ON dbo.Test AFTER UPDATE
AS RETURN;
GO
CHECKPOINT;
GO
-- Same non-updating update
BEGIN TRANSACTION;
UPDATE dbo.Test SET col1 = 'Hello';
-- Now we have two version store records
SELECT
    decoded = CONVERT(char(5), SUBSTRING(dtvs.record_image_first_part, 12, 5)),
    *
FROM sys.dm_tran_version_store AS dtvs;
COMMIT TRANSACTION;

  • Sergiy (9/16/2012)


    I figured out why you built the test this way and I have no argument about the point:

comparing the binary content of pages is, and always will be, more effective than comparing strings row by row (usually several different comparisons over the data in the same page).

    You're almost there, but the point is the binary comparisons in the storage engine are per row, not per page. It is a shame you are not in a position to run the debugger session yourself, because it shows the row-by-row processing very clearly:

    - The update operator asks for a row of update data from the table scan operator.

    - The storage engine performs the 'update' for this row (quotes because a write might not actually occur).

    - The update operator asks for the next row of information from the table scan operator.

    - The process repeats until no more rows are available from the table scan.

    A 'binary comparison of pages' simply does not occur.
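For anyone without a debugger to hand, a rough way to see the same row-by-row flow (a sketch, assuming the dbo.Test table from the test script earlier in this thread) is to look at the per-operator row counts that SET STATISTICS PROFILE reports:

-- Runs the statement and returns the actual plan with per-operator
-- Rows and Executes columns appended.
SET STATISTICS PROFILE ON;

UPDATE dbo.Test WITH (TABLOCKX)
SET Name = Name;

SET STATISTICS PROFILE OFF;
-- The Rows column shows all 123,456 rows flowing from the Table Scan
-- to the Table Update operator, even though nothing changes on disk.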

But at the end of the day, as follows from your example:

    - the query submits affected rows and creates updated versions of the pages;

- the storage engine compares old and new versions and decides if further action is required.

Some may still argue that it's rows, not newer versions of pages, that are compared by the storage engine.

But so far I'm aware only of page-level activity performed by the storage engine.

    This is the basic misunderstanding - processing occurs row by row. This 'iterator model' is common knowledge and documented e.g. in Microsoft Press internals books, by people who wrote the SQL Server code.

    It would be horribly inefficient for the storage engine to compare pages (not rows) in the general case - we would need to copy the whole page before every row update, perform the update, and then compare the whole new page with the whole old page. Update information is not guaranteed to arrive at the update operator in page order (it depends e.g. on query plan shape) so it could mean copying and comparing a page hundreds of times over, once for each updated row on the page. That would be madness. It is much more efficient not to copy the page at all, and simply compare the binary value passed in for update with the binary value already in the row - no expensive page copy & compare needed.

  • Sergiy (9/16/2012)


Basically no difference from yours, Paul, except that I used REPLICATE so I could easily and consistently switch between char(200), char(50), or char(8000).

    I chose a literal string rather than REPLICATE for complicated reasons, and to be fair to all the methods (see my post http://bit.ly/ComputeScalar if you are interested). To avoid debates about which method might evaluate REPLICATE more times than another, I used a literal string.

    "Filtered" solution wins clearly.

    Which is why I say filtering is generally the best option (except in cases like my carefully-constructed example).

    With the winning rate which does not leave any doubt: overhead of opening/closing transaction, applying different kind of locks (actually we have the same table lock forced in all cases), etc. cannot bring such a difference. So, what do you think SQL Server is doing during those 1.1 seconds which is the difference between "filtered" and "Non-updating"? Interpretations are welcome.

    It's all about how many rows are produced by the Table Scan. When filtered, no rows are produced, so no update processing happens. The main cost is evaluating 'WHERE NOT Name = @Value' 123,456 times. With a binary collation, this comparison is fast; with a more complex collation it can be quite slow (slower than not filtering at all, and just passing all the rows to the update operator for binary comparison, row by row, in the storage engine).

    In your CASE test, all 123,456 rows are produced one at a time from the table scan and processed one at a time by the update operator. We now have the overhead of passing rows one at a time between the scan and update operators, and we have to compare the result of the CASE expression with the value already present in the row, for each row. This explains the difference in execution time.
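As a rough check of that explanation (a sketch, assuming the dbo.Test table from the earlier script), the cost of the comparison alone can be isolated in a plain SELECT with no update machinery involved; with the CI_AI collation the count should run noticeably slower than with the BIN one:

DECLARE @start datetime2,
        @Value char(200) = REPLICATE('X', 200),
        @matches int;

-- Collation-aware comparison, evaluated once per row
SET @start = SYSDATETIME();
SELECT @matches = COUNT(*)
FROM dbo.Test
WHERE Name = @Value COLLATE Latin1_General_CI_AI;
SELECT test = 'CI_AI compare', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());

-- The same comparison forced to a binary collation
SET @start = SYSDATETIME();
SELECT @matches = COUNT(*)
FROM dbo.Test
WHERE Name = @Value COLLATE Latin1_General_BIN;
SELECT test = 'BIN compare', time_ms = DATEDIFF(MILLISECOND, @start, SYSDATETIME());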

  • L' Eomot Inversé (9/16/2012)


If the query processor figures out that it is not going to update the row (decided on the basis of the WHERE clause and ON clause), that row will not be in the inserted or deleted tables. If the query passes the row update to the storage engine and the storage engine decides not to update the row in its buffered page (because the update describes the same binary value for each field as is already in the buffer), the row will be in the inserted and deleted tables. These are two completely different occurrences, involving different queries.

It doesn't perform the same actions in the two cases. In one case the storage engine is invoked (but doesn't modify the page) and the non-updating update can cause a trigger to fire; in the other case the query processor doesn't do the update because the WHERE clause excludes it, the storage engine is not invoked, and there is no non-updating update to cause a trigger to be invoked. This seems to me to be a big and clear difference.

    That matches my understanding, Tom.

For those who don't have debugger skills and want to look at the internals of this, I wrote a blog post that shows how to dig in using Extended Events and materialized callstacks for each of the events fired when the UPDATE tests are run.

    http://sqlskills.com/blogs/jonathan/post/Looking-at-Non-Updating-Updates-with-Extended-Events.aspx

The materialized callstacks validate everything Paul has been saying in this thread and definitively refute the argument that SQL Server operates on pages, not rows. They also validate that row versions are generated, even when nothing changes, if a trigger exists.
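For reference, a skeletal sketch of that kind of event session (the exact events and setup are in the linked post; the session name, event choice, and session_id filter here are illustrative, and the callstack action needs trace flag 3656 plus public symbols to resolve into readable names):

-- Capture the callstack each time the test session acquires a lock
CREATE EVENT SESSION [NonUpdatingUpdate] ON SERVER
ADD EVENT sqlserver.lock_acquired
(
    ACTION (package0.callstack)
    WHERE (sqlserver.session_id = 53)  -- hypothetical test session id
)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [NonUpdatingUpdate] ON SERVER STATE = START;
-- ... run the UPDATE tests, read the ring buffer, then stop the session
ALTER EVENT SESSION [NonUpdatingUpdate] ON SERVER STATE = STOP;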

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
