A Hazard of Using the SQL Update Statement

Question

A Hazard of Using the SQL Update Statement

Dwain Camps

SSC Guru

Points: 86978
More actions
September 8, 2013 at 9:27 pm

#297572

Comments posted to this topic are about the item A Hazard of Using the SQL Update Statement
My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 57 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 1

This is a great article in that it makes people, especially newbies who may have not studied the nuances of the T-SQL UPDATE statement to its fullest depth, aware of what most people would consider to be a fault.

I, however, do not consider the fact that it won’t give you an error for an attempted multiple-update on a single row a fault. Nay. I consider it to be a feature! I actually hate Oracle for trying to be so dutiful as to thwart my attempts to get something done by producing errors for things that I actually want to do.

I successfully used this UPDATE “feature” to build CSV columns in a high performance, set-based manner in SQL Server 7 before the advent of UDFs or FOR XML PATH to do the same. I’m not sure what I’d use the “feature” for now but, if it should ever come up, I’d get pretty well bent out of shape if I got an error because some product language programmer decided that it wasn’t ever supposed to be done.

To wit, if you make something idiot proof, only idiots will use it. 😉

As you say, "Let the controversy begin." 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

natalie.ignatieva SSChasing Mays Points: 632 More actions · Answer 2

Jeff Moden (9/8/2013)
I, however, do not consider the fact that it won’t give you an error for an attempted multiple-update on a single row a fault. Nay. I consider it to be a feature! I actually hate Oracle for trying to be so dutiful as to thwart my attempts to get something done by producing errors for things that I actually want to do.

I totally agree with you about that it's a feature and it's not a hazard.

ChrisM@Work SSC Guru Points: 186126 More actions · Answer 3

natalie.ignatieva (9/9/2013)
Jeff Moden (9/8/2013)
I, however, do not consider the fact that it won’t give you an error for an attempted multiple-update on a single row a fault. Nay. I consider it to be a feature! I actually hate Oracle for trying to be so dutiful as to thwart my attempts to get something done by producing errors for things that I actually want to do.
I totally agree with you about that it's a feature and it's not a hazard.

Prior to the introduction of MERGE I'd have agreed with this - we learned to ensure that multiple source rows for a target row had the same source value. Now though? It's inconsistent - and changing the behaviour of UPDATE to match the strictness of MERGE would break too much legacy code.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

Jeff Moden (9/8/2013)
This is a great article in that it makes people, especially newbies who may have not studied the nuances of the T-SQL UPDATE statement to its fullest depth, aware of what most people would consider to be a fault.
I, however, do not consider the fact that it won’t give you an error for an attempted multiple-update on a single row a fault. Nay. I consider it to be a feature! I actually hate Oracle for trying to be so dutiful as to thwart my attempts to get something done by producing errors for things that I actually want to do.
I successfully used this UPDATE “feature” to build CSV columns in a high performance, set-based manner in SQL Server 7 before the advent of UDFs or FOR XML PATH to do the same. I’m not sure what I’d use the “feature” for now but, if it should ever come up, I’d get pretty well bent out of shape if I got an error because some product language programmer decided that it wasn’t ever supposed to be done.
To wit, if you make something idiot proof, only idiots will use it. 😉
As you say, "Let the controversy begin." 😀

Actually I wasn't really suggesting that this is a bug (or a feature) that should be removed. Maybe a more clear statement of the hazard is in order:

- It is hazardous to allow UPDATE to perform its action on multiple rows where the results could be ambiguous, because even though it may allow the UPDATE to run, it might not give you the results that you expect.

No so with MERGE, which is pretty clear and doesn't allow it. Should it? I'll leave that for the intrepid MS SQL designers to tell us. But I think that newbie developers could get trapped by its (UPDATE's) behaviour, not understanding how it worked.

I am happy to be a bit controversial, even though in this case I'm not sure I really would take sides, despite what I said about the MERGE statement error message. On the surface, it would seem less ambiguity would be better.

I would be curious exactly to hear what you did to achieve your high-performance CSV columns (without being judgmental of course regarding data normalization) and whether there's any benefit to having it around now. Aside from the legacy code of course.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

paul.knibbs SSCoach Points: 15320 More actions · Answer 5

natalie.ignatieva (9/9/2013)
I totally agree with you about that it's a feature and it's not a hazard.

I would say it's entirely possible for it to be both. Yes, it's a feature, and quite possibly a useful feature in some circumstances, but it's also a potential hazard waiting to catch the unwary. I fortunately generally work with data where it's quite easy to guarantee the uniqueness of each row, but not everyone has that luxury!

Paul White SSC Guru Points: 150467 More actions · Answer 6

dwain.c - Article body
This is interesting. We updated the Value column of our #Test1 table twice for each row. Or did we? [...(later)...] In reality we find that somehow the SQL Server Query Optimizer was smart enough to decide that it only needed to update each row in our #Test1 table once. How it chose the row to use to make that update will ultimately remain its own dirty little secret.

There's a Stream Aggregate in the query plan. It applies an ANY() aggregate to the Value column, grouped by ID. Therefore, Only one update per row is applied at the Clustered Index Update iterator. How the ANY() aggregate chooses a Value per ID group is undocumented, and therefore undefined.

dwain.c - Article body
Since we always hear that SQL does not guarantee row ordering without an explicit ORDER BY clause, we assume that the same case is true here.

Indeed. ORDER BY only guarantees the final (presentation) order anyway, it says nothing about row order between query plan iterators, and nothing should be inferred.

dwain.c - Article body
In this particularly contrived example, there’s no way to ensure the distinctness of the rows you’re applying as the source to the target table for the update [...] Why couldn’t our UPDATE statement throw a nice clear warning like that?

Well, you could write the query using SQL standard syntax:

UPDATE #Test1

SET Value =

(

SELECT b.Value

FROM #Test2 AS b

WHERE b.ID = #Test1.ID

)

The query plans contains the same Stream Aggregate, but also an Assert to check the subquery returns one row per iteration. An error is thrown when multiple Values are encountered.

dwain.c - Article body
Is it a good thing to have a piece of code like this running in Production, where the results of the update may be unpredictable?

Probably not, unless non-deterministic results are intended. A non-deterministic update is almost always not the programmer's intention in my experience.

A 2008 Connect suggestion by Hugo Kornelis to deprecate the UPDATE FROM and DELETE FROM T-SQL extension syntax generated some good discussion, but overall it is one of the most down-voted I have encountered:

http://connect.microsoft.com/SQLServer/feedback/details/332437

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

Excellent feedback Paul! Absolutely nothing in it I could argue with.

Didn't know about the SQL Connect article (quite interesting) but I did know the syntax wasn't ANSI standard, even though I tend to use it a lot. Sort of glad it got voted down.

I will say I am honored that you took the time to look in on this and keep me honest.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

TomD-982312 SSC-Addicted Points: 495 More actions · Answer 8

Good article and not that good example.

----------------------------------------
I miss SQL Server so much. Now I'm working on Oracle and hate it

Jeff Moden SSC Guru Points: 1004749 More actions · Answer 9

ChrisM@Work (9/9/2013)
Prior to the introduction of MERGE I'd have agreed with this - we learned to ensure that multiple source rows for a target row had the same source value. Now though? It's inconsistent - and changing the behaviour of UPDATE to match the strictness of MERGE would break too much legacy code.

Are you aware of the problems that they've had with MERGE? Some are "problems" with the way people implement it, just like the "problems" with UPDATE.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChrisM@Work SSC Guru Points: 186126 More actions · Answer 10

Jeff Moden (9/9/2013)
ChrisM@Work (9/9/2013)
Prior to the introduction of MERGE I'd have agreed with this - we learned to ensure that multiple source rows for a target row had the same source value. Now though? It's inconsistent - and changing the behaviour of UPDATE to match the strictness of MERGE would break too much legacy code.
Are you aware of the problems that they've had with MERGE? Some are "problems" with the way people implement it, just like the "problems" with UPDATE.

Oh yes - Hugo K's excellent talk in Nottingham earlier this year is still fresh in my mind, as is Dwain's article - and bugs too. My point is that they are inconsistent, not that MERGE is any better.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Roger L Reid SSCrazy Points: 2004 More actions · Answer 11

No reason for controversy except the inflammatory title (and if publicity is your goal, then even that's well done).

Your point is that the generalist programmer often doesn't understand working in sets; and that (beyond the known issue of writing inelegant code) fact can cause them to get incorrect results. Is that about it? No controversy there AFAIC.

Roger L Reid

Miles Neale SSChampion Points: 13147 More actions · Answer 12

Roger L Reid (9/9/2013)
No reason for controversy except the inflammatory title (and if publicity is your goal, then even that's well done).
Your point is that the generalist programmer often doesn't understand working in sets; and that (beyond the known issue of writing inelegant code) fact can cause them to get incorrect results. Is that about it? No controversy there AFAIC.

+1 And I appreciate the information. Although I am not sure that knowing that Junior Developers and DBA's might not understand what is really going on. Often some some who have been around a few years could have the same problem.

Not all gray hairs are Dinosaurs!

Max-146500 SSCarpal Tunnel Points: 4691 More actions · Answer 13

Good article.

Paul White (9/9/2013)
dwain.c - Article body
Since we always hear that SQL does not guarantee row ordering without an explicit ORDER BY clause, we assume that the same case is true here.
Indeed. ORDER BY only guarantees the final (presentation) order anyway, it says nothing about row order between query plan iterators, and nothing should be inferred.

I was interested to verify the behaviour based on the PK ordering, and was relieved to see that the pk order is still used to establish the first matching row. This can be seen in the results generated by modifying the original temporary table #Test2 create statement:

CREATE TABLE #Test2 (

ID int,

Row_No int,

Value varchar(10),

PRIMARY KEY (ID, Row_No DESC) );

In the case of explicitly specifying the relevance and granularity of which #Test2.Value row to use, I might take the approach of using a derived table and filtering the partitioned resultset:

UPDATEa

SETa.Value = b.Value

FROM#Test1 a

INNER JOIN(SELECTID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Value DESC) AS Row_Order, Value

FROM#Test2 ) b

ONa.ID = b.ID

ANDb.Row_Order = 1;

Max

James_DBA Default port Points: 1422 More actions · Answer 14

Good article Dwain.

IMHO, it more illustrates how important it is for the DBA to understand better about SET based logic. I'm a huge fan of Itzik Ben-Gan's book "Inside Microsoft SQL Server 2008 T-SQL Querying". The first few chapters alone describe how the query is constructed internally by SQL and how it's important to learn to use SET based logic.

The first thing I thought of when looking at the code example is how the join was only part of the natural key used in table #test2. SET based programming immediately makes me think that both the ID and Row_No should be used, after all the point of that key in that table is to guarantee a unique row. Which if you think of including the Row_No field it immediately makes you eliminate the potential for duplicates (not to say this is the proper or only answer, as demonstrated by others earlier).

I would've immediately triggered in my mind which Row_No do I want to use? Assuming it's Row_No = 1, I would then re-write the update statement as this:

UPDATE a

SET Value = b.Value

FROM #Test1 a

JOIN #Test2 b ON a.ID = b.ID AND b.Row_No = 1;

~ Without obstacles, you cannot progress ~
http://sqln.blogspot.com/