Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

More RBAR and "Tuning" UPDATEs Expand / Collapse
Author
Message
Posted Thursday, March 13, 2008 9:26 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:15 PM
Points: 35,399, Visits: 31,959
Comments posted to this topic are about the item More RBAR and "Tuning" UPDATEs

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #469155
Posted Friday, March 14, 2008 12:52 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, March 30, 2012 2:35 AM
Points: 460, Visits: 190
what about this:

update table1
set table1.column1 = #table.column1, table1.column2 = #table.column2
from table1, #table
where table1.pk=#table.pk
Post #469207
Posted Friday, March 14, 2008 1:13 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, October 8, 2014 8:55 AM
Points: 57, Visits: 125
The links to UPDATE are to the UPDATE() function, not the UPDATE statement syntax, which is 1 line above in the index. :)
Post #469212
Posted Friday, March 14, 2008 2:01 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Yesterday @ 1:59 AM
Points: 648, Visits: 1,877
Very good article.

I just think RBAR would catch on more, if it were pronounced AR-BAR. (Sorry ;) )
Post #469223
Posted Friday, March 14, 2008 2:26 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 8:09 AM
Points: 6,735, Visits: 8,495
Indeed 70% or our tuning time is actualy sql-education.
If our dev's would only apply the basic rules and also try to write their
code at least in two different ways,
they would soon discover where performance is to be found.


Once again, thank you for sharing the knowledge


Johan


Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere

- How to post Performance Problems
- How to post data/code to get the best help


- How to prevent a sore throat after hours of presenting ppt ?


"press F1 for solution", "press shift+F1 for urgent solution"


Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me
Post #469228
Posted Friday, March 14, 2008 3:10 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, March 3, 2010 5:43 AM
Points: 305, Visits: 134
That is simply one of the best articles on RBAR I have ever seen!

Excellent!! :)
Post #469235
Posted Friday, March 14, 2008 5:02 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 1:18 PM
Points: 6,056, Visits: 8,342
Hi all,

I'm sorry for having to be the party pooper, but there is just too much wrong with this article to let it slip.

many people don't know how to do joins in the UPDATE statement and they end up using correlated sub-queries, instead. Of course, that's a major form of RBAR that should be avoided


A sub-query is not an automatic RBAR. That's the conceptual explanation of subqueries, but in reality, the optimizer will often internally convert the query to an equivalent join. It is clear that this did not happen in the example in your article, but please don't start thinking that all subqueries are bad.

A lot of folks don't know that the UPDATE statement in SQL Server has a FROM clause, so they'll do it like folks do it in, say, Oracle. They'll use a correlated sub-query which is kind of a derivative of the "Direct Data" UPDATE


And there are other folks who don't know that what you call Oracle syntax is actually ANSI-standard syntax, and that the optional FROM clause in SQL Server is non-standard. In fact, even Access (also microsoft) has a different syntax for the same thing! So beware - if you choose to use this, you can't port your code.

What's worse, it is document to produce nondeterministic results in some scenarios. So beware - if you choose to use this, test very carefully (and due to the nature of nondeterministic results combined with Murphy's Law, you'll probably find that it always goes right in test, but fails in production)

I have blogged about how UPDATE FROM is no longer needed once we have MERGE (in SQL Server 2008, that is), and even opened a suggestion on Connect to deprecate UPDATE FROM. To my surprise, the Connect suggestion was not turned down, but instead, Microsoft confirmed that they are looking at the future of UPDATE FROM. So beware - if you choose to use this, you might find yourself forced to rewrite a few years from now.

Notice that "sometable" in the "UPDATE sometable" and the "FROM sometable st1" are the SAME table. Some other folks noticed that, too. So, they came up with this method... it's actually undocumented in BOL, but it works very well and seems to make it less confusing for some developers. Notice the use of the table alias instead of the table name in the UPDATE?


I was surprised to see that this second version is indeed not doocumented. I have used this an I know it works.

However, the first version you posted should be forbidden. Once you give "sometable" the alias "st1", you should no longer be able to refer to "sometable" without getting an error (and if you try this in a SELECT query, you'll see that it does indeed work this way). So it's acutally quite weird that you can still refer to the table name in the UPDATE. BOL forbids the use of an alias in case of a self-join, but I think that aliases should always be forbidden for the target of the UPDATE - or the alias should be specified as the target (as in your undocumented second example). Well, I guess this kind of confusion on weird rules is just one more reason to deprecate this proprietary feature.

The "Death by SQL" Update


This one is just as undocumented as the one before - BOL says that you can use a FROM clause in an UPDATE statement, but there is no mention at all whether the target table should or should not be included in it. Yet another reason to deprecate... :P

However, I can assure you that it is completely equivalent to an UPDATE FROM with the target table in the FROM clause:

 UPDATE sometable
SET somecolumn = st2.somecolumn
FROM sometable2 st2 --LOOK!!! FROM ID MISSING OBJECT OF THE UPDATE!!!
WHERE sometable.somecolumn = st2.somecolumn
AND some condition exists;
-- This is completely equivalent (check execution plans!) to
UPDATE sometable
SET somecolumn = st2.somecolumn
FROM sometable2 st2
INNER JOIN sometable
ON sometable.somecolumn = st2.somecolumn
WHERE sometable.somecolumn = st2.somecolumn
AND some condition exists;

It looks so harmless that, like my DBA and I, you will overlook it as the problem for hours.


You obviously have been bitten by this one day, but I suspect the real cause was something else. Maybe it was harder to find because of this (and I personally do indeed always include the target of the update in the FROM clause if I use UPDATE FROM at all), but it was not caused specifically by this syntax.


Okay, with the basics out of the way, lets go down to the actual code.

we're first going to make a copy of the target table of the UPDATE. Here's the code to do that.


Since you used SELECT INTO, you are making a copy with no indexes at all. Not even a primary key or a clustered index. I have added the below to my test runs (note that it didn't change the results, but it's still bad practice to use tables with no keys).

ALTER TABLE Sales.SPTest
ADD CONSTRAINT PK_SPTest PRIMARY KEY(SalesPersonID);

The Microsoft Example Code


I know it's not your fault (you just copied from BOL), but I found this example to be extremely weird.

* First, SalesYTD should not be stored in the table. It's a derived attribute, so it should be in a view.
* Second, if there are indeed good reasons (performance?) to persist SalesYTD, a trigger should be used to keep it current. Not a query that has to be run daily in order to keep the information correct.
* Third, if both the previous argumentts are disregarded, the query presented here is not doing what it should do. Imagine having a great day with lots of sales, and then calling in sick the next day. Since you won't make any sales, the last OrderDate recorded for you will be yesterday, so those huge sale numbers get added into SalesYTD once more. Great way to get closer to your yearly goals!

Okay, I'll skip over the part where you explain the reasoning, right to the finished code:

... and, we're done... notice that the code is a wee bit (a lot, actually) longer than the original code


An important observation that you did not make is that the queries are not 100% equivalent. If a salesperson never made any sales, the BOL example will find no rows, calculate NULL as the result of the SUM() function in the subquery, add that to SalesYTD (resulting in NULL), and attempt to store that. This will fail, since SalesYTD is defined as NOT NULL. (The BOL example should have used COALESCE to prevent this).
Your query on the other hand has no rows for thie salesperson in the derived table, so the join will remove that salesperson from the set of rows to be updated, and his or her SalesYTD will not change.

You might think that this makes your query superior, and in this case it indeed is - but what if we attempted to update a column SalesLastDay? The subquery method (with COALESCE added) will nicely set SalesLastDay to 0 if a salesperson has no sales. Your method leaves it unchanged. So if I entered a sale incorrectly yesterday (giving a sale to an inactive salesperson) and correct it today, the SalesLastDay for the inactive salesperson will not be cleared.

There are situation where you want a row to be excluded from the UPDATE if there is no match. There are other situations where you don't want that. This behaviour difference is a common pitfall when people try to move between the two UPDATE syntax variations, and it's a shame that your article fails to point that out.

When I run that code, it only takes 15 milliseconds to update what used to take 313 milliseconds...


Well done. I got the same performance when I ran the test (we obviously have similar computing power under our desks :)). But to put things in perspective - this is NOT a result of replacing UPDATE with a subquery with UDPATE FROM - as I can achieve the eexact same performance with an ANSI-standard UPDATE with subquery by using a CTE

PRINT REPLICATE('=',70);
PRINT 'Correlated sub query using CTE method...';
SET STATISTICS TIME ON;
WITH LastDay AS
(SELECT SalesPersonID, MAX(OrderDate) AS MaxDate
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY SalesPersonID),
TotalLastDay AS
(SELECT so.SalesPersonID, SUM(SubTotal) AS Total
FROM Sales.SalesOrderHeader AS so
INNER JOIN LastDay AS l
ON l.SalesPersonID = so.SalesPersonID
AND l.MaxDate = so.OrderDate
GROUP BY so.SalesPersonID)
UPDATE Sales.SPTest
SET SalesYTD = SalesYTD +
(SELECT Total
FROM TotalLastDay AS t
WHERE Sales.SPTest.SalesPersonID = t.SalesPersonID);
SET STATISTICS TIME OFF;
GO

I think this code is similar enough to your code that I don't need to explain it. But do let me know if you don't understand it (I'll activate email notification for this thread).

For the Microsoft example, it would take 0.313/17*1000000 milliseconds or 18.4 seconds to update a million rows. Not bad... until you realize that the performance enabled version does it in 0.015/17*1000000 milliseconds or only 0.882 seconds.


I currently can't justify spending the time to really test this, but I hope someone will. I think that, once you hit this amount of rows, the optimizer will consider more possibilities so there is a better chance that it creates a faster plan for the currently slower version. Remember that the optimizer is cost-based - so for example, if a query will probably run in 300 milliseconds and the next step in the optimization process will increase the compilation time by 500 milliseconds, it won't bother.

We also saw how easy it was to rewrite one piece of code to get it to run 19 times faster that it originally did and we didn't touch the server or indexes.


Which is a shame, since I think that an index on (SalesPersonID, OrderDate) with SubTotal as included column would have speeded up things a lot more. In fact, I just tested this and this brings execution time for "your" joined method down to 6 ms, "my" subquery with CTE to 7 ms, and BOL's original method down to a mere 2 ms. How's that, eh?

And that's another lesson learned: you can spend a few hours rewriting bad performing code, or you can spend five minutes adding an index, and often, the latter solution works best (but do take care not to add too many indexes, since they will slow any data modification!)

Please note that this does not imply that you should not rewrite bad code. I agree with the comment earlier in this thread that one should always try to write at least two completely different queries for the same task and compare performance. But I do want to point out that "Tune the Code... that's where the performance is!" tells only half of the story.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #469285
Posted Friday, March 14, 2008 5:24 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, July 30, 2010 11:27 AM
Points: 435, Visits: 1,403
Excellent! Both the article (thanks Jeff) and the comments (thanks Hugo.)
This is the beauty of public forums; none of us know more than all of us!
Post #469293
Posted Friday, March 14, 2008 5:49 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Wednesday, August 10, 2011 12:13 PM
Points: 307, Visits: 565
The timing of this article was perfect for me.

I'm in the process of evaluating some stored procedures that are poor performers (which is causing a host of other problems) and I had narrowed it down to some update queries that were architecturally identical to the 'death by SQL' example (intuitively, they just didn't look right). This article reinforces what I believed and shows me why.

Like you say, it is data dependent, because in my test environment (small data set), I couldn't see any performance difference when comparing to the same queries rewritten with the 'target' table in the 'FROM' (which is how I would have done it if I had written it). The execution plans didn't reveal much, if any, difference. In Production, the dataset is huge and the queries involve transactions and linked servers, which are probably compounding the problem (from a stand point of execution time).

Like I said, this is not my code, so I have to 'prove' to the owner that there is a better way to write the queries (a sensitive issue) so as to improve the performance, and now I have some reinforcement to do that.

Once I get this code changed, I'll report back a before and after result (but don't hold your breath...it will take awhile where I work).

Thanks Jeff!


If it was easy, everybody would be doing it!;)
Post #469307
Posted Friday, March 14, 2008 6:59 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Monday, October 27, 2014 8:03 AM
Points: 3,518, Visits: 1,521
I always know that I'm going to learn something when Jeff posts an article. It looks like I'm going to have to look closer when I am coding in a WHERE IN.
Post #469351
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse