Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

More RBAR and "Tuning" UPDATEs Expand / Collapse
Author
Message
Posted Friday, March 14, 2008 7:25 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Friday, July 25, 2014 5:21 AM
Points: 1,205, Visits: 923
:P:P Excellent article Jeff, I have a question for you though. Is there a FROM clause for the DELETE statement as well and how can I code that?

Manie Verster
Developer
Johannesburg
South Africa

I can do all things through Christ who strengthens me. - Holy Bible
I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)
Post #469376
Posted Friday, March 14, 2008 7:31 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 6:48 AM
Points: 10,295, Visits: 13,276
Hugo Kornelis (3/14/2008)

And there are other folks who don't know that what you call Oracle syntax is actually ANSI-standard syntax, and that the optional FROM clause in SQL Server is non-standard. In fact, even Access (also microsoft) has a different syntax for the same thing! So beware - if you choose to use this, you can't port your code.

What's worse, it is document to produce nondeterministic results in some scenarios. So beware - if you choose to use this, test very carefully (and due to the nature of nondeterministic results combined with Murphy's Law, you'll probably find that it always goes right in test, but fails in production)

Well done. I got the same performance when I ran the test (we obviously have similar computing power under our desks :)). But to put things in perspective - this is NOT a result of replacing UPDATE with a subquery with UDPATE FROM - as I can achieve the eexact same performance with an ANSI-standard UPDATE with subquery by using a CTE

PRINT REPLICATE('=',70);
PRINT 'Correlated sub query using CTE method...';
SET STATISTICS TIME ON;
WITH LastDay AS
(SELECT SalesPersonID, MAX(OrderDate) AS MaxDate
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
GROUP BY SalesPersonID),
TotalLastDay AS
(SELECT so.SalesPersonID, SUM(SubTotal) AS Total
FROM Sales.SalesOrderHeader AS so
INNER JOIN LastDay AS l
ON l.SalesPersonID = so.SalesPersonID
AND l.MaxDate = so.OrderDate
GROUP BY so.SalesPersonID)
UPDATE Sales.SPTest
SET SalesYTD = SalesYTD +
(SELECT Total
FROM TotalLastDay AS t
WHERE Sales.SPTest.SalesPersonID = t.SalesPersonID);
SET STATISTICS TIME OFF;
GO

I think this code is similar enough to your code that I don't need to explain it. But do let me know if you don't understand it (I'll activate email notification for this thread).

We also saw how easy it was to rewrite one piece of code to get it to run 19 times faster that it originally did and we didn't touch the server or indexes.


Which is a shame, since I think that an index on (SalesPersonID, OrderDate) with SubTotal as included column would have speeded up things a lot more. In fact, I just tested this and this brings execution time for "your" joined method down to 6 ms, "my" subquery with CTE to 7 ms, and BOL's original method down to a mere 2 ms. How's that, eh?

And that's another lesson learned: you can spend a few hours rewriting bad performing code, or you can spend five minutes adding an index, and often, the latter solution works best (but do take care not to add too many indexes, since they will slow any data modification!)



Hugo,

I'm not going to disagree with you on any of the points you make, especially since you obviously know more than I do about the subject. I also appreciate the fact that you not only raised the points, but provided alternate solutions instead of just ripping the article like I have seen many times.

I am going to make a couple of points though. The UPDATE FROM may be SQL Server specific, but from my perspective that's fine since I am on a SQL Server site and am looking for help with SQL Server. To not use it becasue it could be deprecated, is in my opinion, not a reason to refrain from using it. Any SQL Server specific feature could be deprecated at some point. Could you post the where the documentation of it producing indeterminate results is? I'd be interested in reading that.

Your CTE example is excellent and may be the best way to solve the problem in SQL 2005, but there are an awful lot of SQL 200 installs out there still, and the CTE won't work there. One thing Jeff usually does is post a solution that will work in 2000, although he is getting more and more into 2005 solutions now. OF course your earlier points about indeterminate results in the UPDATE FROM certainly need to be considered.

You mention adding and index and the performance gained from it. You do mention, in a parenthetical, being careful not to add too many indexes, but I think you could have more clearly stated that you need to really study out the afffects of adding an index and that will take a lot more than 5 minutes. There are many beginners that are on this site regularly who may read that and start adding indexes whenever they encounter performance issues with a single query and then start having problems elsewhere.

I also wanted to comment on this statement:


An important observation that you did not make is that the queries are not 100% equivalent. If a salesperson never made any sales, the BOL example will find no rows, calculate NULL as the result of the SUM() function in the subquery, add that to SalesYTD (resulting in NULL), and attempt to store that. This will fail, since SalesYTD is defined as NOT NULL. (The BOL example should have used COALESCE to prevent this).
Your query on the other hand has no rows for thie salesperson in the derived table, so the join will remove that salesperson from the set of rows to be updated, and his or her SalesYTD will not change.

You might think that this makes your query superior, and in this case it indeed is - but what if we attempted to update a column SalesLastDay? The subquery method (with COALESCE added) will nicely set SalesLastDay to 0 if a salesperson has no sales. Your method leaves it unchanged. So if I entered a sale incorrectly yesterday (giving a sale to an inactive salesperson) and correct it today, the SalesLastDay for the inactive salesperson will not be cleared.

There are situation where you want a row to be excluded from the UPDATE if there is no match. There are other situations where you don't want that. This behaviour difference is a common pitfall when people try to move between the two UPDATE syntax variations, and it's a shame that your article fails to point that out


I think you made a great point here and I missed it as well. It is very important that you take this into account when doing any kind of aggregation based on dates, products, etc... when there are chances of there being missing data.

I have to admit that when I first started working with SQL Server I was more likely to use correlated subqueries, but in the last couple of years I have moved to derived tables (not necessarily because of performance), and now to CTE's which I like best, mainly for readability.

Thanks to Jeff and Hugo for sharing their expertise and making me think before I code!







Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #469381
Posted Friday, March 14, 2008 7:36 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:50 PM
Points: 7,112, Visits: 15,486
Manie Verster (3/14/2008)
:P:P Excellent article Jeff, I have a question for you though. Is there a FROM clause for the DELETE statement as well and how can I code that?


Manie - there IS a FROM in the Delete statement. Just like update, you can only delete from one table at a time, but that doesn't mean you can't use a join to arrive at the data set to delete.

Syntax looks like this:

DELETE Mytable  --this is the table where records are being deleted
FROM
MyTable
inner join MyTable2 on mytable.id=mytable2.ID
WHERE
.....



----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #469386
Posted Friday, March 14, 2008 7:45 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, July 30, 2010 11:27 AM
Points: 435, Visits: 1,403
RE CTEs. I love them and use them, but I had gotten the impression that they too were a MS nonstandard extension. Are they part of the SQL standard and are they available in Oracle. (Same syntax?)
Post #469395
Posted Friday, March 14, 2008 7:50 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 9:50 PM
Points: 7,112, Visits: 15,486

And there are other folks who don't know that what you call Oracle syntax is actually ANSI-standard syntax, and that the optional FROM clause in SQL Server is non-standard. In fact, even Access (also microsoft) has a different syntax for the same thing! So beware - if you choose to use this, you can't port your code.


Interesting read. I have to ask though - I have to see any combinations of DBMS' where a straight port of code is simply a matter of cut and paste. Is that really a valid concern?

Yes it does work some of the time, but once you get into "real-life code", every one of the major's seems to have their own intricacies, so you're going to HAVE to re-write some things. Even using "standard code", just because it runs doesn't mean it runs well.

I keep hearing that "what if your RDBMS changed tomorrow" line being thrown around, and just have to take a skeptical approach to that. I've seen ports from one to the other, but they always involve a LOT of re-writing, reoptimizing. If all of them do the SAME thing, using the same code, same optimization tricks, etc... - then they will be no reason for them to be separate products. The cheapest one to license will be the one who takes over, and the others will just fall by the wayside.

Until they act the same, having the same lines of code, well - is worthless IMO. My first priority is efficient and correct, not portable.


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #469399
Posted Friday, March 14, 2008 7:54 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
Zarko Jovanovic (3/14/2008)
what about this:

update table1
set table1.column1 = #table.column1, table1.column2 = #table.column2
from table1, #table
where table1.pk=#table.pk


That's fine because the table being updated is in the FROM clause and is properly joined... although a lot of people prefer the "new" ANSI joins like INNER JOIN.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #469401
Posted Friday, March 14, 2008 7:55 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:18 PM
Points: 6,056, Visits: 8,342
Jack Corbett (3/14/2008)
[quote]Hugo Kornelis (3/14/2008)

Could you post the where the documentation of it producing indeterminate results is? I'd be interested in reading that.


Sure. It's in BOL, or the online equivalent on MSDN: http://msdn2.microsoft.com/en-us/library/ms177523.aspx, scroll down to the heading "Using UPDATE with the FROM Clause". You'll see this:

The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.

(followed by an example)

You mention adding and index and the performance gained from it. You do mention, in a parenthetical, being careful not to add too many indexes, but I think you could have more clearly stated that you need to really study out the afffects of adding an index and that will take a lot more than 5 minutes. There are many beginners that are on this site regularly who may read that and start adding indexes whenever they encounter performance issues with a single query and then start having problems elsewhere.


Fair enough. Adding indexes by trial and error is always a bad idea, you really have to understand what they are, how they work, and how they impact every aspect of SQL Server.

Thanks to Jeff and Hugo for sharing their expertise and making me think before I code!


And thanks to you for your welll though out reaction!

(edit - changed the BOL address to a proper hyperlink)



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #469402
Posted Friday, March 14, 2008 7:55 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
David McKinney (3/14/2008)
Very good article.

I just think RBAR would catch on more, if it were pronounced AR-BAR. (Sorry ;) )


Maybe "our Bar" would work even better :D


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #469404
Posted Friday, March 14, 2008 8:01 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 8:02 PM
Points: 35,397, Visits: 31,955
Nick M. (3/14/2008)
The links to UPDATE are to the UPDATE() function, not the UPDATE statement syntax, which is 1 line above in the index. :)


Thanks for pointing that out, Nick. I sure am sorry about that and I'll resubmit the article with the correction...

The correct link is...
http://msdn2.microsoft.com/en-us/library/ms177523(SQL.100).aspx

I apoligize for the confusion.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #469407
Posted Friday, March 14, 2008 8:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 1:18 PM
Points: 6,056, Visits: 8,342
Jim Russell (3/14/2008)
RE CTEs. I love them and use them, but I had gotten the impression that they too were a MS nonstandard extension. Are they part of the SQL standard and are they available in Oracle. (Same syntax?)


They are ANSI standard. They were not in SQL-92, but they are in SQL:2003, described as part of "7.13 ". The standard does seam to have a different way to express recursive CTEs though.

I have no idea if Oracle implements them.



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Post #469410
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse