Performance of Joins over Updates
Posted Friday, September 07, 2012 7:40 AM
SSC Veteran

I was wondering if anyone knows which is faster for building the data in my reporting table:

writing the new rows using three outer joins, or using one outer join to get most of the data and then running three updates to load the other three columns?

Granted, not many rows are being loaded: about 500K each night, or 70MB of data. The linked tables are not large, except for the material table, which has several million rows.
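
For readers following along, here is a minimal sketch of the two variants being compared. Every table and column name is a hypothetical stand-in, and it shows only one reading of the question: the material lookup is joined in both variants, and the three extra columns arrive either through three further outer joins or through three updates.

```sql
-- All table and column names below are hypothetical stand-ins.
CREATE TABLE #src      (row_id int PRIMARY KEY, material_id int, plant_id int, customer_id int, vendor_id int);
CREATE TABLE #material (material_id int PRIMARY KEY, material_desc varchar(200));
CREATE TABLE #plant    (plant_id int PRIMARY KEY, plant_name varchar(50));
CREATE TABLE #customer (customer_id int PRIMARY KEY, customer_name varchar(100));
CREATE TABLE #vendor   (vendor_id int PRIMARY KEY, vendor_name varchar(100));

INSERT INTO #src      VALUES (1, 10, 100, 1000, 5000), (2, 11, 101, NULL, 5001), (3, 99, 100, 1001, NULL);
INSERT INTO #material VALUES (10, 'Steel rod'), (11, 'Copper wire');
INSERT INTO #plant    VALUES (100, 'North'), (101, 'South');
INSERT INTO #customer VALUES (1000, 'Acme'), (1001, 'Globex');
INSERT INTO #vendor   VALUES (5000, 'Initech'), (5001, 'Hooli');

CREATE TABLE #report_joins
(
    row_id        int PRIMARY KEY,
    material_desc varchar(200),
    plant_name    varchar(50),
    customer_name varchar(100),
    vendor_name   varchar(100)
);
-- Empty copy with the same columns for the second variant.
SELECT * INTO #report_updates FROM #report_joins WHERE 1 = 0;

-- Variant A: a single INSERT that resolves every lookup with outer joins.
INSERT INTO #report_joins (row_id, material_desc, plant_name, customer_name, vendor_name)
SELECT s.row_id, m.material_desc, p.plant_name, c.customer_name, v.vendor_name
FROM #src AS s
LEFT JOIN #material AS m ON m.material_id = s.material_id
LEFT JOIN #plant    AS p ON p.plant_id    = s.plant_id
LEFT JOIN #customer AS c ON c.customer_id = s.customer_id
LEFT JOIN #vendor   AS v ON v.vendor_id   = s.vendor_id;

-- Variant B: INSERT with one outer join, then one UPDATE per remaining column.
INSERT INTO #report_updates (row_id, material_desc)
SELECT s.row_id, m.material_desc
FROM #src AS s
LEFT JOIN #material AS m ON m.material_id = s.material_id;

UPDATE r SET plant_name = p.plant_name
FROM #report_updates AS r
JOIN #src   AS s ON s.row_id   = r.row_id
JOIN #plant AS p ON p.plant_id = s.plant_id;

UPDATE r SET customer_name = c.customer_name
FROM #report_updates AS r
JOIN #src      AS s ON s.row_id      = r.row_id
JOIN #customer AS c ON c.customer_id = s.customer_id;

UPDATE r SET vendor_name = v.vendor_name
FROM #report_updates AS r
JOIN #src    AS s ON s.row_id    = r.row_id
JOIN #vendor AS v ON v.vendor_id = s.vendor_id;

-- Both queries return no rows when the two tables hold identical data.
SELECT * FROM #report_joins   EXCEPT SELECT * FROM #report_updates;
SELECT * FROM #report_updates EXCEPT SELECT * FROM #report_joins;
```

Variant B touches every target row up to four times, which is worth keeping in mind when comparing timings.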
Post #1355980
Posted Friday, September 07, 2012 8:11 AM
Hall of Fame

You haven't provided enough information for anyone to begin to answer that question.

You should just try each way to see which is faster.



Post #1355999
Posted Friday, September 07, 2012 8:29 AM
SSC Veteran

Michael Valentine Jones (9/7/2012)
You haven't provided enough information for anyone to begin to answer that question.

You should just try each way to see which is faster.

The problem with that is: before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not incur hard drive I/O.
Post #1356009
Posted Friday, September 07, 2012 8:38 AM


SSCertifiable

dwilliscp (9/7/2012)
The problem with that is: before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not incur hard drive I/O.


Then do several runs: A after A, A after B, B after A, B after B.
I've rarely seen an UPDATE to an intermediate table, as you describe, perform faster than a straight SELECT. When it does, it's most often because the developer has missed something.
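
One way to organise such runs is sketched below, on the assumption that each variant can be invoked as a single statement or procedure call. The two SELECTs against system views are placeholders only, and the variable and column names are invented for illustration.

```sql
-- Placeholder workloads: swap the two SELECTs for the real variant A and variant B.
DECLARE @timings TABLE (run_no int IDENTITY(1,1), variant char(1), elapsed_ms int);
DECLARE @run_order varchar(10) = 'AABABB';  -- contains A after A, B after A, A after B, B after B
DECLARE @i int = 1, @variant char(1), @start datetime2(3), @sink bigint;

WHILE @i <= LEN(@run_order)
BEGIN
    SET @variant = SUBSTRING(@run_order, @i, 1);
    SET @start = SYSDATETIME();

    IF @variant = 'A'
        SELECT @sink = COUNT_BIG(*) FROM sys.all_columns;   -- placeholder for variant A
    ELSE
        SELECT @sink = COUNT_BIG(*) FROM sys.all_objects;   -- placeholder for variant B

    -- Record how long this run took, in milliseconds.
    INSERT INTO @timings (variant, elapsed_ms)
    VALUES (@variant, DATEDIFF(millisecond, @start, SYSDATETIME()));

    SET @i += 1;
END;

-- Summarise per variant; a wide min/max gap suggests other load on the server.
SELECT variant,
       COUNT(*)        AS runs,
       MIN(elapsed_ms) AS min_ms,
       AVG(elapsed_ms) AS avg_ms,
       MAX(elapsed_ms) AS max_ms
FROM @timings
GROUP BY variant
ORDER BY variant;
```

If the real variants write to a reporting table, the table would also need to be reset between runs so that each run starts from the same state.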


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1356014
Posted Friday, September 07, 2012 1:20 PM
Hall of Fame

dwilliscp (9/7/2012)
The problem with that is: before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not incur hard drive I/O.


-- Test servers only: write dirty pages to disk, then empty the buffer pool for a cold-cache run
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
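
As a rough sketch of how those two commands might fit into a cold-cache measurement: the SELECT below is only a placeholder for the statement being timed, and the combination with SET STATISTICS is an assumption about the workflow, not something stated in the thread.

```sql
-- Test servers only: DBCC DROPCLEANBUFFERS empties the buffer pool for the
-- whole instance and requires sysadmin membership.
CHECKPOINT;              -- write dirty pages of the current database to disk
DBCC DROPCLEANBUFFERS;   -- remove the now-clean pages from the buffer pool

SET STATISTICS IO ON;    -- report logical and physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Placeholder for the statement being measured.
SELECT COUNT(*) AS row_count FROM sys.all_columns;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The physical reads reported on the Messages tab show whether the run really started from a cold cache.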



Post #1356207
Posted Friday, September 07, 2012 2:08 PM


SSC-Enthusiastic

dwilliscp (9/7/2012)
To write new data using three outer joins, or to use one outer join to get most of the data and then use three updates to load the other three columns?
Usually the first approach is faster.
If any of these three columns is variable-length (say varchar), then your UPDATE statements can cause a lot of page splits, and therefore fragmentation, because the rows grow after they have already been written.
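
The effect can be checked on a test system with something like the sketch below. The demo table and its columns are hypothetical; sys.dm_db_index_physical_stats is the standard DMV for reading fragmentation. Rows are first inserted with the varchar column empty and then widened by an UPDATE, mimicking the update-based load.

```sql
-- Hypothetical demo table with a clustered primary key and one varchar column.
CREATE TABLE #frag_demo
(
    id     int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    filler char(50) NOT NULL DEFAULT 'x',
    note   varchar(200) NULL
);

-- Load 50,000 rows with the varchar column left NULL.
INSERT INTO #frag_demo (note)
SELECT TOP (50000) CAST(NULL AS varchar(200))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

DECLARE @db  int = DB_ID('tempdb'),
        @obj int = OBJECT_ID('tempdb..#frag_demo');

-- Page count and fragmentation before the update.
SELECT 'before' AS stage, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'LIMITED');

-- Widen every row, as the update-based load would.
UPDATE #frag_demo SET note = REPLICATE('x', 200);

-- Page count and fragmentation after the update.
SELECT 'after' AS stage, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'LIMITED');

DROP TABLE #frag_demo;
```

Comparing the two result sets shows how much the table grew and fragmented because of the update alone.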



Alex Suprun
Post #1356233
Posted Monday, September 10, 2012 8:04 AM
SSC Veteran

Alexander Suprun (9/7/2012)
Usually the first approach is faster.
If any of these three columns is variable-length (say varchar), then your UPDATE statements can cause a lot of page splits, and therefore fragmentation.


I did run some tests, but the results were not steady; there is a lot of activity on this box. The updates do tend to show less variation: depending on the test, they take anywhere from 5% longer to 20% less time. Again, because of the load on the box, I do not put any faith in the tests.

All the fields from the joins are varchar, 50 to 200 characters wide, so that could explain why the run times varied so widely. Since we have I/O pressure on this box, and on the production box this will be released to, I am going to go with the updates.
Post #1356754
Posted Tuesday, September 11, 2012 2:06 AM


SSCertifiable

dwilliscp (9/10/2012)


I did run some tests, but the results were not steady; there is a lot of activity on this box. The updates do tend to show less variation: depending on the test, they take anywhere from 5% longer to 20% less time. Again, because of the load on the box, I do not put any faith in the tests.

All the fields from the joins are varchar, 50 to 200 characters wide, so that could explain why the run times varied so widely. Since we have I/O pressure on this box, and on the production box this will be released to, I am going to go with the updates.


Why not post the actual plan for both versions here?
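
For anyone unsure how to capture an actual plan, a minimal sketch follows. The query is only a placeholder for one of the two load variants; SET STATISTICS XML is a standard session option, and the "Include Actual Execution Plan" button in Management Studio gives the same information.

```sql
SET STATISTICS XML ON;   -- return the actual execution plan as XML after each statement

-- Placeholder for one of the two load variants.
SELECT o.name, COUNT(c.column_id) AS column_count
FROM sys.objects AS o
LEFT JOIN sys.columns AS c ON c.object_id = o.object_id
GROUP BY o.name;

SET STATISTICS XML OFF;
```

In Management Studio the returned XML opens as a graphical plan, which can then be saved as a .sqlplan file and attached to a post.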


Post #1357185