Performance of Joins over Updates

  • I was wondering if anyone knows which is faster for building the data for my reporting table:

    Writing the new data using three outer joins, or using one outer join to get most of the data and then three updates to load the other three columns?

    Granted, not many rows are being loaded: about 500K each night, 70 MB of data. The linked tables are not large, except for the material table, which has several million rows.
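
    For concreteness, the two options might look roughly like the T-SQL sketch below. Every table and column name here (dbo.ReportData, dbo.Orders, dbo.Material, dbo.Plant, dbo.Customer and their columns) is hypothetical, since the actual schema was not posted. The sketch assumes three lookup tables in total, so Option B shows only two follow-up updates; a third would follow the same pattern.

    ```sql
    -- Hypothetical schema: none of these names come from the original post.

    -- Option A: a single INSERT ... SELECT with three outer joins.
    INSERT INTO dbo.ReportData
        (OrderID, MaterialID, PlantID, CustomerID, MaterialDesc, PlantName, CustomerName)
    SELECT o.OrderID, o.MaterialID, o.PlantID, o.CustomerID,
           m.MaterialDesc, p.PlantName, c.CustomerName
    FROM dbo.Orders AS o
    LEFT OUTER JOIN dbo.Material AS m ON m.MaterialID = o.MaterialID
    LEFT OUTER JOIN dbo.Plant    AS p ON p.PlantID    = o.PlantID
    LEFT OUTER JOIN dbo.Customer AS c ON c.CustomerID = o.CustomerID;

    -- Option B: load with one outer join, leaving the other columns NULL ...
    INSERT INTO dbo.ReportData
        (OrderID, MaterialID, PlantID, CustomerID, MaterialDesc)
    SELECT o.OrderID, o.MaterialID, o.PlantID, o.CustomerID, m.MaterialDesc
    FROM dbo.Orders AS o
    LEFT OUTER JOIN dbo.Material AS m ON m.MaterialID = o.MaterialID;

    -- ... then fill each remaining column with its own UPDATE.
    UPDATE r
    SET    r.PlantName = p.PlantName
    FROM   dbo.ReportData AS r
    INNER JOIN dbo.Plant AS p ON p.PlantID = r.PlantID;

    UPDATE r
    SET    r.CustomerName = c.CustomerName
    FROM   dbo.ReportData AS r
    INNER JOIN dbo.Customer AS c ON c.CustomerID = r.CustomerID;
    ```

    Both options are meant to leave the same rows in the reporting table; Option B simply writes each row more than once.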

  • You haven't provided enough information for anyone to begin to answer that question.

    You should just try each way to see which is faster.

  • Michael Valentine Jones (9/7/2012)


    You haven't provided enough information for anyone to begin to answer that question.

    You should just try each way to see which is faster.

    The problem with that is that before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not have the hard drive I/O.

  • dwilliscp (9/7/2012)


    Michael Valentine Jones (9/7/2012)


    You haven't provided enough information for anyone to begin to answer that question.

    You should just try each way to see which is faster.

    The problem with that is that before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not have the hard drive I/O.

    Then do several runs - A after A, A after B, B after A, B after B.

    I've rarely seen an UPDATE to an intermediate table, as you describe, perform faster than a straight SELECT. It's most often seen when the developer has missed something.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • dwilliscp (9/7/2012)


    Michael Valentine Jones (9/7/2012)


    You haven't provided enough information for anyone to begin to answer that question.

    You should just try each way to see which is faster.

    The problem with that is that before the second run I would need to make sure the data is flushed from memory, and I am not sure how to do that. Otherwise the second run should always be faster, since it does not have the hard drive I/O.

    CHECKPOINT;

    DBCC DROPCLEANBUFFERS;
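
    A minimal sketch of how those two commands could be combined with timing output for a cold-cache comparison. The procedure name dbo.LoadReport_Joins is hypothetical and stands in for whichever version is being tested. DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance and needs elevated permissions, so this is only appropriate on a test server.

    ```sql
    -- Test servers only: this clears cached data pages for every database on the instance.
    CHECKPOINT;               -- write dirty pages of the current database to disk
    DBCC DROPCLEANBUFFERS;    -- remove the now-clean pages from the buffer pool

    SET STATISTICS IO ON;     -- report logical and physical reads per table
    SET STATISTICS TIME ON;   -- report CPU time and elapsed time per statement

    EXEC dbo.LoadReport_Joins;   -- hypothetical: the version under test

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    ```

    Repeating the same script for the other version gives both runs the same cold-cache starting point, and the physical-read counts in the output show whether the cache really was empty.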

  • dwilliscp (9/7/2012)


    Writing the new data using three outer joins, or using one outer join to get most of the data and then three updates to load the other three columns?

    Usually the 1st approach is faster.

    If any of these 3 columns have variable length (say varchar) then your update statement will cause a lot of page splits and therefore fragmentation.


    Alex Suprun
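
    One way to check whether the updates actually fragment the table is to query sys.dm_db_index_physical_stats before and after the load. This is a sketch only; the table name dbo.ReportData is hypothetical.

    ```sql
    -- dbo.ReportData is a hypothetical name for the reporting table.
    SELECT i.name AS index_name,            -- NULL for a heap
           ps.index_type_desc,
           ps.avg_fragmentation_in_percent,
           ps.page_count,
           ps.forwarded_record_count        -- populated only for heaps, and not in LIMITED mode
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID(N'dbo.ReportData'), NULL, NULL, 'SAMPLED') AS ps
    INNER JOIN sys.indexes AS i
            ON i.object_id = ps.object_id
           AND i.index_id  = ps.index_id;
    ```

    On a table with a clustered index, a jump in avg_fragmentation_in_percent after the updates would be consistent with page splits; on a heap, growing rows tend to show up as forwarded records instead.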

  • Alexander Suprun (9/7/2012)


    dwilliscp (9/7/2012)


    Writing the new data using three outer joins, or using one outer join to get most of the data and then three updates to load the other three columns?

    Usually the 1st approach is faster.

    If any of these 3 columns have variable length (say varchar) then your update statement will cause a lot of page splits and therefore fragmentation.

    I did run some tests, but the results were not steady; there is a lot of activity on this box. The updates do tend to have less variation: they can take 5% longer or up to 20% less time, depending on the test. Again, due to the load on the box, I just do not put any faith in the tests.

    All the fields from the join are varchar, 50 to 200 characters, so that could explain why this had the widest variation in run times. Since we have I/O pressure on this box and on the production one this will get released to, I am going to go with the updates.

  • dwilliscp (9/10/2012)


    Alexander Suprun (9/7/2012)


    dwilliscp (9/7/2012)


    Writing the new data using three outer joins, or using one outer join to get most of the data and then three updates to load the other three columns?

    Usually the 1st approach is faster.

    If any of these 3 columns have variable length (say varchar) then your update statement will cause a lot of page splits and therefore fragmentation.

    I did run some tests, but the results were not steady; there is a lot of activity on this box. The updates do tend to have less variation: they can take 5% longer or up to 20% less time, depending on the test. Again, due to the load on the box, I just do not put any faith in the tests.

    All the fields from the join are varchar, 50 to 200 characters, so that could explain why this had the widest variation in run times. Since we have I/O pressure on this box and on the production one this will get released to, I am going to go with the updates.

    Why not post the actual plan for both versions here?
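
    For anyone unsure how to capture an actual plan, a minimal sketch follows. The procedure names are hypothetical placeholders for the two versions of the load.

    ```sql
    SET STATISTICS XML ON;       -- return the actual execution plan as XML after each statement

    EXEC dbo.LoadReport_Joins;   -- hypothetical: version with three outer joins
    EXEC dbo.LoadReport_Updates; -- hypothetical: version with one join plus updates

    SET STATISTICS XML OFF;
    ```

    In Management Studio the same result is available through "Include Actual Execution Plan" (Ctrl+M); each plan can then be saved as a .sqlplan file and attached to a post.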
