JOIN HINTS

  • Comments posted to this topic are about the item JOIN HINTS

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • This was removed by the editor as SPAM

  • At first I was a little confused by the use of the decimal point ( . ) for a thousands separator [should be ( , ) surely] as I thought "75 customers to three decimal places"! Then I saw the double decimal point in the million orders, and understood.

    Who thinks that we should go for the International standard decimal point and thousands separator on the forum eh?

    You never know: reading my book: "All about your computer" might just tell you something you never knew!
    lulu.com/kaspencer

  • Useful question!

    Thanks!

  • good question. thanks for sharing

  • Thanks for good question.

    There was a connect item to change this behavior of join hints

    https://connect.microsoft.com/SQLServer/feedback/details/753790/allow-join-hints-without-force-order

  • A very good question and I am absolutely happy with it.

    One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.

    A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.

    In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.

    If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hallo Hugo,

    thank you for this - EXCELLENT - explanation. To be honest; i wasn't at any time so deep into this operator and I really appreciate this deep and detailed explanation. Lots of new information for me, too 🙂

    Microsoft Certified Master: SQL Server 2008
    MVP - Data Platform (2013 - ...)
    my blog: http://www.sqlmaster.de (german only!)

  • Great question to make me think, so thank you very much for it. And thank you, Hugo, for a good explanation. Execution plans tell the tale of how a query runs, but there are no many nuances to it that I know I don't have them all straight. In the end, I try to consider everything and test...test...test.

  • Thank you Uwe for this interesting question, very good explanation and the links.

    I learned today something new.:-)

  • Hugo Kornelis (1/14/2016)


    A very good question and I am absolutely happy with it.

    One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.

    A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.

    In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.

    If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

    This is why it is far better to use (where possible) a Query Hint instead of a Join Hint (ie OPTION(MERGE JOIN) at the end of the query) when forcing a merge join since a query hint specifying join method won't force order. But if you want to force one join to merge and another in the same query to hash you probably can't do that with a query hint, only with inline join hints (and you can't get round it by having both a query level join hint and inline join hints in the same query, because the server will throw it out saying "Conflicting JOIN optimizer hints specified".

    There is a connect item covering this. In August 2012 Microsoft said join hints didn't force order. In October 2012 they admitted inline join hints did do it, and said that this would be considered for a fix in the next major release. At some point the connect item was closed as "won't fix" without any further comment from microsoft.

    Of course using a query hint instead of a join hint is a sometimes good (albeit not universally applicable) workaround, so maybe Microsoft aren't being quite as dense and cavalier this time as they often are with connect items.

    edit: for anyone interested that connect item is here.

    Tom

  • Thanks - very instructive question.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Great question. It reminds me of the merge join transform in SSIS, the inputs have to be ordered in like fashion.

    ----------------------------------------------------

  • Hugo Kornelis (1/14/2016)


    A very good question and I am absolutely happy with it.

    One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.

    A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.

    In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.

    If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

    Thanks for the excellent explanation Hugo.

    I was wondering why specifically a performance degradation would happen if the inputs were swapped.

    At first sight I thought "what difference would it make?".

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • MMartin1 (1/14/2016)


    Great question. It reminds me of the merge join transform in SSIS, the inputs have to be ordered in like fashion.

    It's not exactly the same. In SSIS, the data itself has to be sorted, but you can swap the inputs of you like.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply