JOIN HINTS

Question

JOIN HINTS

Uwe Ricken

Hall of Fame

Points: 3267
More actions
January 13, 2016 at 10:21 pm

#310805

Comments posted to this topic are about the item JOIN HINTS
Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

kaspencer SSCarpal Tunnel Points: 4915 More actions · Answer 1

At first I was a little confused by the use of the decimal point ( . ) for a thousands separator [should be ( , ) surely] as I thought "75 customers to three decimal places"! Then I saw the double decimal point in the million orders, and understood.

Who thinks that we should go for the International standard decimal point and thousands separator on the forum eh?

You never know: reading my book: "All about your computer" might just tell you something you never knew!
lulu.com/kaspencer

Carlo Romagnano SSC-Insane Points: 22929 More actions · Answer 2

Carlo Romagnano

SSC-Insane

Points: 22929

January 14, 2016 at 2:58 am

#1851843

Useful question!

Thanks!

Umer Akram SSC-Insane Points: 22209 More actions · Answer 3

Umer Akram

SSC-Insane

Points: 22209

January 14, 2016 at 3:00 am

#1851844

good question. thanks for sharing

serg-52 SSCrazy Eights Points: 9913 More actions · Answer 4

Thanks for good question.

There was a connect item to change this behavior of join hints

https://connect.microsoft.com/SQLServer/feedback/details/753790/allow-join-hints-without-force-order

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 5

A very good question and I am absolutely happy with it.

One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.

A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.

In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.

If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Uwe Ricken Hall of Fame Points: 3267 More actions · Answer 6

Hallo Hugo,

thank you for this - EXCELLENT - explanation. To be honest; i wasn't at any time so deep into this operator and I really appreciate this deep and detailed explanation. Lots of new information for me, too 🙂

Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)

Ed Wagner SSC Guru Points: 287027 More actions · Answer 7

Great question to make me think, so thank you very much for it. And thank you, Hugo, for a good explanation. Execution plans tell the tale of how a query runs, but there are no many nuances to it that I know I don't have them all straight. In the end, I try to consider everything and test...test...test.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

George Vobr SSChampion Points: 10281 More actions · Answer 8

Thank you Uwe for this interesting question, very good explanation and the links.

I learned today something new.:-)

TomThomson SSC Guru Points: 104767 More actions · Answer 9

Hugo Kornelis (1/14/2016)
A very good question and I am absolutely happy with it.
One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.
A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.
In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.
If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

This is why it is far better to use (where possible) a Query Hint instead of a Join Hint (ie OPTION(MERGE JOIN) at the end of the query) when forcing a merge join since a query hint specifying join method won't force order. But if you want to force one join to merge and another in the same query to hash you probably can't do that with a query hint, only with inline join hints (and you can't get round it by having both a query level join hint and inline join hints in the same query, because the server will throw it out saying "Conflicting JOIN optimizer hints specified".

There is a connect item covering this. In August 2012 Microsoft said join hints didn't force order. In October 2012 they admitted inline join hints did do it, and said that this would be considered for a fix in the next major release. At some point the connect item was closed as "won't fix" without any further comment from microsoft.

Of course using a query hint instead of a join hint is a sometimes good (albeit not universally applicable) workaround, so maybe Microsoft aren't being quite as dense and cavalier this time as they often are with connect items.

edit: for anyone interested that connect item is here.

Tom

webrunner SSC-Dedicated Points: 32228 More actions · Answer 10

Thanks - very instructive question.

- webrunner

-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

MMartin1 One Orange Chip Points: 27881 More actions · Answer 11

Great question. It reminds me of the merge join transform in SSIS, the inputs have to be ordered in like fashion.

----------------------------------------------------

Koen Verbeeck SSC Guru Points: 259215 More actions · Answer 12

Hugo Kornelis (1/14/2016)
A very good question and I am absolutely happy with it.
One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.
A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.
In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.
If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

Thanks for the excellent explanation Hugo.

I was wondering why specifically a performance degradation would happen if the inputs were swapped.

At first sight I thought "what difference would it make?".

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Koen Verbeeck SSC Guru Points: 259215 More actions · Answer 13

MMartin1 (1/14/2016)
Great question. It reminds me of the merge join transform in SSIS, the inputs have to be ordered in like fashion.

It's not exactly the same. In SSIS, the data itself has to be sorted, but you can swap the inputs of you like.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Y.B. SSChampion Points: 11652 More actions · Answer 14

Hugo Kornelis (1/14/2016)
A very good question and I am absolutely happy with it.
One thing that is missing is the explanation of WHY in this specific case enforcing the order impacts the performance. The reason for this is that the Merge Join operator has two modes of operating: one-to-many (which can be used for one-to-one relationships as well), and many-to-many. A many-to-one relationship will normally not occur because during compilation SQL Server will simply switch the order of the two inputs to get a one-to-many.
A Marge Join with one-to-many will use a single pass over both inputs, using basically the balanced line algorithm. For many-to-many, extra complexity has to be added - because operators can not step back, the Merge Join will use a worktable to store copies from the bottom input, and then re-process those copies if the next row from the top input has the same key value, or wipe the worktable and start storing copies again if the kay value in the top input changes. This introduces a huge amount of overhead.
In this case the relationship of the tables as they appear in the join is many-to-one. The plan shown in the question demonstrates that, without hint, the optimizer has swapped the inputs; if you have this as an actual plan instead of a picture you can bring up the properties of the Merge Join operator to verify that the "Many to Many" property is False.
If you then run the code with hint and look at the plan, you will see that the two inputs of the Merge Join are in the same order as in the query (not swapped), and the Merge Join operator will have its "Many to Many" property set to True.

Absolutely appreciate the explanation! I wouldn't have expected the order to matter as in a Nested Loop or Hash Match.

SELECT quote FROM brain WHERE original = 1
0 rows returned