SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


The Right Kind Of Join III


The Right Kind Of Join III

Author
Message
Paul White
Paul White
SSC-Insane
SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

Group: General Forum Members
Points: 20412 Visits: 11359
Comments posted to this topic are about the item The Right Kind of Join III


SELECT COUNT_BIG(*)
FROM @A
RIGHT JOIN @B ON
[@B].b = [@A].a
OPTION (LOOP JOIN)





Previous questions in this series:
The Right Kind of Join II
The Right Kind of Join I



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Attachments
QotD.png (228 views, 13.00 KB)
SQLRNNR
SQLRNNR
SSC-Forever
SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)SSC-Forever (40K reputation)

Group: General Forum Members
Points: 40925 Visits: 18565
Thanks again Paul. Excellent explanation.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

Christian Buettner-167247
Christian Buettner-167247
SSCarpal Tunnel
SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)SSCarpal Tunnel (4.1K reputation)

Group: General Forum Members
Points: 4113 Visits: 3889
Very nice question series. By the way, you could make the query fail using
SET FORCEPLAN ON
as well.

Best Regards,

Chris Büttner
Toreador
Toreador
Hall of Fame
Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3140 Visits: 8088
As with the others in this series, I didn't have the foggiest idea, so wrongly guessed that it would be the answer not already used by the other 2 questions :-)
Paul White
Paul White
SSC-Insane
SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

Group: General Forum Members
Points: 20412 Visits: 11359
Toreador (9/8/2011)
As with the others in this series, I didn't have the foggiest idea, so wrongly guessed that it would be the answer not already used by the other 2 questions :-)

Just for you then:


DECLARE @A TABLE (a INT NOT NULL)
DECLARE @B TABLE (b INT NOT NULL)

SELECT COUNT_BIG(*)
FROM @A
RIGHT JOIN @B ON
@B.b = @A.a
OPTION (LOOP JOIN)




Msg 137, Level 16, State 1, Line 7
Must declare the scalar variable "@B".
Msg 137, Level 16, State 1, Line 7
Must declare the scalar variable "@A".




Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
rfr.ferrari
rfr.ferrari
SSCommitted
SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)SSCommitted (1.7K reputation)

Group: General Forum Members
Points: 1669 Visits: 13630
i agree with you!!!!

Great question and excellent explanation!!!!


rfr.ferrari
DBA - SQL Server 2008
MCITP | MCTS

remember is live or suffer twice!
the period you fastest growing is the most difficult period of your life!

Tom Thomson
Tom Thomson
SSCoach
SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)SSCoach (17K reputation)

Group: General Forum Members
Points: 17656 Visits: 12338
Good question and answer.

But the behaviour still seems bizarre to me. Why should the loop join hint be taken to imply the (Force Order) option - if I want that option, I can specify it, if I want a loop join in whatever order is best why should I have (FORCE ORDER) rammed down my throat if I don't specify it? I guess that's a question for Microsoft though, not a question for Paul.

Tom

Ninja's_RGR'us
Ninja's_RGR'us
SSC-Dedicated
SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)

Group: General Forum Members
Points: 37625 Visits: 9671
Tom.Thomson (9/8/2011)
Good question and answer.

But the behaviour still seems bizarre to me. Why should the loop join hint be taken to imply the (Force Order) option - if I want that option, I can specify it, if I want a loop join in whatever order is best why should I have (FORCE ORDER) rammed down my throat if I don't specify it? I guess that's a question for Microsoft though, not a question for Paul.


Somehow I'm sure he knows the answer anyways! :-D
Paul White
Paul White
SSC-Insane
SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)SSC-Insane (20K reputation)

Group: General Forum Members
Points: 20412 Visits: 11359
Tom.Thomson (9/8/2011)
Good question and answer.

But the behaviour still seems bizarre to me. Why should the loop join hint be taken to imply the (Force Order) option - if I want that option, I can specify it, if I want a loop join in whatever order is best why should I have (FORCE ORDER) rammed down my throat if I don't specify it? I guess that's a question for Microsoft though, not a question for Paul.

It is bizarre, and entirely counter-intuitive to most people (though it is at least documented these days). There is often considerable consternation when an attempt to enforce a particular physical join type between two tables results in a completely stupid plan because of the implied FORCE ORDER.

The query optimizer in SQL Server uses the Cascades framework - a cost-based top-down optimizer that works by matching patterns (logical tree fragments) using rules. There are nearly 400 of these rules - some of which implement logical operations as physical operators (e.g. implementing a relational JOIN using sort-merge), some of which generate logically-equivalent relational alternatives (e.g. pushing an aggregate below a join or considering a different join order), and still others that remove redundancies (e.g. removing empty tables or aggregates where the input is known to be unique).

The second factor is how to interpret A INNER LOOP JOIN B, or C RIGHT HASH JOIN D. Did the query writer intend to drive the query from table A in the first case, and to build the hash table on input C in the second case? In most cases where the query writer wishes to enforce a physical join type, this heuristic is a good one. To meet the writer's expectation, SQL Server needs to guarantee that it will produce a plan where A drives a nested loops join to B, and the hash join between C and D has C as the build input and D as the probe.

Now, in principle the optimizer could make that guarantee in one of two ways. First, it could somehow fix the join in place and try to optimize around it. That just isn't practical for an extensible general-purpose optimizer based on Cascades - all kinds of horribly specific and hard-to-maintain code would have to be written. Instead, SQL Server guarantees the physical join type (LOOP, HASH, MERGE, REMOTE) by setting a required physical property on the logical JOIN in the query tree. This allows other physical join type rules to match elsewhere in the final plan, but only the one specified physical implementation can match the required property on the join in question.

Meeting the input-order requirement is more difficult to generalize. One consideration is that the inputs to the join may be an arbitrarily complex tree rather than a simple table. Anyway, rather than deal with all the complexity, SQL Server uses a Big Hammer to enforce join order: all the simplification and transformation rules that could change the join order from that shown in the logical tree produced by the parser and algebrizer, are disabled. Not surprisingly, this can have all sorts of unexpected effects on the final plan, but it does guarantee the join order.

As an example, consider this join:


SELECT th.ProductID
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



That results in a simple scan of the history table, because there are constraints that guarantee that a single matching product record exists. If we now force a join type:


SELECT th.ProductID
FROM Production.Product AS p
INNER LOOP
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



The resulting plan contains a nested loops join as we specified, and the pointless join can no longer be eliminated (that simplification rule is switched off temporarily). To see that the effect is plan-wide, we can UNION ALL the query to itself, specifying the join hint in only one part:


SELECT th.ProductID
FROM Production.Product AS p
INNER LOOP
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID
UNION ALL
SELECT th.ProductID
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



Now, neither query is simplified, even though there's logically no reason not to in the second case. A serial plan shows a loops join for the top query and a merge join for the lower one; a parallel plan chooses a hash join instead of merge.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
adb2303
adb2303
SSCommitted
SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)

Group: General Forum Members
Points: 1523 Visits: 3177
SQLkiwi (9/8/2011)
Tom.Thomson (9/8/2011)
Good question and answer.

But the behaviour still seems bizarre to me. Why should the loop join hint be taken to imply the (Force Order) option - if I want that option, I can specify it, if I want a loop join in whatever order is best why should I have (FORCE ORDER) rammed down my throat if I don't specify it? I guess that's a question for Microsoft though, not a question for Paul.

It is bizarre, and entirely counter-intuitive to most people (though it is at least documented these days). There is often considerable consternation when an attempt to enforce a particular physical join type between two tables results in a completely stupid plan because of the implied FORCE ORDER.

The query optimizer in SQL Server uses the Cascades framework - a cost-based top-down optimizer that works by matching patterns (logical tree fragments) using rules. There are nearly 400 of these rules - some of which implement logical operations as physical operators (e.g. implementing a relational JOIN using sort-merge), some of which generate logically-equivalent relational alternatives (e.g. pushing an aggregate below a join or considering a different join order), and still others that remove redundancies (e.g. removing empty tables or aggregates where the input is known to be unique).

The second factor is how to interpret A INNER LOOP JOIN B, or C RIGHT HASH JOIN D. Did the query writer intend to drive the query from table A in the first case, and to build the hash table on input C in the second case? In most cases where the query writer wishes to enforce a physical join type, this heuristic is a good one. To meet the writer's expectation, SQL Server needs to guarantee that it will produce a plan where A drives a nested loops join to B, and the hash join between C and D has C as the build input and D as the probe.

Now, in principle the optimizer could make that guarantee in one of two ways. First, it could somehow fix the join in place and try to optimize around it. That just isn't practical for an extensible general-purpose optimizer based on Cascades - all kinds of horribly specific and hard-to-maintain code would have to be written. Instead, SQL Server guarantees the physical join type (LOOP, HASH, MERGE, REMOTE) by setting a required physical property on the logical JOIN in the query tree. This allows other physical join type rules to match elsewhere in the final plan, but only the one specified physical implementation can match the required property on the join in question.

Meeting the input-order requirement is more difficult to generalize. One consideration is that the inputs to the join may be an arbitrarily complex tree rather than a simple table. Anyway, rather than deal with all the complexity, SQL Server uses a Big Hammer to enforce join order: all the simplification and transformation rules that could change the join order from that shown in the logical tree produced by the parser and algebrizer, are disabled. Not surprisingly, this can have all sorts of unexpected effects on the final plan, but it does guarantee the join order.

As an example, consider this join:


SELECT th.ProductID
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



That results in a simple scan of the history table, because there are constraints that guarantee that a single matching product record exists. If we now force a join type:


SELECT th.ProductID
FROM Production.Product AS p
INNER LOOP
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



The resulting plan contains a nested loops join as we specified, and the pointless join can no longer be eliminated (that simplification rule is switched off temporarily). To see that the effect is plan-wide, we can UNION ALL the query to itself, specifying the join hint in only one part:


SELECT th.ProductID
FROM Production.Product AS p
INNER LOOP
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID
UNION ALL
SELECT th.ProductID
FROM Production.Product AS p
JOIN Production.TransactionHistory AS th ON
th.ProductID = p.ProductID



Now, neither query is simplified, even though there's logically no reason not to in the second case. A serial plan shows a loops join for the top query and a merge join for the lower one; a parallel plan chooses a hash join instead of merge.



Yeah, that's what I was going to say... ;-)
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search