SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Hash Match


Hash Match

Author
Message
Ford Fairlane
Ford Fairlane
SSCrazy
SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)SSCrazy (2.2K reputation)

Group: General Forum Members
Points: 2184 Visits: 836
Interesting Question,

Hope this helps...

Ford Fairlane
Rock and Roll Detective





free_mascot
free_mascot
Hall of Fame
Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)

Group: General Forum Members
Points: 3947 Visits: 2235
Good One, thanks.

---------------------------------------------------
"Thare are only 10 types of people in the world:
Those who understand binary, and those who don't."
Stewart "Arturius" Campbell
Stewart "Arturius" Campbell
SSCrazy Eights
SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)SSCrazy Eights (8.7K reputation)

Group: General Forum Members
Points: 8745 Visits: 7279
Interesting question, thanks Steve.

____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”
TomThomson
TomThomson
SSChampion
SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)SSChampion (14K reputation)

Group: General Forum Members
Points: 14170 Visits: 12197
PHYData DBA (5/1/2014)
TomThomson (5/1/2014)
PHYData DBA (5/1/2014)
TomThomson (5/1/2014)
gbritton1 (5/1/2014)
Before I answered this question, I read

[url=http://technet.microsoft.com/en-us/library/ms189313(v=sql.105).aspx][/url]

It reads, in part:

The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.


then later:

The hash join first scans or computes the entire build input and then builds a hash table in memory.


I'm having trouble reconciling the official description with the "correct" answer.

Without looking anything up, I answered smallest (ie build) input first. When I saw the answer, I constructed a test (using SQL Server 2012) to see what happened. Given one table with 65537 rows and another with 393222 rows (the sizes of teh test tables I built - admittedly not very big, but big enough to refute a silly "correct" answer) I found that whether I wrote the join so that the small table was the first or second referenced in the select clause and whether the small table was teh first or second listed in teh from clause, the actual (not estimated but actual) execution plan took the smaller table as first (ie build) table.

So I don't just have trouble reconciling the official correct answer with the documentation, I have clear and solid experimantal evidence that the officially correct answer is just plain wrong.


Tom - See my earlier post... there is no need for the answer to be wrong ( run test using HASH join hint) to match with your tests so far. When writing a hash join using the Hint you put the smaller to the right to make it the build table. Try you experiment with the Hash Join hint and see what happens.

Your experiment proves what the Query Optimizer does, but not how it is done.

Of course I would not know about any of this if I had not learned all of this by having to work with an Execution Plan that for some reason had chosen the Larger (2 million records) set as the Build list years ago.

I suppose it depends on what the answer means by "first". If it means the "the table which ir processed first in order to build the the hash table against which rows from teh other table will be tested" then " that's first" is a pointless tautology. It's not at all clear to me that any sane person can take it as meaning that. If on the other hand it means that the table used as build table can be any of the tables involved, regardless of size, which is the only sensible interpretation of the words given that (a) "is the one that is used first used first?" is not a sensible question and (b) the statement "Size does not matter for this operator" in the explanation it seems clear both that the BOL documentation quoted by gbritton1 indicates that the "correct" answer is wrong fpr SQL 2008R2 (not a terribly interesting argument, given how often BOL gets it wrong, but in fact - see below - BOL didn't go wrong here) and that the experiment I undertook after seeing this crazy answer proved conclusively that the "correct" answer is wrong for SQL 2012. It may of course be different for SQL 2014, but there are several releases currently in full support and we can see that it is wrong for the majority of them (I ran the experiment for 2008 R2 as well before composing this reply, and this is not one of the places where BOL got it wrong).
So we have to work out what Jason (author of the referenced document) meant by "first(top)" and "second(bottom)". Jason doesn't usually get things wrong. I think he was referring to the layout of the operators in the query plan in the standard query plan display provided by SSMS - certainly that's what I thought he meant when I read it a month or two ago. There's no imaginable way that today's question could be interpreted so that "first" mant that, so I think that Steve has misunderstood what Jason was saying and produced a question and answer based on that misunderstanding.


It seems you are once again ranting and/or trolling for something you are not going to find here.
Have fun with that.

I trust you enjoyed demonstrating that you too arrogant to be capable of rational and polite debate.

The only way to make this question seem correct is to assume that "first" in the answer means "occurring nearest to top left in the query plan displayed (as actual query plan) by SSMS". That is true even in the event of role reversal, since the data engine starts processing the input picked as build by the optimiser first and will not switch before a spill to disc, so the optimiser's chosen build input is handled before the acess to the other input (the actual build input) begins. If you don't understand that, I suggest you read the page again, carefully.

Tom

pchirags
pchirags
SSCrazy
SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)SSCrazy (2.3K reputation)

Group: General Forum Members
Points: 2339 Visits: 615
interesting question..
thanks Steve.
Hany Helmy
Hany Helmy
Hall of Fame
Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)

Group: General Forum Members
Points: 3007 Visits: 1117
PHYData DBA (5/1/2014)
See lots of discussion about why the right answer is...

Here is a reference to the right answer from TechNet BOL.
Looks like this answer to this question has not changed for at least 15 years...

http://technet.microsoft.com/en-us/library/aa237090(v=SQL.80).aspx

For any joins, use the first (top) input to build the hash table and the second (bottom) input to probe the hash table. Output matches (or nonmatches) as dictated by the join type. If multiple joins use the same join column, these operations are grouped into a hash team.
Cool

Thanx 4 the reference.
Hany Helmy
Hany Helmy
Hall of Fame
Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)

Group: General Forum Members
Points: 3007 Visits: 1117
Never used this one before, it`s always good to know something new here.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)SSC Guru (84K reputation)

Group: General Forum Members
Points: 84555 Visits: 41064
I think a much better question would be, "What determines what will be the first (top) table (according to books online) in a hash match join"? ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Steve Jones
Steve Jones
SSC Guru
SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)

Group: Administrators
Points: 61451 Visits: 19097
Jeff Moden (5/3/2014)
I think a much better question would be, "What determines what will be the first (top) table (according to books online) in a hash match join"? ;-)


There's no reliable way to determine this. It should be the smaller input most of the time, but sometimes the optimizer doesn't get that one and that usually means the DBA's phone rings with a complaint.

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search